Cloud Controller "policy-server-asg-syncer" job in crash loop

Article ID: 396765

Products

VMware Tanzu Application Service

Issue/Introduction

The TAS Cloud Controller job "policy-server-asg-syncer" appears to be in a crash loop on all 6 Cloud Controller instances.

The following errors appear in "/var/vcap/sys/log/policy-server-asg-syncer/policy-server-asg-syncer.stdout.log":

{"timestamp":"2025-05-06T15:26:55.807028818Z","level":"error","source":"cfnetworking.policy-server-asg-syncer","message":"cfnetworking.policy-server-asg-syncer.asg-sync-cycle","data":{"error":"saving security group ########-####-####-####-############ (example_security_group): Error 1713 (HY000): Undo log record is too big."}}
...
{"timestamp":"2025-05-06T15:26:55.811640274Z","level":"error","source":"cfnetworking.policy-server-asg-syncer","message":"cfnetworking.policy-server-asg-syncer.exited-with-failure","data":{"error":"Exit trace for group:\nasg-syncer exited with error: saving security group ########-####-####-####-############ (example_security_group): Error 1713 (HY000): Undo log record is too big.\nasg-lock exited with nil\n"}}
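
To check whether a given Cloud Controller instance is hitting this error, the log can be searched for the MySQL error text. The following is a minimal sketch: the function name count_undo_log_errors is hypothetical, and the default path is the log file quoted above.

```shell
# Hypothetical helper: count log lines reporting MySQL error 1713.
# The optional first argument overrides the default log path from this article.
count_undo_log_errors() {
  log_file="${1:-/var/vcap/sys/log/policy-server-asg-syncer/policy-server-asg-syncer.stdout.log}"
  # -F matches the text literally; -c prints the number of matching lines.
  grep -cF 'Error 1713 (HY000): Undo log record is too big' "$log_file"
}
```

Running count_undo_log_errors with no argument on a Cloud Controller VM prints the number of matching lines; a non-zero count suggests the instance is affected by the issue described here.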

Environment

Affected TAS versions:

  • 4.0.35
  • 6.0.15
  • 10.0.5

The issue occurs only when dynamic ASGs are enabled and the deployment uses a MySQL database.

Cause

The affected TAS versions include CF Networking Release 3.69.0. This release adds migrations for the policy server database that create functional indexes, which are intended to make dynamic ASGs more performant. However, these indexes can be very large and can increase the size of MySQL undo log records, as evidenced by the errors in policy-server-asg-syncer.stdout.log.

Related incidents have also occurred on these versions in which the database migrations themselves failed.

Resolution

A permanent fix is included in CF Networking Release 3.70.0. For deployments that use a MySQL database with dynamic ASGs enabled, we suggest skipping all affected TAS versions and waiting for the next release, which will include the fixed CF Networking Release. The permanent fix takes into account that some customers will have followed the mitigation steps and manually updated their databases.

A mitigation that has proven effective is to remove two functional indexes from the network_policies database:

  1. Connect to the network_policies DB.

    a. To start, get the TAS deployment name, then SSH into any of the mysql VMs:

    bosh deployments --column=name
    [...]
    bosh -d cf-... ssh mysql/0

    b. Connect to the database:
    sudo mysql --defaults-file=/var/vcap/jobs/pxc-mysql/config/mylogin.cnf

    c. Use the network_policies database:
    use network_policies;

  2. Remove the staging_spaces_idx index, which comes from migration 82, from the security_groups table:
    DROP INDEX staging_spaces_idx ON security_groups;
  3. Remove the running_spaces_idx index, which comes from migration 83, from the security_groups table:
    DROP INDEX running_spaces_idx ON security_groups;
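
After dropping the indexes, it may be worth confirming that neither remains on the table. The sketch below is an assumption, not part of the official procedure: the helper name index_check_sql is hypothetical, and it only builds a standard MySQL SHOW INDEX query for the two index names used in the steps above.

```shell
# Hypothetical helper: print a query that lists the two indexes if they still exist.
index_check_sql() {
  echo "SHOW INDEX FROM security_groups WHERE Key_name IN ('staging_spaces_idx', 'running_spaces_idx');"
}

# Example invocation on a mysql VM, reusing the defaults file from step 1b:
#   sudo mysql --defaults-file=/var/vcap/jobs/pxc-mysql/config/mylogin.cnf \
#     network_policies -e "$(index_check_sql)"
```

An empty result set indicates that both indexes have been removed; any returned rows name an index that is still present.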