The policy-server-asg-syncer logs on the Cloud Controller VM show the "cfnetworking.policy-server-asg-syncer" sync cycle failing, for example:
{"timestamp":"2025-03-04T14:50:32.655211480Z","level":"info","source":"cfnetworking.policy-server-asg-syncer","message":"cfnetworking.policy-server-asg-syncer.get-security-groups","data":{}}
{"timestamp":"2025-03-04T14:50:32.655775477Z","level":"info","source":"cfnetworking.policy-server-asg-syncer","message":"cfnetworking.policy-server-asg-syncer.get-security-groups-attempt-pagination","data":{}}
{"timestamp":"2025-03-04T14:50:38.494958418Z","level":"info","source":"cfnetworking.policy-server-asg-syncer","message":"cfnetworking.policy-server-asg-syncer.get-security-groups-attempt-pagination","data":{}}
{"timestamp":"2025-03-04T14:50:48.560604242Z","level":"info","source":"cfnetworking.policy-server-asg-syncer","message":"cfnetworking.policy-server-asg-syncer.get-security-groups-attempt-pagination","data":{}}
{"timestamp":"2025-03-04T14:50:57.663535808Z","level":"error","source":"cfnetworking.policy-server-asg-syncer","message":"cfnetworking.policy-server-asg-syncer.asg-sync-cycle","data":{"error":"Ran out of retry attempts. Last error was: last_update time has changed\n"}}
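The log sequence above (one get-security-groups entry, three attempt-pagination entries, then "Ran out of retry attempts. Last error was: last_update time has changed") suggests an optimistic-consistency loop: read a last_update timestamp, page through all security groups, and retry if last_update moved while paging. The Go sketch below illustrates that pattern only. The securityGroupSource interface, its LastUpdate and FetchAllPages methods, the fake sources, and the limit of three attempts are hypothetical, chosen to mirror the log, and are not the actual policy-server-asg-syncer code.

```go
package main

import (
	"errors"
	"fmt"
)

// errLastUpdateChanged mirrors the error text seen in the syncer log.
var errLastUpdateChanged = errors.New("last_update time has changed")

// securityGroupSource is a HYPOTHETICAL abstraction over the data the syncer
// reads; the method names are assumptions for illustration.
type securityGroupSource interface {
	LastUpdate() int64       // timestamp of the most recent ASG change
	FetchAllPages() []string // every page of security group GUIDs
}

// syncOnce succeeds only if last_update did not change while paginating;
// otherwise the collected pages may form an inconsistent snapshot.
func syncOnce(src securityGroupSource) ([]string, error) {
	before := src.LastUpdate()
	groups := src.FetchAllPages()
	if src.LastUpdate() != before {
		return nil, errLastUpdateChanged
	}
	return groups, nil
}

// syncWithRetries repeats syncOnce up to maxAttempts times.
func syncWithRetries(src securityGroupSource, maxAttempts int) ([]string, error) {
	var lastErr error
	for attempt := 0; attempt < maxAttempts; attempt++ {
		groups, err := syncOnce(src)
		if err == nil {
			return groups, nil
		}
		lastErr = err
	}
	return nil, fmt.Errorf("Ran out of retry attempts. Last error was: %v", lastErr)
}

// busySource simulates a platform where an ASG-related change lands during
// every pagination pass (hypothetical stand-in for heavy push/restage load).
type busySource struct{ ts int64 }

func (b *busySource) LastUpdate() int64 { return b.ts }
func (b *busySource) FetchAllPages() []string {
	b.ts++ // a change is recorded while pages are being read
	return []string{"asg-1", "asg-2"}
}

// quietSource simulates a platform with no changes during pagination.
type quietSource struct{}

func (quietSource) LastUpdate() int64       { return 42 }
func (quietSource) FetchAllPages() []string { return []string{"asg-1", "asg-2"} }

func main() {
	if groups, err := syncWithRetries(quietSource{}, 3); err == nil {
		fmt.Println("quiet platform synced", len(groups), "groups")
	}
	if _, err := syncWithRetries(&busySource{}, 3); err != nil {
		fmt.Println("busy platform:", err)
	}
}
```

Under this model, when changes arrive faster than one full pagination pass completes, every attempt observes a new last_update and the cycle fails, and a longer pagination pass (for example, a slow database query) widens the window in which a change can land.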
The MySQL slow query log may show that the following query takes a long time to run and returns a large number of rows:
# Schema: networkpolicyserver Last_errno: 0 Killed: 0
# Query_time: 15.176863 Lock_time: 0.000001 Rows_sent: 2758 Rows_examined: 21522 Rows_affected: 0 Bytes_sent: 11907267
# Tmp_tables: 0 Tmp_disk_tables: 0 Tmp_table_sizes: 0
# InnoDB_trx_id: 0
# Full_scan: Yes Full_join: No Tmp_table: No Tmp_table_on_disk: No
# Filesort: No Filesort_on_disk: No Merge_passes: 0
# InnoDB_IO_r_ops: 0 InnoDB_IO_r_bytes: 0 InnoDB_IO_r_wait: 0.000000
# InnoDB_rec_lock_wait: 0.000000 InnoDB_queue_wait: 0.000000
# InnoDB_pages_distinct: 7915
SET timestamp=1741100486;
SELECT
id,
guid,
name,
rules,
staging_default,
running_default,
staging_spaces,
running_spaces
FROM security_groups WHERE (staging_default=true
OR running_default=true
OR json_contains(staging_spaces, json_quote('ASG_GUID_#############'))
OR json_contains(staging_spaces, json_quote('ASG_GUID_#############'))
OR json_contains(staging_spaces, json_quote('ASG_GUID_#############'))
OR json_contains(staging_spaces, json_quote('ASG_GUID_#############'))
OR json_contains(staging_spaces, json_quote('ASG_GUID_#############'))
OR json_contains(staging_spaces, json_quote('ASG_GUID_#############'))
OR json_contains(staging_spaces, json_quote('ASG_GUID_#############'))
OR json_contains(staging_spaces, json_quote('ASG_GUID_#############'))
OR json_contains(staging_spaces, json_quote('ASG_GUID_#############'))
OR json_contains(staging_spaces, json_quote('ASG_GUID_#############'))
OR json_contains(staging_spaces, json_quote('ASG_GUID_#############'))
OR json_contains(staging_spaces, json_quote('ASG_GUID_#############'))
OR json_contains(staging_spaces, json_quote('ASG_GUID_#############'))
OR json_contains(staging_spaces, json_quote('ASG_GUID_#############'))
OR json_contains(staging_spaces, json_quote('ASG_GUID_#############'))
OR json_contains(staging_spaces, json_quote('ASG_GUID_#############'))
OR json_contains(staging_spaces, json_quote('ASG_GUID_#############'))
OR json_contains(staging_spaces, json_quote('ASG_GUID_#############'))
OR json_contains(staging_spaces, json_quote('ASG_GUID_#############'))
OR json_contains(staging_spaces, json_quote('ASG_GUID_#############'))
OR json_contains(running_spaces, json_quote('ASG_GUID_#############'))
OR json_contains(running_spaces, json_quote('ASG_GUID_#############'))
OR json_contains(running_spaces, json_quote('ASG_GUID_#############'))
OR json_contains(running_spaces, json_quote('ASG_GUID_#############'))
OR json_contains(running_spaces, json_quote('ASG_GUID_#############'))
OR json_contains(running_spaces, json_quote('ASG_GUID_#############'))
OR json_contains(running_spaces, json_quote('ASG_GUID_#############'))
OR json_contains(running_spaces, json_quote('ASG_GUID_#############'))
OR json_contains(running_spaces, json_quote('ASG_GUID_#############'))
OR json_contains(running_spaces, json_quote('ASG_GUID_#############'))
OR json_contains(running_spaces, json_quote('ASG_GUID_#############'))
OR json_contains(running_spaces, json_quote('ASG_GUID_#############'))
OR json_contains(running_spaces, json_quote('ASG_GUID_#############'))
OR json_contains(running_spaces, json_quote('ASG_GUID_#############'))
OR json_contains(running_spaces, json_quote('ASG_GUID_#############'))
OR json_contains(running_spaces, json_quote('ASG_GUID_#############'))
OR json_contains(running_spaces, json_quote('ASG_GUID_#############'))
OR json_contains(running_spaces, json_quote('ASG_GUID_#############'))
OR json_contains(running_spaces, json_quote('ASG_GUID_#############'))
OR json_contains(running_spaces, json_quote('ASG_GUID_#############')))
ORDER BY id;
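To find entries like this one in a large slow query log, the "# Key: value" header lines can be parsed and filtered on Schema and Query_time. A minimal Go sketch, assuming the header layout shown in the entry above; the function names and the 5-second threshold are illustrative choices, not part of any existing tool.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseSlowLogHeader collects "Key: value" pairs from the "# ..." header
// lines of one slow-query-log entry. Non-header lines are ignored.
func parseSlowLogHeader(entry string) map[string]string {
	metrics := make(map[string]string)
	for _, line := range strings.Split(entry, "\n") {
		line = strings.TrimSpace(line)
		if !strings.HasPrefix(line, "# ") {
			continue
		}
		fields := strings.Fields(strings.TrimPrefix(line, "# "))
		for i := 0; i < len(fields); i++ {
			// A field ending in ":" is a key; the next field is its value.
			if strings.HasSuffix(fields[i], ":") && i+1 < len(fields) {
				metrics[strings.TrimSuffix(fields[i], ":")] = fields[i+1]
				i++
			}
		}
	}
	return metrics
}

// isSlowPolicyServerQuery reports whether the entry ran against the
// networkpolicyserver schema and took at least thresholdSec seconds.
func isSlowPolicyServerQuery(m map[string]string, thresholdSec float64) bool {
	if m["Schema"] != "networkpolicyserver" {
		return false
	}
	qt, err := strconv.ParseFloat(m["Query_time"], 64)
	return err == nil && qt >= thresholdSec
}

// sample is copied from the header of the slow-log entry above.
const sample = `# Schema: networkpolicyserver Last_errno: 0 Killed: 0
# Query_time: 15.176863 Lock_time: 0.000001 Rows_sent: 2758 Rows_examined: 21522 Rows_affected: 0 Bytes_sent: 11907267
# Full_scan: Yes Full_join: No Tmp_table: No Tmp_table_on_disk: No`

func main() {
	m := parseSlowLogHeader(sample)
	fmt.Println(m["Query_time"], m["Rows_sent"], m["Full_scan"], isSlowPolicyServerQuery(m, 5))
	// prints: 15.176863 2758 Yes true
}
```

Filtering on Full_scan as well as Query_time can help separate this pattern (a full scan evaluated against a long chain of json_contains conditions) from queries that are slow for other reasons.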
Given the large number of ASG rules on the platform, along with the high rate of change to those rules caused by operations such as cf push and restage, the sync process has a hard time keeping up with the load. Some of the scalability issues identified are as follows:
Update: this issue is still unresolved. Work is ongoing to optimize the syncing processes between CAPI and Policy Server, and between Policy Server and the VXLAN Policy Agent.
Fixes for the three issues, as well as some related minor ones, were originally included in TAS versions 4.0.35, 6.0.15, and 10.0.5. However, due to issues discovered when migrating existing high-scale databases, some of these fixes have been rolled back in versions 4.0.36, 6.0.16, and 10.0.6: