Apps may have missing Dynamic ASG configs during runtime and cf push when there are tens of thousands of ASG rules

Article ID: 395631

Products

VMware Tanzu Application Platform

Issue/Introduction

The policy-server-asg-syncer logs on the Cloud Controller VM show repeated pagination attempts followed by a failed "cfnetworking.policy-server-asg-syncer" sync cycle:

{"timestamp":"2025-03-04T14:50:32.655211480Z","level":"info","source":"cfnetworking.policy-server-asg-syncer","message":"cfnetworking.policy-server-asg-syncer.get-security-groups","data":{}}
{"timestamp":"2025-03-04T14:50:32.655775477Z","level":"info","source":"cfnetworking.policy-server-asg-syncer","message":"cfnetworking.policy-server-asg-syncer.get-security-groups-attempt-pagination","data":{}}
{"timestamp":"2025-03-04T14:50:38.494958418Z","level":"info","source":"cfnetworking.policy-server-asg-syncer","message":"cfnetworking.policy-server-asg-syncer.get-security-groups-attempt-pagination","data":{}}
{"timestamp":"2025-03-04T14:50:48.560604242Z","level":"info","source":"cfnetworking.policy-server-asg-syncer","message":"cfnetworking.policy-server-asg-syncer.get-security-groups-attempt-pagination","data":{}}
{"timestamp":"2025-03-04T14:50:57.663535808Z","level":"error","source":"cfnetworking.policy-server-asg-syncer","message":"cfnetworking.policy-server-asg-syncer.asg-sync-cycle","data":{"error":"Ran out of retry attempts. Last error was: last_update time has changed\n"}}
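In raw logs, several JSON entries can be fused onto one line, so a simple line-by-line grep may miscount them. The Python sketch below is one way to tally pagination attempts and failed sync cycles in a collected syncer log. It relies only on the source, level, message, and data fields visible in the excerpt above; the function names are hypothetical and are not part of any Tanzu tooling.

```python
import json
from collections import Counter

SYNCER_SOURCE = "cfnetworking.policy-server-asg-syncer"


def iter_json_objects(text):
    """Yield each JSON object in text, even when several share one line."""
    decoder = json.JSONDecoder()
    idx = 0
    while True:
        start = text.find("{", idx)
        if start == -1:
            return
        try:
            obj, end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            # Not the start of a valid object; keep scanning after this brace.
            idx = start + 1
            continue
        yield obj
        idx = end


def summarize_asg_syncer(text):
    """Count pagination attempts and failed ASG sync cycles from the syncer."""
    counts = Counter()
    errors = []
    for entry in iter_json_objects(text):
        if entry.get("source") != SYNCER_SOURCE:
            continue
        message = entry.get("message", "")
        if message.endswith(".get-security-groups-attempt-pagination"):
            counts["pagination_attempts"] += 1
        elif message.endswith(".asg-sync-cycle") and entry.get("level") == "error":
            counts["failed_cycles"] += 1
            # The error text ends with a newline in the log; strip it.
            errors.append(entry.get("data", {}).get("error", "").strip())
    return counts, errors
```

For example, read the collected log file into a string and pass it to summarize_asg_syncer. Several failed cycles in a row, each preceded by multiple pagination attempts, match the pattern described in this article.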

The MySQL slow query log may show that the following query takes a very long time to run (about 15 seconds in this example) and returns a large number of rows:

# Schema: networkpolicyserver  Last_errno: 0  Killed: 0
# Query_time: 15.176863  Lock_time: 0.000001  Rows_sent: 2758  Rows_examined: 21522  Rows_affected: 0  Bytes_sent: 11907267
# Tmp_tables: 0  Tmp_disk_tables: 0  Tmp_table_sizes: 0
# InnoDB_trx_id: 0
# Full_scan: Yes  Full_join: No  Tmp_table: No  Tmp_table_on_disk: No
# Filesort: No  Filesort_on_disk: No  Merge_passes: 0
#   InnoDB_IO_r_ops: 0  InnoDB_IO_r_bytes: 0  InnoDB_IO_r_wait: 0.000000
#   InnoDB_rec_lock_wait: 0.000000  InnoDB_queue_wait: 0.000000
#   InnoDB_pages_distinct: 7915
SET timestamp=1741100486;
SELECT
                        id,
                        guid,
                        name,
                        rules,
                        staging_default,
                        running_default,
                        staging_spaces,
                        running_spaces
                FROM security_groups WHERE (staging_default=true 
                OR running_default=true 
                OR json_contains(staging_spaces, json_quote('ASG_GUID_#############')) 
                OR json_contains(staging_spaces, json_quote('ASG_GUID_#############')) 
                OR json_contains(staging_spaces, json_quote('ASG_GUID_#############')) 
                OR json_contains(staging_spaces, json_quote('ASG_GUID_#############')) 
                OR json_contains(staging_spaces, json_quote('ASG_GUID_#############')) 
                OR json_contains(staging_spaces, json_quote('ASG_GUID_#############')) 
                OR json_contains(staging_spaces, json_quote('ASG_GUID_#############')) 
                OR json_contains(staging_spaces, json_quote('ASG_GUID_#############')) 
                OR json_contains(staging_spaces, json_quote('ASG_GUID_#############')) 
                OR json_contains(staging_spaces, json_quote('ASG_GUID_#############')) 
                OR json_contains(staging_spaces, json_quote('ASG_GUID_#############')) 
                OR json_contains(staging_spaces, json_quote('ASG_GUID_#############')) 
                OR json_contains(staging_spaces, json_quote('ASG_GUID_#############')) 
                OR json_contains(staging_spaces, json_quote('ASG_GUID_#############')) 
                OR json_contains(staging_spaces, json_quote('ASG_GUID_#############')) 
                OR json_contains(staging_spaces, json_quote('ASG_GUID_#############')) 
                OR json_contains(staging_spaces, json_quote('ASG_GUID_#############')) 
                OR json_contains(staging_spaces, json_quote('ASG_GUID_#############')) 
                OR json_contains(staging_spaces, json_quote('ASG_GUID_#############')) 
                OR json_contains(staging_spaces, json_quote('ASG_GUID_#############')) 
                OR json_contains(running_spaces, json_quote('ASG_GUID_#############')) 
                OR json_contains(running_spaces, json_quote('ASG_GUID_#############')) 
                OR json_contains(running_spaces, json_quote('ASG_GUID_#############')) 
                OR json_contains(running_spaces, json_quote('ASG_GUID_#############')) 
                OR json_contains(running_spaces, json_quote('ASG_GUID_#############')) 
                OR json_contains(running_spaces, json_quote('ASG_GUID_#############')) 
                OR json_contains(running_spaces, json_quote('ASG_GUID_#############')) 
                OR json_contains(running_spaces, json_quote('ASG_GUID_#############')) 
                OR json_contains(running_spaces, json_quote('ASG_GUID_#############')) 
                OR json_contains(running_spaces, json_quote('ASG_GUID_#############')) 
                OR json_contains(running_spaces, json_quote('ASG_GUID_#############')) 
                OR json_contains(running_spaces, json_quote('ASG_GUID_#############')) 
                OR json_contains(running_spaces, json_quote('ASG_GUID_#############')) 
                OR json_contains(running_spaces, json_quote('ASG_GUID_#############')) 
                OR json_contains(running_spaces, json_quote('ASG_GUID_#############')) 
                OR json_contains(running_spaces, json_quote('ASG_GUID_#############')) 
                OR json_contains(running_spaces, json_quote('ASG_GUID_#############')) 
                OR json_contains(running_spaces, json_quote('ASG_GUID_#############')) 
                OR json_contains(running_spaces, json_quote('ASG_GUID_#############')) 
                OR json_contains(running_spaces, json_quote('ASG_GUID_#############'))) 
                ORDER BY id;
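The query grows with the number of GUIDs it has to match: each one adds a json_contains() predicate for staging_spaces and another for running_spaces. The Python sketch below is a hypothetical helper, not the Policy Server implementation, that reproduces only the shape of the logged WHERE clause. It assumes the quoted values (redacted as ASG_GUID_... above) are matched against JSON arrays in those two columns, and it uses %s parameter placeholders (the style used by PyMySQL and mysqlclient) instead of inline literals.

```python
# Columns selected by the query in the slow-log entry above.
COLUMNS = ("id", "guid", "name", "rules", "staging_default",
           "running_default", "staging_spaces", "running_spaces")


def build_security_groups_query(space_guids):
    """Return (sql, params) with the same WHERE-clause shape as the logged query.

    Hypothetical helper: one json_contains() predicate is added per GUID for
    staging_spaces and again for running_spaces, so the clause grows linearly.
    """
    predicates = ["staging_default=true", "running_default=true"]
    params = []
    for column in ("staging_spaces", "running_spaces"):
        for guid in space_guids:
            predicates.append(f"json_contains({column}, json_quote(%s))")
            params.append(guid)
    sql = (
        f"SELECT {', '.join(COLUMNS)} FROM security_groups "
        f"WHERE ({' OR '.join(predicates)}) ORDER BY id"
    )
    return sql, params
```

With 20 GUIDs the sketch produces 40 json_contains() predicates, matching the 20 staging and 20 running predicates in the logged query. The slow-log header reports Full_scan: Yes, so MySQL evaluates this whole predicate list against the rows it examines.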

 

Cause

Given the large number of ASG rules on the platform, combined with a high rate of change to those rules caused by operations such as cf push and restage, the sync process has difficulty keeping up with the load. The scalability issues identified include the following:

  • Cloud Controller stored imprecise timestamps in the database, which could lead to missed rule updates.
  • Policy Server and VXLAN Policy Agent always synced data, even when nothing had changed. This increases database load and may cause a visible increase in MySQL CPU usage.
  • The Policy Server database indexes are inadequate for a large number of ASG rules.
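The interplay between the first two issues, and the retry error in the syncer log, can be modeled with a short Python sketch. This is a hypothetical model, not Cloud Controller or Policy Server code, and every name in it is illustrative. It assumes, as the log messages suggest, that the syncer reads a last-update timestamp before and after paginating and retries when it changes, and that a sync can be skipped when the timestamp has not moved. It uses three attempts only because three pagination attempts appear in the log excerpt.

```python
from datetime import datetime


class RetriesExhausted(Exception):
    """Raised when the timestamp moves during every pagination attempt."""


def store_without_fraction(ts):
    """Model a timestamp column that keeps only whole seconds."""
    return ts.replace(microsecond=0)


class SyncState:
    """Hypothetical syncer state that remembers the last timestamp it synced."""

    def __init__(self):
        self.last_synced = None

    def needs_sync(self, latest_update):
        # Skip a full sync when the stored timestamp has not moved.
        return self.last_synced is None or latest_update != self.last_synced


def paginate_consistently(read_latest_update, fetch_all_pages, max_attempts=3):
    """Fetch every page, retrying if the timestamp changes mid-pagination."""
    for _ in range(max_attempts):
        before = read_latest_update()
        pages = fetch_all_pages()
        if read_latest_update() == before:
            return pages, before
    raise RetriesExhausted("last_update time has changed")


# Two rule changes 0.8 s apart, within the same wall-clock second.
first = datetime(2025, 3, 4, 14, 50, 32, 100000)
second = datetime(2025, 3, 4, 14, 50, 32, 900000)

state = SyncState()
state.last_synced = store_without_fraction(first)
print(state.needs_sync(store_without_fraction(second)))  # False: update missed

state.last_synced = first
print(state.needs_sync(second))  # True: microsecond precision detects it
```

If ASG rules change more often than one full pagination takes (roughly 25 seconds between the first and last syncer log entries above), every attempt sees a moved timestamp and the cycle fails with the retry error. When timestamps keep only whole seconds, two changes inside the same second look identical, so a skip-if-unchanged check can miss the second one.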

Resolution

Update: This issue is still unresolved. Work is ongoing to optimize the syncing processes between CAPI and Policy Server, and between Policy Server and VXLAN Policy Agent.

Fixes for these three issues, along with some related minor ones, were originally included in TAS versions 4.0.35, 6.0.15, and 10.0.5. However, because of problems discovered when migrating existing high-scale databases, some of these fixes were rolled back in versions 4.0.36, 6.0.16, and 10.0.6:

  • Fixed - CAPI release 1.206.0 "Ensure asg-latest-update is renewed with microsecond precision even in mysql"
  • Fixed - Silk release 3.69.0 includes several fixes for this problem.
  • Rolled back - The Policy Server index improvements in the Silk release mentioned above apply to MySQL 8 only. The internal Tanzu platform MySQL benefits from them, but if you use an external MySQL 5.7 database, you may still observe index-related performance issues.

Additional Information