When NSX is configured at a large scale, with a large number of groups each containing a large number of IP addresses (ipsets), the nsx-config-0-0 pod can go out-of-memory during an nsx-config full sync.
SSP 5.0
The out-of-memory condition in the nsx-config-0-0 pod can occur due to the in-memory relationship building of IP addresses to groups. The supported scale for IP-based groups is 10,000; this issue can be observed if there are more IP-based groups than that, or if many groups have ipsets containing roughly 0-5,000 or more IP addresses each.
This issue can be confirmed based on the following observations:
1. Run the following command to check the status of the pod:
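A minimal example of such a command, assuming the pod runs in a namespace referred to here as <ssp-namespace> (substitute the actual namespace of your SSP deployment):

kubectl describe pod nsx-config-0-0 -n <ssp-namespace>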
In the output of the above command, find the nsx-config container and check Last State/Reason to confirm whether the restart was caused by an OOM condition. The Last State should be Terminated and the Reason should be OOMKilled.
2. Another check is to monitor the current memory consumption of the nsx-config-0-0 pod for 10 minutes and confirm whether it rises above 45000Mi.
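For a point-in-time reading, a command along the following lines can be used (the namespace placeholder is an assumption, and kubectl top requires the metrics server to be available on the cluster):

kubectl top pod nsx-config-0-0 -n <ssp-namespace> --containers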
Otherwise, run the following script to record the output to a file, and terminate the script after 10 minutes.
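A minimal sketch of such a script, assuming the same <ssp-namespace> placeholder and a 30-second sampling interval (20 samples over roughly 10 minutes):

# Sample nsx-config-0-0 memory usage every 30 seconds for ~10 minutes
for i in $(seq 1 20); do
  date >> nsx-config-0-0-memory.log
  kubectl top pod nsx-config-0-0 -n <ssp-namespace> --containers >> nsx-config-0-0-memory.log
  sleep 30
done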
3. Observe the nsx-config-0-0 logs using the following command and look for entries containing a large number of ipsets for multiple groups:
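One way to pull these log entries, assuming the container is named nsx-config and using the <ssp-namespace> placeholder:

kubectl logs nsx-config-0-0 -c nsx-config -n <ssp-namespace> | grep ip_set_contents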
...Sending to druid ManagerRealizationConfig { "revision" : 0, "tags" : [ ], "nsx_agent_seen_time" : 1741214631148, "site_id" : "3c582b44-9a82-4363-9e1a-92ce6b8f622a", "config_type" : "NS_GROUP", "timestamp" : "2025-03-05T22:45:08.923492328Z", "epoch" : 7, "mp_uuid" : "084838e8-06d0-4a4a-b9e9-1f404cab9e64", "policy_path_from_tag" : "/infra/domains/default/groups/ukgrp3_0", "display_name" : "ukgrp3_0", "create_user" : "admin", "create_time" : 1741180839520, "last_modified_user" : "admin", "last_modified_time" : 1741180839520, "deleted" : false, "deletion_time" : 0, "scope" : "LOCAL", "scopeTagPair" : [ ], "effective_and_related_compute_members" : [ ], "effective_segments" : [ ], "effective_segment_ports" : [ ], "membership_types" : [ "IPAddress" ], "ip_set_contents" : [ "X.X.X.11", ... , "X.X.X.99" ], <--- Large number of ipsets "system_owned" : false}
2. Check the current memory allocation for the nsx-config container in the nsx-config-0-0 pod; it should be 5000Mi for both the request and the limit. If it is already 7000Mi, the remediation has already been applied and the remaining steps may not help; in that case, contact support.
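The configured request and limit can be read with a command along these lines (the namespace placeholder is an assumption):

kubectl get pod nsx-config-0-0 -n <ssp-namespace> -o jsonpath='{.spec.containers[?(@.name=="nsx-config")].resources}'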
3. Increase the full sync timeout. Run the following command to update the nsx-config config map, then set the fullSyncTimeoutMills value to 3600000 as shown below.
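A sketch of this step, assuming the config map is named nsx-config, lives in the <ssp-namespace> placeholder namespace, and carries fullSyncTimeoutMills as a key in its data section (verify the actual config map name and key location before editing):

kubectl edit configmap nsx-config -n <ssp-namespace>

data:
  fullSyncTimeoutMills: "3600000"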
4. Increase the pod memory to 7000Mi using the following command:
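One way to do this, assuming the pod is managed by a StatefulSet named nsx-config in the <ssp-namespace> placeholder namespace (adjust the workload kind and name to match your deployment):

kubectl set resources statefulset nsx-config -n <ssp-namespace> -c nsx-config --requests=memory=7000Mi --limits=memory=7000Mi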
5. Check the current memory allocation for the nsx-config container in the nsx-config-0-0 pod again using the command from Step 2; it should now be 7000Mi for both the request and the limit.