Dynamic membership of some NSGroups stops being updated at random intervals.
Article ID: 345842
Products
VMware NSX
Issue/Introduction
This article describes how to recognize this symptom and provides a workaround for it.
Symptoms:
Newly deployed VMs are not added to the correct NSGroup based on its dynamic criteria.
Multiple NSGroups are configured with the same dynamic criteria. If an NSGroup defines multiple criteria, one of those criteria matches a criterion in another NSGroup.
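To find NSGroups that share a dynamic criterion, you can compare the criteria of all groups and list every criterion used by more than one group. The Python sketch below does not call any NSX API. It assumes you have already exported each NSGroup's criteria as plain strings. The `nsgroups` dictionary, its group names and the `shared_criteria` function are hypothetical and for illustration only.

```python
from collections import defaultdict

def shared_criteria(nsgroups: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return each criterion that appears in more than one NSGroup,
    mapped to the sorted list of NSGroups that use it."""
    owners = defaultdict(set)  # criterion -> set of NSGroup names using it
    for group, criteria in nsgroups.items():
        for criterion in criteria:
            owners[criterion].add(group)
    # Keep only criteria shared by two or more groups.
    return {c: sorted(g) for c, g in owners.items() if len(g) > 1}

# Hypothetical export: group names and criteria are illustrative only.
nsgroups = {
    "web-tier": ["VirtualMachineContainer.name CONTAINS eng-paas-t-eusw1a-1-"],
    "paas-all": [
        "VirtualMachineContainer.name CONTAINS eng-paas-t-eusw1a-1-",
        "VirtualMachineContainer.name CONTAINS eng-paas-t-eusw1b-1-",
    ],
    "db-tier": ["VirtualMachineContainer.name CONTAINS db-"],
}

for criterion, groups in shared_criteria(nsgroups).items():
    print(f"{criterion} -> {groups}")
```

The comparison uses exact string equality. Criteria that overlap without being textually identical (for example, two different CONTAINS substrings that match the same VM names) are not detected.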
In /var/log/proton/nsxapi.log, a log message similar to the following appears on the manager that owns the NSGroup. It shows that the NSGroup was loaded into the in-memory cache of that NSX Manager. If the message does NOT appear for the NSGroup in question on any of the managers, the NSGroup may already have hit the problem.
2021-03-14T21:28:56.084Z INFO pool-310-thread-2 NSGroupMembershipUpdateTask - GROUPING [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] simpleExpressionNode [VirtualMachineContainer.name CONTAINS eng-paas-t-eusw1a-1-] mapped to NSGroups [NSGroup/9xxxxxx6-0000-0000-0000-exxxxxxxxxxe]
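To check each manager for this message, you can search its nsxapi.log for the NSGroup ID. The Python sketch below assumes the message format matches the example above, where the text `mapped to NSGroups [...]` is followed by a list of `NSGroup/<id>` entries. It also assumes you have copied the log from each manager. The `loaded_nsgroups` function is a hypothetical helper and is not part of NSX.

```python
import re

# Bracketed NSGroup list at the end of a "mapped to NSGroups" message.
MAPPED_RE = re.compile(r"mapped to NSGroups \[([^\]]*)\]")
# Individual "NSGroup/<id>" entries inside that list.
NSGROUP_ID_RE = re.compile(r"NSGroup/([0-9A-Za-z-]+)")

def loaded_nsgroups(log_lines):
    """Return the set of NSGroup IDs reported as loaded into the in-memory cache."""
    ids = set()
    for line in log_lines:
        match = MAPPED_RE.search(line)
        if match:
            ids.update(NSGROUP_ID_RE.findall(match.group(1)))
    return ids

sample = ('2021-03-14T21:28:56.084Z INFO pool-310-thread-2 NSGroupMembershipUpdateTask - GROUPING '
          '[nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] simpleExpressionNode '
          '[VirtualMachineContainer.name CONTAINS eng-paas-t-eusw1a-1-] mapped to NSGroups '
          '[NSGroup/9xxxxxx6-0000-0000-0000-exxxxxxxxxxe]')
print("9xxxxxx6-0000-0000-0000-exxxxxxxxxxe" in loaded_nsgroups([sample]))  # True
```

On each manager, you could pass `open("/var/log/proton/nsxapi.log")` to the function. If the ID is missing from the result on every manager, the NSGroup may already be affected. Rotated log files are not searched by this sketch.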
Even after an NSGroup hits the problem, the NSGroupMembershipRefreshTask runs daily at 2 AM. It re-evaluates the dynamic criteria and adds objects to the NSGroup correctly based on the expression. A log message similar to the following is written during this refresh:
2021-03-14T21:27:11.418Z INFO NSGroupMembershipRefreshTask NSGroupMembersRefreshTask - GROUPING [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] current member count in NSGroup 9xxxxxx6-0000-0000-0000-exxxxxxxxxxe = 18 for node [VirtualMachineContainer.name CONTAINS eng-paas-t-eusw1a-1-], resultsId NSGroupExpEvaluationResults/1xxxxxxa-0000-0000-0000-3xxxxxxxxxx6
Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.
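After the daily refresh, you can read the member counts reported by the refresh task from the same log. The Python sketch below assumes the wording `current member count in NSGroup <id> = <n> for node [...]`, as in the refresh-task example above. Each count is keyed by NSGroup ID and expression node, because the message reports the count per node, and only the most recent value for each pair is kept. The `member_counts` function is hypothetical.

```python
import re

# Captures the NSGroup ID, the member count and the expression node text.
COUNT_RE = re.compile(r"current member count in NSGroup (\S+) = (\d+) for node \[([^\]]*)\]")

def member_counts(log_lines):
    """Map (NSGroup ID, expression node) to the last member count logged for it."""
    counts = {}
    for line in log_lines:
        match = COUNT_RE.search(line)
        if match:
            counts[(match.group(1), match.group(3))] = int(match.group(2))
    return counts

sample = ('2021-03-14T21:27:11.418Z INFO NSGroupMembershipRefreshTask NSGroupMembersRefreshTask - GROUPING '
          '[nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] current member count in NSGroup '
          '9xxxxxx6-0000-0000-0000-exxxxxxxxxxe = 18 for node [VirtualMachineContainer.name CONTAINS '
          'eng-paas-t-eusw1a-1-], resultsId NSGroupExpEvaluationResults/1xxxxxxa-0000-0000-0000-3xxxxxxxxxx6')
print(member_counts([sample]))
```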
Environment
VMware NSX-T Data Center
Cause
Two NSGroups with the same criteria are initially managed by the same NSX Manager node. Later, resharding caused by proton restarts can move one of the NSGroups to a different NSX Manager node, which leads to this symptom.
Resolution
This issue is resolved in NSX-T 3.0.3 and 3.1.0.
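To decide whether a deployment already includes the fix, you can compare its NSX-T version with the fixed releases listed above. The Python sketch below assumes a dotted version string and compares only the first three components, so build numbers and hotfixes are ignored. Because 3.0.3 sorts before 3.1.0, a single comparison against 3.0.3 covers both fixed releases. The function names are hypothetical.

```python
def parse_version(version: str) -> tuple[int, int, int]:
    """Parse 'major.minor.patch'; extra components (e.g. build numbers) are ignored,
    missing ones default to 0."""
    parts = [int(p) for p in version.split(".")[:3]]
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)

def has_fix(version: str) -> bool:
    """True for 3.0.3 or later in the 3.0 line and for 3.1.0 or later."""
    return parse_version(version) >= (3, 0, 3)

for v in ["3.0.2", "3.0.3", "3.1.0"]:
    print(v, has_fix(v))
```

A `False` result means only that the version predates the fixed releases. It does not confirm that the issue occurs on that version.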
Workaround: Restart proton on the NSX-T Manager node that owns the NSGroup. If it is not clear which node owns the NSGroup, restart proton on all three managers, one after the other. From the manager's root shell, run:
/etc/init.d/proton restart
Additional Information
Impact/Risks: Virtual machine network traffic can hit the wrong distributed firewall rule and may be dropped (or incorrectly allowed).