Dynamic membership of some NSGroups stops being updated at random intervals.

Article ID: 345842



Products

VMware NSX Networking

Issue/Introduction

This article describes how to recognize this symptom and provides a workaround to address it.

Symptoms:
  • Newly deployed VMs are not added to the correct NSGroup based on its dynamic criteria.
  • Multiple NSGroups are configured with the same dynamic criteria. If an NSGroup (NSG) defines multiple criteria, one of those criteria matches a criterion of another NSG.
  • In /var/log/proton/nsxapi.log on the manager that owns the NSG, a log message similar to the following appears. This message shows that the NSG was loaded into the in-memory cache of that NSX Manager. If this message does NOT appear for the NSG in question on any of the managers, it is possible that the NSG has already hit the problem.
2021-03-14T21:28:56.084Z INFO pool-310-thread-2 NSGroupMembershipUpdateTask - GROUPING [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] simpleExpressionNode [VirtualMachineContainer.name CONTAINS eng-paas-t-eusw1a-1-] mapped to NSGroups [NSGroup/9xxxxxx6-0000-0000-0000-exxxxxxxxxxe]
  • Even when an NSG has hit the problem, the "NSGroupMembershipRefreshTask", which runs at 2 AM every day, re-evaluates the dynamic criteria so that objects are added to the NSG correctly based on the expression.
2021-03-14T21:27:11.418Z INFO NSGroupMembershipRefreshTask NSGroupMembersRefreshTask - GROUPING [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] current member count in NSGroup 9xxxxxx6-0000-0000-0000-exxxxxxxxxxe = 18 for node [VirtualMachineContainer.name CONTAINS eng-paas-t-eusw1a-1-], resultsId NSGroupExpEvaluationResults/1xxxxxxa-0000-0000-0000-3xxxxxxxxxx6

Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.
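The log check described in the symptoms can be sketched in Python as follows. This is a minimal illustration, not a supported tool: the regular expression is modeled only on the single NSGroupMembershipUpdateTask excerpt shown in this article, and the comma-separated handling of multiple NSGroups inside the brackets is an assumption. The function names `loaded_groups` and `missing_groups` are hypothetical.

```python
import re
from collections import defaultdict

# Modeled on the "mapped to NSGroups" excerpt in this article; the real
# message format may vary between releases.
MAPPED_RE = re.compile(
    r"NSGroupMembershipUpdateTask.*?"
    r"simpleExpressionNode \[(?P<expr>.*?)\] "
    r"mapped to NSGroups \[(?P<groups>[^\]]*)\]"
)


def loaded_groups(lines):
    """Return {NSGroup id: set of expressions} for every NSGroup that
    appears in a 'mapped to NSGroups' message."""
    seen = defaultdict(set)
    for line in lines:
        match = MAPPED_RE.search(line)
        if not match:
            continue
        # Assumption: several NSGroups, if present, are comma-separated.
        for ref in match.group("groups").split(","):
            ref = ref.strip()
            if ref:
                group_id = ref.rsplit("/", 1)[-1]  # drop the "NSGroup/" prefix
                seen[group_id].add(match.group("expr"))
    return dict(seen)


def missing_groups(expected_ids, lines):
    """Return the expected NSGroup ids that never appear in the log lines."""
    seen = loaded_groups(lines)
    return [gid for gid in expected_ids if gid not in seen]
```

To use it, read /var/log/proton/nsxapi.log from each manager, pass the combined lines to `missing_groups` together with the IDs of the NSGroups under investigation, and treat any ID it returns as a candidate for having hit the problem.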

Environment

VMware NSX-T Data Center

Cause

Two NSGs with the same criteria are managed by the same NSX Manager node. Over time, resharding triggered by proton restarts can move one of the NSGs to a different NSX Manager node, which leads to this symptom.

Resolution

This issue is resolved in NSX-T 3.0.3 and 3.1.0.
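As a rough aid for checking whether a given release already contains the fix, the version comparison can be sketched as below. It assumes that later maintenance releases on the 3.0.x line (3.0.3 and up) and every release from 3.1.0 onward carry the fix, which this article implies but does not state explicitly. The function name `has_fix` is hypothetical.

```python
def has_fix(version: str) -> bool:
    """Return True if the dotted NSX-T version is 3.0.3 or later on the
    3.0.x line, or 3.1.0 or later. Only the first three fields are used."""
    parts = tuple(int(p) for p in version.split(".")[:3])
    parts = parts + (0,) * (3 - len(parts))  # pad "3.1" to (3, 1, 0)
    if parts[:2] == (3, 0):
        return parts >= (3, 0, 3)
    return parts >= (3, 1, 0)
```

For example, `has_fix("3.0.2")` evaluates to False and `has_fix("3.1.0")` to True.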

Workaround:
Restart proton on the NSX-T Manager node that owns the NSGroup. If it is not clear which node owns the NSG, restart proton on all three managers, one after the other.
From the manager's root shell: /etc/init.d/proton restart
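The one-after-the-other restart can be sketched as a small Python helper. This is only an illustration under several assumptions that are not part of this article: root SSH access to each manager is enabled, the hostnames are placeholders, and a fixed pause between nodes is enough (in practice, confirm each manager is healthy before moving to the next). The function `rolling_proton_restart` and its parameters are hypothetical. Only the restart command itself comes from the workaround.

```python
import subprocess
import time

RESTART_CMD = "/etc/init.d/proton restart"  # command from the workaround


def rolling_proton_restart(managers, run=None, pause_seconds=0, dry_run=True):
    """Restart proton on each manager in turn, never in parallel.

    managers: list of hostnames or IPs (placeholders, supplied by the caller).
    run: callable that executes an argv list; defaults to subprocess.run.
    dry_run: when True, only print the commands instead of executing them.
    Returns the list of argv lists that were (or would be) executed.
    """
    if run is None:
        run = lambda argv: subprocess.run(argv, check=True)
    planned = []
    for index, host in enumerate(managers):
        # Assumption: root SSH login is permitted on the manager.
        argv = ["ssh", f"root@{host}", RESTART_CMD]
        planned.append(argv)
        if dry_run:
            print("would run:", " ".join(argv))
            continue
        run(argv)
        if index < len(managers) - 1:
            time.sleep(pause_seconds)  # give the node time to recover
    return planned
```

Calling it with the default `dry_run=True` prints the planned commands without touching any manager, which makes it easy to review the order before running it for real.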

Additional Information

Impact/Risks:
Virtual machine network traffic can hit the wrong distributed firewall rule, so traffic may be dropped or incorrectly allowed.