During an NSX-T maintenance-mode upgrade, there can be a brief intermediate period in which the ESXi management interface receives traffic from VLANs other than the VLAN configured on its port group. The issue occurs only during the upgrade process and clears when the upgrade finishes. The impact is a VLAN-tagging connectivity issue on the ESXi host management vmkernel interface (VMK) for several seconds during the upgrade.
VMware NSX
To reproduce this issue, generate network traffic to an ESXi host management vmkernel IP address on its own VLAN, as well as traffic from a different VLAN. Packets from both VLANs are seen even though the port group is tagged only for the management VLAN. Transmitted traffic may also be untagged. Before the upgrade, only the expected traffic on the configured VLAN is sent and received.
During the maintenance-mode upgrade, there is an intermediate, transitional step in which the switch implementation changes from vswitch to stub-vswitch so that the old release's modules can be uninstalled and the new release's modules installed. After the new modules are installed, the switch implementation changes from stub-vswitch back to the new release's vswitch.
The stub-vswitch is a minimal forwarding implementation with the following behavior:
1. Only vmknic and uplink traffic is handled. VM traffic is expected to be blocked.
2. It is a flooding switch: there is no MAC-table-based forwarding, so traffic from a port is flooded to all other ports.
3. The configured VLAN policy is not enforced in this state. This behavior contributes to the issue and results in unwanted traffic on the ESXi management interface.
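The effect of items 2 and 3 can be illustrated with a toy model. This is only a sketch for reasoning about the behavior, not NSX or ESXi code: the `Port` class, the `flood` function, and VLAN IDs 100 and 200 are hypothetical, and the model isolates the VLAN check only (a normal vswitch would also forward by MAC table rather than flood).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Port:
    name: str
    vlan: Optional[int]  # configured VLAN; None models a trunk that accepts any VLAN


def flood(frame_vlan: int, ingress: str, ports: list, enforce_vlan: bool) -> list:
    """Flood a frame to every port except the one it arrived on.

    With enforce_vlan=True, a port that has a configured VLAN only
    receives frames carrying that VLAN. With enforce_vlan=False
    (the stub-vswitch-like case), every other port receives the frame.
    """
    receivers = []
    for port in ports:
        if port.name == ingress:
            continue
        if enforce_vlan and port.vlan is not None and port.vlan != frame_vlan:
            continue
        receivers.append(port.name)
    return receivers


# Hypothetical layout: one trunk uplink and a management vmknic on VLAN 100.
PORTS = [Port("uplink", None), Port("vmk0", 100)]

# A frame on VLAN 200 arriving from the uplink:
print(flood(200, "uplink", PORTS, enforce_vlan=True))   # policy enforced
print(flood(200, "uplink", PORTS, enforce_vlan=False))  # policy not enforced
```

In this model, the VLAN 200 frame reaches `vmk0` only when the VLAN check is skipped, which mirrors the unexpected traffic described above.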
In this instance, the management vmknic was on the stub-vswitch implementation between the two timestamps below. The corresponding vmkernel log entries can be reviewed to identify this problem:
2024-11-17T05:49:04.662Z In(182) vmkernel: cpu58:23619227)NetHotswap: 516: DvsPortset-0: changing class from vswitch to stub-vswitch (moduleID 4294967295)
2024-11-17T05:50:19.903Z In(182) vmkernel: cpu70:23619227)NetHotswap: 516: DvsPortset-0: changing class from stub-vswitch to vswitch (moduleID 4294967295)
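To measure how long a portset stayed on stub-vswitch, the two NetHotswap entries can be paired by script. The following is a minimal sketch, assuming log lines in exactly the format quoted above; the regular expression and the `stub_windows` helper are illustrative and not part of any VMware tool.

```python
import re
from datetime import datetime

# Pattern derived from the two NetHotswap lines quoted above.
HOTSWAP = re.compile(
    r"^(?P<ts>\S+Z) .*NetHotswap: \d+: (?P<portset>\S+): "
    r"changing class from (?P<old>\S+) to (?P<new>\S+)"
)


def stub_windows(lines):
    """Return (portset, start, end, seconds) for each completed stub-vswitch window."""
    open_since = {}
    windows = []
    for line in lines:
        match = HOTSWAP.search(line.strip())
        if not match:
            continue
        ts = datetime.strptime(match["ts"], "%Y-%m-%dT%H:%M:%S.%fZ")
        portset = match["portset"]
        if match["new"] == "stub-vswitch":
            # Entering the stub implementation: remember when.
            open_since[portset] = ts
        elif match["old"] == "stub-vswitch" and portset in open_since:
            # Leaving the stub implementation: close the window.
            start = open_since.pop(portset)
            windows.append((portset, start, ts, (ts - start).total_seconds()))
    return windows


SAMPLE = [
    "2024-11-17T05:49:04.662Z In(182) vmkernel: cpu58:23619227)NetHotswap: 516: "
    "DvsPortset-0: changing class from vswitch to stub-vswitch (moduleID 4294967295)",
    "2024-11-17T05:50:19.903Z In(182) vmkernel: cpu70:23619227)NetHotswap: 516: "
    "DvsPortset-0: changing class from stub-vswitch to vswitch (moduleID 4294967295)",
]

for portset, start, end, seconds in stub_windows(SAMPLE):
    print(f"{portset}: stub-vswitch from {start} to {end} ({seconds:.3f} s)")
```

For the two entries above, the interval between the timestamps is 75.241 seconds. In practice the same function could be fed the lines of a saved vmkernel log file.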
When traffic is transmitted, the VLAN IO chains needed for switch tagging are not inserted properly into the stub-vswitch implementation. As a result, traffic from vmk0 is not tagged with a VLAN when it leaves the ESXi host. The following log entries show this behavior:
2024-11-17T05:49:04.663Z In(182) vmkernel: cpu58:23619227)NetIOChain: 163: Failed to insert IOChain to port 0x400000a for iocl is not ready for new lock model
2024-11-17T05:49:04.663Z In(182) vmkernel: cpu58:23619227)NetIOChain: 163: Failed to insert IOChain to port 0x400000a for iocl is not ready for new lock model
2024-11-17T05:49:04.663Z In(182) vmkernel: cpu58:23619227)NetIOChain: 163: Failed to insert IOChain to port 0x400000b for iocl is not ready for new lock model
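When reviewing a larger log, it can help to summarize which ports reported the IOChain insertion failure. The sketch below assumes the message format shown in the entries above; the pattern and the `failed_ports` helper are illustrative only.

```python
import re
from collections import Counter

# Pattern derived from the NetIOChain lines quoted above.
IOCHAIN_FAIL = re.compile(
    r"NetIOChain: \d+: Failed to insert IOChain to port (0x[0-9a-fA-F]+)"
)


def failed_ports(lines):
    """Count IOChain insertion failures per port ID."""
    counts = Counter()
    for line in lines:
        match = IOCHAIN_FAIL.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


SAMPLE = [
    "2024-11-17T05:49:04.663Z In(182) vmkernel: cpu58:23619227)NetIOChain: 163: "
    "Failed to insert IOChain to port 0x400000a for iocl is not ready for new lock model",
    "2024-11-17T05:49:04.663Z In(182) vmkernel: cpu58:23619227)NetIOChain: 163: "
    "Failed to insert IOChain to port 0x400000a for iocl is not ready for new lock model",
    "2024-11-17T05:49:04.663Z In(182) vmkernel: cpu58:23619227)NetIOChain: 163: "
    "Failed to insert IOChain to port 0x400000b for iocl is not ready for new lock model",
]

for port, count in sorted(failed_ports(SAMPLE).items()):
    print(f"port {port}: {count} failed IOChain insertion(s)")
```

Failures logged at the same time as the change to stub-vswitch, as in this instance, are consistent with the untagged transmit behavior described above.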
The workaround is to use the in-place upgrade procedure.
A code fix will be included in NSX-T 4.2.2 and VCF 9.0.