Datapath disruption for NSX Bridge traffic on vmnic failback
Article ID: 367502
Products
VMware NSX
Issue/Introduction
All NSX versions.
An Edge VM configured for bridging is connected to either a vDS portgroup or a segment with MAC learning enabled.
When an ESX vmnic used by the Edge VM goes down, no disruption is observed on failover.
When the ESX vmnic comes back up, there is dataplane disruption for Bridge traffic.
Cause
This behaviour is observed when traffic is initiated from VLAN to overlay, e.g. a ping from VM-A on the VLAN to VM-B on an overlay segment.
When the ESX vmnic goes down, the physical switch will clear its MAC table and flood the ping request.
On receiving the ping reply, the physical switch will update its MAC table.
As a result, there is no disruption on vmnic failover.
When the vmnic comes back up, the Edge VM interface may move back to this vmnic interface.
The ESX host sends a RARP for the learned MAC addresses but, due to a software issue, does not tag the packet with the VLAN ID.
The physical switch does not know the MAC address has moved and will continue to send traffic to the old vmnic.
Datapath connectivity will only be restored when traffic is received from VM-B and the switch updates its MAC table.
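The sequence above can be illustrated with a toy model of a VLAN-aware learning switch. This is only a sketch under stated assumptions, not a description of any particular switch: the MAC table is keyed by (VLAN, MAC), an untagged frame is assumed to be classified into the port's native VLAN, and the port names, VLAN IDs, and MAC address are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical values chosen for illustration only.
BRIDGE_VLAN = 100                 # VLAN used by the NSX Bridge
NATIVE_VLAN = 1                   # assumed native VLAN of the trunk ports
VM_B_MAC = "00:50:56:00:00:0b"    # MAC of VM-B on the overlay segment
PORT_VMNIC0 = "port-to-vmnic0"    # switch port facing the vmnic that flaps
PORT_VMNIC1 = "port-to-vmnic1"    # switch port facing the surviving vmnic


@dataclass
class Switch:
    """Toy learning switch whose MAC table is keyed by (vlan, mac)."""
    table: dict = field(default_factory=dict)

    def learn(self, port, src_mac, vlan=None):
        # Assumption: an untagged frame lands in the native VLAN.
        vid = NATIVE_VLAN if vlan is None else vlan
        self.table[(vid, src_mac)] = port

    def lookup(self, dst_mac, vlan):
        # None means unknown unicast, i.e. the frame would be flooded.
        return self.table.get((vlan, dst_mac))

    def link_down(self, port):
        # Drop every entry learned on the failed port.
        self.table = {k: p for k, p in self.table.items() if p != port}


def simulate_failback():
    """Replay failover and failback; return where VM-B traffic is sent."""
    sw = Switch()
    sw.learn(PORT_VMNIC0, VM_B_MAC, BRIDGE_VLAN)   # steady state
    result = {}

    sw.link_down(PORT_VMNIC0)                      # vmnic goes down
    result["after_failover"] = sw.lookup(VM_B_MAC, BRIDGE_VLAN)

    sw.learn(PORT_VMNIC1, VM_B_MAC, BRIDGE_VLAN)   # ping reply via other vmnic
    result["after_reply"] = sw.lookup(VM_B_MAC, BRIDGE_VLAN)

    sw.learn(PORT_VMNIC0, VM_B_MAC, vlan=None)     # failback: untagged RARP
    result["after_untagged_rarp"] = sw.lookup(VM_B_MAC, BRIDGE_VLAN)

    sw.learn(PORT_VMNIC0, VM_B_MAC, BRIDGE_VLAN)   # tagged traffic from VM-B
    result["after_vm_b_traffic"] = sw.lookup(VM_B_MAC, BRIDGE_VLAN)
    return result
```

In this model the untagged RARP only creates an entry in the native VLAN, so the lookup in the bridged VLAN still returns the port facing the other vmnic until VM-B itself sends tagged traffic.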
Note:
This issue is specific to a vDS portgroup or segment configured for MAC learning.
If promiscuous mode is configured on the portgroup instead, disruption is expected because no RARP is sent during failback.
Resolution
This issue is resolved in NSX 4.1.1.
Workarounds:
Run a keepalive that continuously sends traffic from overlay to VLAN, e.g. a continuous ping, or
vMotion the Edge to another ESX host before bringing up the vmnic, or
Fail over the Bridge to the standby Edge before bringing up the vmnic, or
In the vSphere Client, increase the delay before a vmnic that has come back up is considered ready for use: Configuration > Advanced Settings > Net > Net.teampolicyupdelay. This allows more time to apply one of the workarounds above.
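As an illustration of the keepalive workaround, the following is a minimal sketch of a script that could run on a VM on the overlay segment (VM-B in the example above) and ping a VM on the VLAN at a fixed interval. It simply wraps the operating system's `ping` command; the function names, the interval, and the target address are hypothetical, and any tool that generates regular overlay-to-VLAN traffic would serve the same purpose.

```python
import platform
import subprocess
import time


def build_ping_command(target, system=None):
    """Return the argument list for a single ICMP echo request to target."""
    system = system or platform.system()
    # Windows ping uses -n for the request count; Linux and macOS use -c.
    count_flag = "-n" if system == "Windows" else "-c"
    return ["ping", count_flag, "1", target]


def keepalive(target, interval_s=1.0, iterations=None,
              runner=subprocess.run, sleep=time.sleep):
    """Ping target every interval_s seconds.

    iterations=None keeps running until interrupted. runner and sleep are
    injectable so the loop can be exercised without sending real traffic.
    Returns the number of pings sent.
    """
    sent = 0
    while iterations is None or sent < iterations:
        # Failures are ignored on purpose: the goal is only to keep
        # frames sourced from the overlay VM flowing towards the VLAN.
        runner(build_ping_command(target),
               stdout=subprocess.DEVNULL,
               stderr=subprocess.DEVNULL,
               check=False)
        sent += 1
        sleep(interval_s)
    return sent


# Example (hypothetical address of VM-A on the VLAN):
#   keepalive("192.0.2.10", interval_s=1.0)
```

Each ping from the overlay VM gives the physical switch a tagged frame to learn from, so the MAC table is refreshed shortly after failback rather than only when VM-B happens to send traffic.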