Datapath disruption for NSX Bridge traffic on vmnic failback

Article ID: 367502

Products

VMware NSX

Issue/Introduction

  • All NSX versions.
  • An Edge VM configured for bridging is connected to either a vDS portgroup or a segment with MAC learning enabled.
  • When an ESX vmnic used by the Edge VM goes down, no disruption is observed on failover.
  • When the ESX vmnic comes back up, there is dataplane disruption for Bridge traffic.

Cause

  • This behaviour is observed when traffic is initiated from VLAN to overlay, e.g. a ping from VM-A on the VLAN to VM-B on an overlay segment.
  • When the ESX vmnic goes down, the physical switch clears its MAC table and floods the ping request.
  • On receiving the ping reply, the physical switch updates its MAC table.
  • As a result, there is no disruption on vmnic failover.
  • When the vmnic comes back up, the Edge VM interface may move back to this vmnic.
  • ESX sends a RARP for the learned MAC addresses but, due to a software issue, does not tag the packet with the VLAN ID.
  • The physical switch therefore does not learn that the MAC address has moved and continues to send traffic to the old vmnic.
  • Datapath connectivity is restored only when traffic is received from VM-B and the switch updates its MAC table.

Note:
  • This issue is specific to a vDS portgroup or segment configured for MAC learning.
  • If promiscuous mode is configured on the portgroup, disruption is expected because no RARP is sent during failback.
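
The sequence described under Cause can be illustrated with a toy model of a learning switch. This is only a sketch under simplifying assumptions, not a model of any particular switch: the class, the port names, the MAC address and VLAN 100 are all hypothetical, the MAC table is keyed by (VLAN, MAC), a link-down event flushes only the entries learned on that port, and an untagged frame is represented with a VLAN of None so that it cannot refresh the entry for the bridged VLAN.

```python
class LearningSwitch:
    """Toy physical switch: learns (vlan, source MAC) -> port."""

    def __init__(self):
        self.table = {}

    def learn(self, vlan, src_mac, port):
        # A frame arriving on `port` teaches the switch where src_mac lives
        # for that VLAN. vlan=None stands for an untagged frame (assumption).
        self.table[(vlan, src_mac)] = port

    def link_down(self, port):
        # Flush every entry that was learned on the failed port.
        self.table = {k: p for k, p in self.table.items() if p != port}

    def lookup(self, vlan, dst_mac):
        # Unknown destinations are flooded to all ports in the VLAN.
        return self.table.get((vlan, dst_mac), "flood")


def simulate():
    vlan, vm_b = 100, "00:50:56:00:00:0b"  # hypothetical values
    sw = LearningSwitch()
    seen = []

    sw.learn(vlan, vm_b, "port-to-vmnic0")   # steady state before the failure

    sw.link_down("port-to-vmnic0")           # vmnic0 goes down
    seen.append(sw.lookup(vlan, vm_b))       # ping request is flooded
    sw.learn(vlan, vm_b, "port-to-vmnic1")   # ping reply arrives via vmnic1
    seen.append(sw.lookup(vlan, vm_b))       # failover: no disruption

    sw.learn(None, vm_b, "port-to-vmnic0")   # failback: RARP sent without VLAN tag
    seen.append(sw.lookup(vlan, vm_b))       # stale entry, traffic goes to old vmnic

    sw.learn(vlan, vm_b, "port-to-vmnic0")   # tagged traffic from VM-B arrives
    seen.append(sw.lookup(vlan, vm_b))       # connectivity restored
    return seen
```

In this model, the third lookup still returns the port facing the old vmnic, which corresponds to the disruption window on failback; the entry is corrected only once a tagged frame sourced from VM-B reaches the switch.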

Resolution

This issue is resolved in NSX 4.1.1.

Workarounds:

  • Run a keepalive that continuously sends traffic from overlay to VLAN, e.g. a continuous ping.
    or
  • vMotion the Edge VM to another ESX host before bringing up the vmnic.
    or
  • Fail over the Bridge to the standby Edge before bringing up the vmnic.
    or
  • In the vSphere Client, increase the delay before a vmnic is considered ready for use: Configuration > Advanced Settings > Net > Net.teampolicyupdelay. This allows more time to apply one of the workarounds above.
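
For the keepalive workaround, a continuous ping can be wrapped in a small script. The following is a minimal sketch, assuming it runs on a Linux VM on the overlay segment (such as VM-B) with the standard iputils ping available; the function names and the target address are hypothetical placeholders, not part of NSX.

```python
import subprocess


def build_ping_command(target_ip, interval_s=1.0):
    """Build a Linux ping command that runs until it is interrupted."""
    # -i sets the seconds between echo requests; with no -c count,
    # Linux ping keeps sending until it is stopped.
    return ["ping", "-i", str(interval_s), target_ip]


def run_keepalive(target_ip, interval_s=1.0):
    """Start the ping and block until it exits; returns ping's exit code."""
    return subprocess.call(build_ping_command(target_ip, interval_s))


# Example (placeholder address for VM-A on the VLAN):
#   run_keepalive("192.0.2.10")
```

Because the traffic originates on the overlay side, the physical switch keeps seeing frames sourced from the bridged MAC address and can update its MAC table soon after a failback.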