Datapath disruption for NSX Bridge traffic on vmnic failback

Article ID: 367502

Updated On:

Products

VMware NSX

VMware NSX-T Data Center

Issue/Introduction

  • Observed on all NSX versions prior to 4.1.1.
  • An Edge VM configured for bridging is connected to either a vDS portgroup or a segment with MAC learning enabled.
  • When an ESX vmnic used by the Edge VM goes down, no disruption is observed on failover.
  • When the ESX vmnic comes back up, there is dataplane disruption for Bridge traffic.

Environment

VMware NSX-T Data Center

VMware NSX

Cause

  • This behaviour is observed when traffic is initiated from VLAN to overlay, e.g. a ping from VM-A on the VLAN to VM-B on an overlay segment.
  • When the ESX vmnic goes down, the physical switch clears its MAC table and floods the ping request.
  • On receiving the ping reply, the physical switch updates its MAC table.
  • As a result, there is no disruption on vmnic failover.
  • When the vmnic comes back up, the Edge VM interface may move back to this vmnic.
  • The ESX host sends a RARP for the learned MAC addresses but, due to a software issue, does not tag the packet with the VLAN ID.
  • The physical switch therefore does not learn that the MAC address has moved and continues to send traffic to the old vmnic.
  • Datapath connectivity will only be restored when traffic is received from VM-B and the switch updates its MAC table.
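The sequence above can be sketched as a small simulation of the physical switch MAC table. This is an illustrative model only, not NSX or switch code. It assumes the switch learns MAC addresses per VLAN and places untagged frames in a native VLAN. The VLAN IDs, port names and the MAC address of VM-B are hypothetical.

```python
from dataclasses import dataclass, field

NATIVE_VLAN = 1             # assumption: VLAN the switch assigns to untagged frames
BRIDGE_VLAN = 100           # hypothetical VLAN ID used by the bridge
VM_B = "00:50:56:aa:bb:0b"  # hypothetical MAC of VM-B on the overlay segment


@dataclass
class PhysicalSwitch:
    # MAC table keyed by (vlan, mac) -> switch port
    table: dict = field(default_factory=dict)

    def learn(self, port, src_mac, vlan=None):
        """Learn a source MAC on the ingress port; untagged frames use the native VLAN."""
        effective_vlan = NATIVE_VLAN if vlan is None else vlan
        self.table[(effective_vlan, src_mac)] = port

    def link_down(self, port):
        """Flush every entry learned on a port whose link went down."""
        self.table = {k: p for k, p in self.table.items() if p != port}

    def lookup(self, dst_mac, vlan):
        """Return the egress port, or 'flood' for an unknown destination."""
        return self.table.get((vlan, dst_mac), "flood")


def run_scenario(rarp_tagged):
    """Replay failover and failback; record where VLAN traffic for VM-B is sent."""
    sw = PhysicalSwitch()
    steps = {}

    # Steady state: the Edge VM uses vmnic0, so VM-B is learned on that port.
    sw.learn("port-vmnic0", VM_B, BRIDGE_VLAN)
    steps["initial"] = sw.lookup(VM_B, BRIDGE_VLAN)

    # Failover: link down flushes the entry, the request is flooded,
    # and the ping reply from VM-B is learned on the surviving port.
    sw.link_down("port-vmnic0")
    steps["after_link_down"] = sw.lookup(VM_B, BRIDGE_VLAN)
    sw.learn("port-vmnic1", VM_B, BRIDGE_VLAN)
    steps["after_failover"] = sw.lookup(VM_B, BRIDGE_VLAN)

    # Failback: the Edge VM moves back to vmnic0 and a RARP is sent for VM-B.
    # An untagged RARP is learned in the native VLAN, not the bridge VLAN.
    sw.learn("port-vmnic0", VM_B, BRIDGE_VLAN if rarp_tagged else None)
    steps["after_failback"] = sw.lookup(VM_B, BRIDGE_VLAN)

    # Recovery: tagged traffic sourced from VM-B corrects the entry.
    sw.learn("port-vmnic0", VM_B, BRIDGE_VLAN)
    steps["after_vm_b_traffic"] = sw.lookup(VM_B, BRIDGE_VLAN)
    return steps
```

In this model, `run_scenario(rarp_tagged=False)` leaves the bridge VLAN entry pointing at `port-vmnic1` after failback until VM-B sends traffic, which corresponds to the disruption described above. With `rarp_tagged=True` the entry moves to `port-vmnic0` immediately.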

Note:

  • This issue is specific to a vDS portgroup or segment configured for MAC learning.
  • If promiscuous mode is configured on the portgroup, disruption is expected because no RARP is sent during failback.
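For reference, the difference between an untagged and a VLAN-tagged RARP frame can be sketched at byte level. This is a minimal illustration built from the standard Ethernet, 802.1Q and RARP layouts, not a capture of what ESX sends. The MAC address, the VLAN ID and the choice of RARP opcode 3 (reverse request) are assumptions.

```python
import struct

ETHERTYPE_RARP = 0x8035  # EtherType for RARP
TPID_8021Q = 0x8100      # Tag Protocol Identifier for an 802.1Q VLAN tag
RARP_REQUEST = 3         # assumption: opcode used for the notification


def mac_bytes(mac):
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    return bytes.fromhex(mac.replace(":", ""))


def build_rarp(src_mac, vlan_id=None):
    """Build a broadcast RARP frame; insert an 802.1Q tag when vlan_id is given."""
    src = mac_bytes(src_mac)
    frame = b"\xff" * 6 + src  # destination broadcast, then source MAC
    if vlan_id is not None:
        # 802.1Q tag: TPID followed by TCI (priority 0, 12-bit VLAN ID)
        frame += struct.pack("!HH", TPID_8021Q, vlan_id & 0x0FFF)
    frame += struct.pack("!H", ETHERTYPE_RARP)
    # RARP body: htype=1 (Ethernet), ptype=0x0800 (IPv4), hlen=6, plen=4, opcode
    frame += struct.pack("!HHBBH", 1, 0x0800, 6, 4, RARP_REQUEST)
    # Sender and target hardware address are the moved MAC; IP fields are zero.
    frame += src + b"\x00" * 4 + src + b"\x00" * 4
    return frame


# Hypothetical values for illustration only.
untagged = build_rarp("00:50:56:aa:bb:0b")
tagged = build_rarp("00:50:56:aa:bb:0b", vlan_id=100)
```

The tagged frame is 4 bytes longer than the untagged one. Those 4 bytes carry the VLAN ID that a switch needs in order to update the MAC table entry for the correct VLAN.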

Resolution

This issue is resolved in VMware NSX 4.1.1, available at Broadcom downloads.

If you are having difficulty finding and downloading software, please review the Download Broadcom products and software KB.

 

Workarounds:

  • Run a keepalive that continuously sends traffic from overlay to VLAN, e.g. a continuous ping.
    or
  • vMotion the Edge VM to another ESX host before bringing up the vmnic.
    or
  • Fail over the Bridge to the standby Edge before bringing up the vmnic.
    or
  • In the vSphere Client, increase the timeout period before a vmnic is considered ready for use: Configuration > Advanced Settings > Net > Net.teampolicyupdelay. This allows more time to apply one of the workarounds above.
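The keepalive workaround can be sketched as a small script run from a VM on the overlay segment. This is a minimal sketch, not a supported tool. It assumes a Linux guest whose `ping` accepts `-c` (count) and `-W` (timeout in seconds). The target address is a hypothetical VLAN-side host.

```python
import subprocess
import time


def ping_once(target):
    """Send one ICMP echo request; return True if a reply was received."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "1", target],  # assumption: Linux iputils ping
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def keepalive(target, interval_s=1.0, count=None, send=ping_once, sleep=time.sleep):
    """Send traffic to target every interval_s seconds.

    Runs until interrupted when count is None. Returns (sent, failures).
    """
    sent = 0
    failures = 0
    while count is None or sent < count:
        if not send(target):
            failures += 1
        sent += 1
        sleep(interval_s)
    return sent, failures


# Example (hypothetical VLAN-side address), runs until interrupted:
# keepalive("192.0.2.10", interval_s=1.0)
```

Because every packet is sourced from the overlay side, the physical switch keeps relearning the overlay MAC address on whichever port currently carries the Edge VM traffic. A shorter interval shortens the window in which the MAC table entry can be stale after failback.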