Traffic disruption observed after a Bridge Edge failover

Article ID: 378741

Updated On:

Products

VMware NSX

Issue/Introduction

  • NSX-T Edge is used to bridge traffic from VLAN or VXLAN to Geneve
  • VMs have been migrated from VLAN or NSX-v to NSX-T segments
  • An Edge Bridge failover has occurred, causing the Standby node to become Active
  • VMs that were migrated to NSX-T lose network connectivity after the Bridge failover

Environment

VMware NSX

Cause

  • When VMs are migrated from VLAN or NSX-v to NSX-T, the Edge does not correctly clear MAC entries it learnt on the VLAN/NSX-v side of the Bridge.
  • These entries remain in place as stale entries but otherwise have no impact.
  • When an Edge Bridge failover happens, the new Active Edge should send GARPs for NSX-T learnt MAC addresses to update the TOR switches.
  • Because of the stale entries learnt first on the NSX-v/VLAN side, no GARP is sent for these migrated VMs.
  • As a result, the TOR continues to forward traffic for these VMs to the Edge node that is no longer Active, and the traffic is dropped.
  • Furthermore, the Edge instead sends a GARP for these MACs on the NSX-T overlay, which poisons the TEP table of the ESXi hosts with information that these VM MACs should be reached through the Bridge.
  • The outage will end when poisoned TEP table entries age out and the VMs originate traffic through the Bridge.
  • This scenario is most likely to be observed if a failover occurs during an NSX-v to NSX-T migration.
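
The failure sequence described in the list above can be illustrated with a small toy model. This is only a sketch of the reasoning, not NSX code: the `MacEntry` class, the `garps_on_failover` function, the side names, and the MAC addresses are all hypothetical and chosen for illustration. It assumes a simplified rule in which the new Active Edge announces a MAC towards the TOR only if the MAC was learnt on the NSX-T side, and announces it on the overlay otherwise.

```python
from dataclasses import dataclass

# Hypothetical labels for the two sides of the Bridge.
VLAN = "vlan"        # VLAN / NSX-v side
OVERLAY = "overlay"  # NSX-T (Geneve) side


@dataclass
class MacEntry:
    mac: str
    learnt_on: str  # side of the Bridge where the Edge first learnt the MAC


def garps_on_failover(mac_table):
    """Return (tor_garps, overlay_garps) for a newly Active Edge.

    Toy rule (an assumption, not the real implementation): a MAC learnt
    on the overlay is announced towards the TOR; a MAC learnt on the
    VLAN side is announced on the overlay.
    """
    tor_garps = [e.mac for e in mac_table if e.learnt_on == OVERLAY]
    overlay_garps = [e.mac for e in mac_table if e.learnt_on == VLAN]
    return tor_garps, overlay_garps


# The first VM was migrated to NSX-T, but its entry was never cleared
# and still points at the VLAN side (stale). The second VM was learnt
# on NSX-T from the start.
table = [
    MacEntry("00:50:56:aa:aa:aa", VLAN),     # stale entry for a migrated VM
    MacEntry("00:50:56:bb:bb:bb", OVERLAY),  # correct entry
]

tor_garps, overlay_garps = garps_on_failover(table)
print("GARP towards TOR:", tor_garps)
print("GARP on overlay: ", overlay_garps)
```

In this model the migrated VM with the stale entry never appears in the TOR-facing list, so the TOR keeps its old path, while the same MAC is announced on the overlay, which corresponds to the TEP table poisoning described above.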

Resolution

This issue is resolved in VMware NSX 4.1.1, available at Broadcom downloads.
If you are having difficulty finding and downloading software, please review the Download Broadcom products and software KB.
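
As a rough aid for checking whether a deployment is at or above the fixed release, a version comparison could look like the sketch below. The helper names `parse_version` and `has_fix` are hypothetical, and the sample version strings are made up for illustration. The sketch only compares against 4.1.1 as stated in this article and assumes a dotted numeric version string with an optional build suffix after a hyphen; it does not know about fixes in any other release line.

```python
# Fixed release named in this article.
FIXED_IN = (4, 1, 1)


def parse_version(version: str) -> tuple:
    """Turn a dotted version such as '4.1.0.2' into (4, 1, 0, 2).

    Anything after a '-' (assumed here to be a build suffix) is ignored.
    """
    core = version.strip().split("-")[0]
    return tuple(int(part) for part in core.split("."))


def has_fix(version: str) -> bool:
    """Return True if the version is 4.1.1 or later."""
    return parse_version(version) >= FIXED_IN


for sample in ("3.2.3", "4.1.0.2", "4.1.1", "4.1.2.1-12345"):
    print(sample, "->", "fixed" if has_fix(sample) else "affected")
```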


If you are running an earlier NSX version, avoid any activity that may trigger a Bridge failover during an NSX-v to NSX-T migration.
Ensure DRS is disabled for the Edge Bridge.