Cross subnet traffic is dropped after edge bridge fails back after exiting maintenance mode

Article ID: 375758

Updated On:

Products

VMware NSX

Issue/Introduction

  • Overlay VMs reach the external VLAN gateway and VLAN-backed VMs through edge bridging for cross-subnet communication (source and destination are in different subnets).
  • When the active bridge edge enters NSX Maintenance Mode, traffic fails over to its high-availability peer, which becomes the new active bridge, without problems.
  • When the original edge exits NSX Maintenance Mode and becomes active again, traffic loss is observed.
  • The bridged VLAN carries a large number of MAC addresses (for example, over 500).
  • The traffic loss lasts from seconds to minutes after the active bridge fails back.

Environment

VMware NSX

Cause

The issue is caused by the large number of VLAN MAC addresses that must be synced from the active edge to the standby edge. The mac-sync full-sync message processing logic hits the limit of the edge software learning queue (queue size 512 per edge), so some MAC addresses are not synced.
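
As a rough pre-check, the number of VLAN MAC addresses behind a bridge can be compared against the 500-address threshold that the workaround in this article uses. The sketch below assumes the count is already known to the operator. The helper name `mac_count_status` and the variable `MAC_LIMIT` are hypothetical and are not part of NSX. The result is only an indicator, not a guarantee that the sync will or will not complete.

```shell
#!/bin/sh
# Hypothetical helper, not an NSX command.
# 500 is the workaround threshold from this article; the edge software
# learning queue itself holds 512 entries per edge.
MAC_LIMIT=500

# Print whether a given VLAN MAC address count exceeds MAC_LIMIT.
mac_count_status() {
    count=$1
    if [ "$count" -gt "$MAC_LIMIT" ]; then
        echo "at-risk: $count VLAN MAC addresses exceed $MAC_LIMIT"
    else
        echo "ok: $count VLAN MAC addresses are within $MAC_LIMIT"
    fi
}
```

For example, `mac_count_status 650` reports the bridge as at risk, which suggests splitting the workload or planning a manual re-sync after fail-back.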


Resolution

This issue is resolved in VMware NSX 4.2.1, available at Broadcom downloads.

If you are having difficulty finding and downloading software, please review the Download Broadcom products and software KB.

 

Workaround:

1. If the VLAN workloads total more than 500 MAC addresses, spread them across multiple edge bridge clusters.
2. After the edge exits NSX Maintenance Mode, manually request a MAC re-sync so that the mac-sync table is consistent across the edge bridge HA pair. The first command below requests the sync; the second displays the resulting table:

edge-appctl -t /var/run/vmware/edge/dpd.ctl mac-sync/request-sync <bridge port uuid>
edge-appctl -t /var/run/vmware/edge/dpd.ctl mac-sync/show-table <bridge port uuid> | json_pp
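
When an edge hosts several bridge ports, the re-sync request can be repeated per port. The following is a minimal sketch, assuming it runs in a shell on the edge node where `edge-appctl` is available. Only the `edge-appctl` invocation is taken from this article. The function names and the `DRY_RUN` variable are hypothetical, added so that the commands can be previewed before they are executed.

```shell
#!/bin/sh
# Sketch only: the function names and DRY_RUN are hypothetical helpers.
# The edge-appctl command line is the one given in this article.
DPD_CTL=/var/run/vmware/edge/dpd.ctl

# Run a command, or only print it when DRY_RUN=1.
run_cmd() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Request a MAC re-sync for every bridge port UUID passed as an argument.
# Stops at the first failing request.
resync_bridge_ports() {
    for uuid in "$@"; do
        run_cmd edge-appctl -t "$DPD_CTL" mac-sync/request-sync "$uuid" || return 1
    done
}
```

Setting `DRY_RUN=1` before calling `resync_bridge_ports <bridge port uuid> ...` prints each command instead of running it. Afterwards, the mac-sync/show-table command above can be used per port to confirm the table contents.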