MAC flapping issue happening at switch level connected to ESXi host.

Article ID: 402113

Updated On:

Products

VMware vSphere ESXi

Issue/Introduction

  • Random MAC addresses are learned on the switch interface connected to a specific ESXi host in the cluster. This causes a MAC flood.
  • At the time of the issue, MAC flapping is observed on the switch side, as highlighted in yellow:

  • The switch vendor identified an ESXi server as the cause of the issue. After the server is disconnected from the network by shutting down its switch port, the issue is resolved and the MAC flood stops.

Environment

  • The environment is an EVPN-VXLAN fabric where leaf switches learn the MAC addresses of virtual machines (VMs) from the connected ESXi hosts. EVPN is used as the control plane to advertise this MAC/ARP reachability information across the VXLAN overlay to all other leaf switches, ensuring efficient Layer 2 connectivity.
  • The switch vendor suspects that a specific ESXi host is causing network instability. The hypothesized behavior is that, after learning an external MAC address (i.e., the MAC of a VM on a different host), the problematic ESXi host incorrectly begins to transmit packets sourced from that external MAC address. This causes the upstream leaf switch to see the same MAC address originating from two different physical interfaces, triggering a persistent MAC address flapping condition.
  • In the above screenshot, a MAC address belonging to a VM on Host 1 is correctly learned and distributed throughout the cluster via EVPN. However, Host 6 appears to be re-broadcasting traffic using the MAC address of the Host 1 VM as its own source.
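
The flapping condition described above can be illustrated with a minimal Python sketch. It is not taken from the switch vendor's implementation: the function name `count_mac_moves`, the port labels, and the MAC addresses are all hypothetical and chosen only for illustration. The sketch models a simplified MAC table that re-learns a source MAC on whichever port last sent a frame with it, and counts how often each MAC moves between ports.

```python
from collections import defaultdict


def count_mac_moves(frames):
    """Return {mac: number of times the MAC was re-learned on a different port}.

    `frames` is an iterable of (source_mac, ingress_port) pairs in arrival order.
    This is a simplified model of source-MAC learning, not a real switch.
    """
    table = {}                # current MAC -> port mapping
    moves = defaultdict(int)  # MAC -> count of port changes
    for src_mac, port in frames:
        previous = table.get(src_mac)
        if previous is not None and previous != port:
            moves[src_mac] += 1  # same MAC seen on a different port: one move
        table[src_mac] = port    # learn (or re-learn) the MAC on this port
    return dict(moves)


# Hypothetical sample: one VM MAC alternates between the Host 1 and Host 6 ports.
frames = [
    ("00:50:56:aa:bb:01", "port-host1"),
    ("00:50:56:aa:bb:01", "port-host6"),
    ("00:50:56:aa:bb:01", "port-host1"),
    ("00:50:56:aa:bb:01", "port-host6"),
    ("00:50:56:aa:bb:02", "port-host2"),
]
print(count_mac_moves(frames))  # {'00:50:56:aa:bb:01': 3}
```

A MAC that stays on one port never appears in the result, so any entry with a steadily growing count is a candidate for the flapping behavior described in this section.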

Cause

  • The leaf switch learns the legitimate MAC from Host 1's uplink.
  • Host 6 improperly sources traffic with the same MAC, causing the switch to re-learn the MAC on Host 6's uplink.
  • This continuous re-learning between the two ports creates a MAC flap.
  • The instability forces the switch to treat frames for this MAC as unknown unicast, leading to excessive traffic flooding across all ports within the segment. This "MAC flood" overwhelms the connected hosts and degrades network performance.
  • Troubleshooting was performed to isolate the MAC flapping issue from the VMware infrastructure.

    • Both vmnicX and vmnicY were brought down at the ESXi level using esxcli network nic down -n vmnicN.
    • The switch interfaces connecting to vmnicX and vmnicY were brought up and, after some time, the MAC flap issue started at the switch level.

    • Packet captures were performed at the ESXi layer at this time, and not a single packet was observed on either VMNIC (X or Y) in the egress or ingress direction. This implies that ESXi was not sending or receiving any packets while the switch was already experiencing the MAC flap issue.

    • Further packet captures were performed at the time of the issue, with the VMNICs up and all the VMs powered down. Only a few packets were seen at the VMNIC level in the egress direction, and all of them were RARP. However, the VMNICs received over 75,000 packets in the ingress direction, and 99% of them were ARP.
    • The above tests indicate that the likely root cause of the issue lies outside the VMware layer.
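
The per-direction breakdown used in the tests above (RARP on egress, mostly ARP on ingress) can be reproduced with a short Python sketch. It assumes the raw Ethernet frames have already been extracted from the capture files and tagged with a direction by some other means. The helper names `ethertype_name`, `summarize`, and `make_frame` are hypothetical, and the sample frames are synthetic, not the data from this case.

```python
import struct
from collections import Counter

# Well-known EtherType values.
ETHERTYPES = {0x0800: "IPv4", 0x0806: "ARP", 0x8035: "RARP", 0x86DD: "IPv6"}
VLAN_TAG = 0x8100  # 802.1Q tag protocol identifier


def ethertype_name(frame: bytes) -> str:
    """Return the EtherType name of a raw Ethernet frame, skipping one VLAN tag."""
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype == VLAN_TAG:
        # A single 802.1Q tag shifts the real EtherType by 4 bytes.
        (ethertype,) = struct.unpack_from("!H", frame, 16)
    return ETHERTYPES.get(ethertype, hex(ethertype))


def summarize(captured):
    """Count frames per (direction, EtherType name)."""
    return Counter((direction, ethertype_name(frame)) for direction, frame in captured)


def make_frame(ethertype, vlan=None):
    """Build a synthetic frame with zeroed addresses and payload (illustration only)."""
    header = bytes(12)  # destination MAC + source MAC placeholders
    if vlan is not None:
        header += struct.pack("!HH", VLAN_TAG, vlan)
    return header + struct.pack("!H", ethertype) + bytes(46)


# Synthetic sample: a few RARP frames out, mostly ARP frames in.
captured = (
    [("egress", make_frame(0x8035))] * 2
    + [("ingress", make_frame(0x0806, vlan=100))] * 99
    + [("ingress", make_frame(0x0800))]
)
summary = summarize(captured)
ingress_total = sum(n for (direction, _), n in summary.items() if direction == "ingress")
print(summary[("egress", "RARP")], summary[("ingress", "ARP")], ingress_total)  # 2 99 100
```

An egress count limited to RARP alongside an ingress count dominated by ARP is the pattern the tests above relied on to conclude that the host was receiving the flood rather than generating it.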

Resolution

  • The above tests confirmed that, in this scenario, the issue is external to the VMware layer. Because the issue is specific to a single host, the server OEM vendor should investigate further; the evidence points to a possible issue at the network adapter level.
  • Replacing the physical network adapters on the server fixed the MAC flapping issue in the network.

Additional Information

Packet capture on ESXi using the pktcap-uw tool: https://knowledge.broadcom.com/external/article/341568/using-the-pktcapuw-tool-in-esxi-55-and-l.html