Network connectivity loss for VMs migrated to ESXi hosts connected to a specific Cisco FEX Switch

Article ID: 439780

Updated On:

Products

VMware vSphere ESXi

Issue/Introduction

When virtual machines (VMs) are migrated via vMotion to a specific subset of ESXi hosts, they experience a complete loss of network connectivity. This issue occurs despite the VM's port group having Notify Switches set to Yes.

Symptoms:

  • RARP packets are observed leaving the ESXi host's physical uplink (vmnic) during migration.

  • No response to RARP is received from the physical gateway.

  • The physical switch fails to update its MAC address table for the migrated VM.

  • The outage is isolated to hosts connected to specific Cisco FEX (Fabric Extender) hardware.

    • RARP packets are sent after an uplink is added to or removed from the vSwitch, but the physical switch does not send packets back to the ESXi host.

      To capture both inbound and outbound packets on the new uplink:

      pktcap-uw --uplink vmnicX --dir 2 -o /vmfs/volumes/<VOLUME>/vmnicX.pcapng --ng

      NOTE:
      Replace vmnicX with the uplink name and <VOLUME> with the target datastore.
      Depending on the workload, the uplink may carry a large volume of traffic. Use the -c option to limit the number of packets captured.

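      Run the following tshark command to display the RARP and ICMP packets.
      In this example, ESXi sends RARP packets to the physical switch multiple times to announce the uplink change after an uplink is added to or removed from the vSwitch.
      However, ESXi receives no ICMP request through the Cisco FEX switch while these RARP packets are being sent; the first echo request in the capture appears only at 158.49 s.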

      $ tshark -r vmnicX.pcapng 'eth.addr==<MAC Address> && (icmp || eth.type == 0x8035)'
       35.705439 <MAC_ADDRESS> → Broadcast    RARP 60 Who is <MAC_ADDRESS>? Tell <MAC_ADDRESS>
       40.705274 <MAC_ADDRESS> → Broadcast    RARP 60 Who is <MAC_ADDRESS>? Tell <MAC_ADDRESS>
       49.705776 <MAC_ADDRESS> → Broadcast    RARP 60 Who is <MAC_ADDRESS>? Tell <MAC_ADDRESS>
       58.706403 <MAC_ADDRESS> → Broadcast    RARP 60 Who is <MAC_ADDRESS>? Tell <MAC_ADDRESS>
       76.707094 <MAC_ADDRESS> → Broadcast    RARP 60 Who is <MAC_ADDRESS>? Tell <MAC_ADDRESS>
      103.708129 <MAC_ADDRESS> → Broadcast    RARP 60 Who is <MAC_ADDRESS>? Tell <MAC_ADDRESS>
      148.709893 <MAC_ADDRESS> → Broadcast    RARP 60 Who is <MAC_ADDRESS>? Tell <MAC_ADDRESS>
      158.492451  <SENDER_IP_ADDRESS> → <RECEIVER_IP_ADDRESS>  ICMP 98 Echo (ping) request  id=0x0002, seq=3254/46604, ttl=64
      158.492529  <RECEIVER_IP_ADDRESS> → <SENDER_IP_ADDRESS>  ICMP 98 Echo (ping) reply    id=0x0002, seq=3254/46604, ttl=64

      NOTE:
      <MAC Address>: Guest OS MAC address.
      EtherType 0x8035 is Reverse Address Resolution Protocol (RARP).
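When a capture is long, the RARP and ICMP lines can be summarized instead of read one by one. The POSIX shell function below is a minimal sketch, not a VMware or Wireshark tool: the name summarize_rarp_icmp is hypothetical, and it assumes the default one-line-per-packet text output shown above, with the relative timestamp in the first column and the protocol name (RARP or ICMP) as a separate word. It reports how many RARP announcements were seen and when the first ICMP packet arrived relative to the last RARP.

```shell
#!/bin/sh
# summarize_rarp_icmp (hypothetical helper, not part of ESXi or tshark).
# Reads tshark one-line-per-packet text output on stdin. Assumes column 1
# is the relative timestamp and the protocol appears as a separate word.
summarize_rarp_icmp() {
  awk '
    # Count RARP announcements and remember the first and last timestamps.
    / RARP / {
      rarp++
      if (rarp == 1) first_rarp = $1
      last_rarp = $1
    }
    # Remember only the first ICMP packet (echo request or reply).
    / ICMP / && first_icmp == "" { first_icmp = $1 }
    END {
      printf "RARP announcements: %d\n", rarp
      if (rarp > 0)
        printf "RARP window: %.3f s to %.3f s\n", first_rarp, last_rarp
      if (first_icmp == "")
        print "No ICMP packets in this capture"
      else if (rarp > 0)
        printf "First ICMP at %.3f s (%.3f s after last RARP)\n", first_icmp, first_icmp - last_rarp
      else
        printf "First ICMP at %.3f s\n", first_icmp
    }'
}
```

Usage: pipe the tshark output into it, for example `tshark -r vmnicX.pcapng 'eth.addr==<MAC Address> && (icmp || eth.type == 0x8035)' | summarize_rarp_icmp`. For the sample output above, it would report 7 RARP announcements between 35.705 s and 148.710 s and a first ICMP packet at 158.492 s, about 9.8 s after the last RARP.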

Environment

VMware vSphere ESXi

Cause

The physical Cisco FEX switch has reached or exceeded its configured MAC learning limit. When the ESXi host sends a Reverse ARP (RARP) frame to announce the VM's new location, the FEX cannot learn the VM's MAC address on the new port because its MAC table is already at the limit, so it discards the update. As a result, the physical network does not forward traffic for the VM to the new port.
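To make this failure mode concrete, the POSIX shell sketch below models a MAC address table with a learning limit. It is a toy model under stated assumptions, not Cisco FEX behavior: it assumes the limit applies per switch port, and the function names (learn, lookup, count_on_port), port names, and MAC_LIMIT value are all hypothetical. When the RARP from the migrated VM arrives on a port that is already at its limit, the update is discarded, so in this model the table keeps pointing the VM's MAC address at the old port.

```shell
#!/bin/sh
# Toy model of MAC learning with a per-port limit (an assumption for
# illustration; real Cisco FEX limits may be scoped differently).
MAC_LIMIT=2   # hypothetical limit per port
TABLE=""      # newline-separated "MAC PORT" entries

# count_on_port PORT: number of MAC addresses currently learned on PORT
count_on_port() {
  printf '%s\n' "$TABLE" | awk -v p="$1" '$2 == p { n++ } END { print n + 0 }'
}

# lookup MAC: port the table would forward MAC to (empty if unknown)
lookup() {
  printf '%s\n' "$TABLE" | awk -v m="$1" '$1 == m { print $2 }'
}

# learn MAC PORT: handle a frame (for example a RARP) with source MAC
# arriving on PORT. Call it directly, not inside $( ), or TABLE changes are lost.
learn() {
  if [ "$(lookup "$1")" = "$2" ]; then
    echo "refresh $1 on $2"
  elif [ "$(count_on_port "$2")" -ge "$MAC_LIMIT" ]; then
    # Port is full: the update is discarded and any old entry stays as is.
    echo "drop $1 on $2 (limit $MAC_LIMIT reached)"
  else
    # Move or add the entry: remove any old mapping, then append the new one.
    TABLE=$(printf '%s\n' "$TABLE" | awk -v m="$1" 'NF && $1 != m')
    TABLE=$(printf '%s\n%s %s' "$TABLE" "$1" "$2")
    echo "learn $1 on $2"
  fi
}

# Demo: vm-a sits behind port1; port2 (new host) already holds two MACs.
learn vm-a port1
learn vm-b port2
learn vm-c port2
learn vm-a port2   # RARP after vMotion: discarded because port2 is full
echo "vm-a is still forwarded to: $(lookup vm-a)"
```

In this model, raising MAC_LIMIT or removing a stale entry from port2 before the migration lets the final learn call succeed; whether the equivalent applies on a real FEX depends on how its limit is configured.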

Resolution

Contact Cisco support to investigate why the Cisco FEX switch is not sending RARP reply packets to the ESXi host after receiving the RARP request, and to review the MAC learning limit configured on the FEX.

Additional Information

IP to MAC mapping, GARP, RARP and Notify Switch setting for Virtual Machine Connectivity