When virtual machines (VMs) are migrated via vMotion to a specific subset of ESXi hosts, they experience a complete loss of network connectivity. This issue occurs despite the VM's port group having Notify Switches set to Yes.
Symptoms:
RARP packets are observed leaving the ESXi host's physical uplink (vmnic) during migration.
No response to RARP is received from the physical gateway.
The physical switch fails to update its MAC address table for the migrated VM.
The outage is isolated to hosts connected to specific Cisco FEX (Fabric Extender) hardware.
Capture packets on the affected host's uplink:
pktcap-uw --uplink vmnicX --dir 2 -o /vmfs/volumes/<VOLUME>/vmnicX.pcapng --ng
NOTE:
Replace vmnicX with the uplink name and <VOLUME> with the datastore name.
Depending on the workload, the uplink may be carrying a large amount of traffic. Use the -c option to limit the number of packets captured.
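The -c option mentioned in the note can be combined with the capture command. The following is a minimal sketch only: the variable names and the packet count of 10000 are illustrative assumptions, not recommended values, and the placeholders must still be replaced before use.

```shell
# Hypothetical values: replace with your uplink, datastore, and packet limit.
UPLINK="vmnicX"
VOLUME="<VOLUME>"
COUNT=10000   # assumed limit for illustration, not a recommended value

# Same capture command as above, with -c added to stop after COUNT packets.
CAPTURE_CMD="pktcap-uw --uplink ${UPLINK} --dir 2 -c ${COUNT} -o /vmfs/volumes/${VOLUME}/${UPLINK}.pcapng --ng"

# Print the command for review before running it on the ESXi host.
echo "${CAPTURE_CMD}"
```

Printing the command first makes it easy to verify the substituted values before starting a capture on a busy uplink.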
Run the tshark command to show RARP and ICMP packets.
In this example, ESXi sends RARP to the physical switch multiple times to notify it of the uplink change when an uplink is added to or removed from the vSwitch.
However, ESXi is not receiving an ICMP request from the Cisco FEX switch.
$ tshark -r vmnicX.pcapng 'eth.addr==<MAC Address> && (icmp || eth.type == 0x8035)'
 35.705439 <MAC_ADDRESS> → Broadcast RARP 60 Who is <MAC_ADDRESS>? Tell <MAC_ADDRESS>
 40.705274 <MAC_ADDRESS> → Broadcast RARP 60 Who is <MAC_ADDRESS>? Tell <MAC_ADDRESS>
 49.705776 <MAC_ADDRESS> → Broadcast RARP 60 Who is <MAC_ADDRESS>? Tell <MAC_ADDRESS>
 58.706403 <MAC_ADDRESS> → Broadcast RARP 60 Who is <MAC_ADDRESS>? Tell <MAC_ADDRESS>
 76.707094 <MAC_ADDRESS> → Broadcast RARP 60 Who is <MAC_ADDRESS>? Tell <MAC_ADDRESS>
103.708129 <MAC_ADDRESS> → Broadcast RARP 60 Who is <MAC_ADDRESS>? Tell <MAC_ADDRESS>
148.709893 <MAC_ADDRESS> → Broadcast RARP 60 Who is <MAC_ADDRESS>? Tell <MAC_ADDRESS>
158.492451 <SENDER_IP_ADDRESS> → <RECEIVER_IP_ADDRESS> ICMP 98 Echo (ping) request id=0x0002, seq=3254/46604, ttl=64
158.492529 <RECEIVER_IP_ADDRESS> → <SENDER_IP_ADDRESS> ICMP 98 Echo (ping) reply id=0x0002, seq=3254/46604, ttl=64
NOTE:
<MAC Address>: Guest OS MAC address.
EtherType 0x8035 is Reverse Address Resolution Protocol (RARP).
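When the capture is long, counting the RARP and ICMP lines can be quicker than reading the output by eye. The sketch below assumes the default tshark one-line summary format shown in the example; summarize_capture is a hypothetical helper name, not part of tshark or ESXi.

```shell
# summarize_capture (hypothetical helper): reads tshark summary lines on
# stdin and prints how many RARP and ICMP frames were seen.
# Assumes the protocol column appears as " RARP " or " ICMP " in each line.
summarize_capture() {
  awk '
    / RARP / { rarp++ }
    / ICMP / { icmp++ }
    END { printf "rarp=%d icmp=%d\n", rarp, icmp }
  '
}
```

For example, pipe the tshark command above into the helper: tshark -r vmnicX.pcapng 'eth.addr==<MAC Address> && (icmp || eth.type == 0x8035)' | summarize_capture. A result with several RARP frames and no ICMP frames is consistent with the symptom described here.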
Environment:
VMware vSphere ESXi
Cause:
The physical Cisco FEX switch has reached or exceeded its configured MAC learning limit. When the ESXi host sends a Reverse ARP (RARP) packet to announce the VM's new location, the FEX discards the update because its MAC address table is already at the limit. As a result, the physical network cannot forward traffic to the VM's new port.
Resolution:
Contact Cisco to investigate why the Cisco FEX switch does not send RARP reply packets to ESXi after receiving the RARP requests.