Random network packet loss (ping drops) is seen for Virtual Machines (VMs) residing on a specific ESXi host configured with Active/Active uplinks.

Article ID: 433226

Products

VMware NSX, VMware vSphere ESXi

Issue/Introduction

  • Ping drops persist even when targeting the default gateway from the VM
  • When the same VMs are migrated to other ESXi hosts in the same cluster, the ping drops stop
  • Disabling one uplink (for example, vmnic2) stabilizes the connection, indicating a path-specific failure
  • A pktcap-uw capture on the suspect uplink (vmnic2) shows ARP requests leaving the host but no ARP replies coming back, so the ARP entry for the gateway remains incomplete

pktcap-uw --uplink vmnic2 --capture UplinkSndKernel,UplinkRcvKernel -o - | tcpdump-uw -enr - | grep 10.1.#.10
The name of the uplink is vmnic2.
The session capture point is UplinkSndKernel,UplinkRcvKernel.
pktcap: The output file is -.
pktcap: No server port specifed, select 58832 as the port.
pktcap: Local CID 2.
pktcap: Listen on port 58832.
pktcap: Main thread: 100774202240.
pktcap: Dump Thread: 100774737664.
pktcap: Recv Thread: 100775266048.
pktcap: Accept...
pktcap: Vsock connection from port 1038 cid 2.
reading from file -, link-type EN10MB (Ethernet)

04:02:10.946529 00:##:56:##:90:## > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Request who-has 10.1.#.1 tell 10.1.#.10, length 46
04:02:11.949969 00:##:56:##:90:## > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Request who-has 10.1.#.1 tell 10.1.#.10, length 46

  • Multiple "who-has" requests for the gateway IP are transmitted, but the capture shows no inbound response (UplinkRcvKernel) to any of them
  • The packet capture confirms that traffic successfully egresses vmnic2 (UplinkSndKernel), but the return traffic (ARP replies) never reaches the ESXi kernel through that same interface. Because the issue disappears when this uplink is shut down, the physical network path associated with vmnic2 is black-holing traffic
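To check a longer capture for the same pattern, the tcpdump-uw text output can be piped through a small filter instead of reading it line by line. This is a minimal sketch that assumes the standard tcpdump ARP text format seen above (`Request who-has <ip> tell <ip>`) and the matching reply format (`Reply <ip> is-at <mac>`); the function name `arp_check` is hypothetical and is not an ESXi command. It uses only POSIX sh and awk `index()`, so the IP is matched literally rather than as a regular expression.

```sh
# arp_check (hypothetical helper): count ARP requests and replies for one IP
# in tcpdump text read from stdin, and warn if requests were never answered.
arp_check() {
    target="$1"
    awk -v ip="$target" '
        # Request lines contain: "Request who-has <ip> tell <sender>"
        # The trailing space stops 10.1.1.1 from also matching 10.1.1.10.
        index($0, "Request who-has " ip " ") > 0 { req++ }
        # Reply lines contain: "Reply <ip> is-at <mac>"
        index($0, "Reply " ip " is-at ") > 0 { rep++ }
        END {
            # Unset awk counters print as 0 with %d.
            printf "requests=%d replies=%d\n", req, rep
            if (req > 0 && rep == 0) print "WARNING: no ARP replies seen for " ip
        }'
}
```

Usage example, with the gateway address substituted for `<gateway-IP>`: `pktcap-uw --uplink vmnic2 --capture UplinkSndKernel,UplinkRcvKernel -o - | tcpdump-uw -enr - | arp_check <gateway-IP>`. Running the same check against the other uplink in the team gives a quick side-by-side comparison of the two paths.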

Environment

VMware vSphere ESXi

Cause

A mismatched or incorrect configuration on the physical switch port connected to vmnic2.

Resolution

Identify the physical switch and port ID connected to the affected vmnic (for example, from the CDP or LLDP information reported for that uplink), then align that port's configuration with the VMware virtual switch requirements and with the configuration of its peer port. This ensures that the physical fabric correctly learns MAC addresses and forwards return packets to the appropriate physical interface in an Active/Active pair.
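Which physical port configuration is "correct" depends on the NIC teaming policy of the virtual switch. As a simplified reference only (not an exhaustive VMware rule set; LACP/LAG on a vSphere Distributed Switch is not covered), the hypothetical helper below maps the common teaming policy names to the switch-side setup they generally expect. In an Active/Active pair, any difference between the two uplink ports, such as a VLAN allowed on one port but not the other, or a port channel configured on one side only, can produce the one-path failure described above.

```sh
# expected_switch_config (hypothetical helper): print the physical switch
# port setup generally expected for a given vSphere NIC teaming policy name.
# Simplified summary; LACP/LAG on a vSphere Distributed Switch is not covered.
expected_switch_config() {
    case "$1" in
        "Route based on IP hash")
            # IP hash on a standard vSwitch needs a static EtherChannel.
            echo "static EtherChannel (mode on) across all team ports; same VLANs on every member" ;;
        "Route based on originating virtual port"|\
        "Route based on source MAC hash"|\
        "Route based on physical NIC load"|\
        "Use explicit failover order")
            # These policies expect independent ports, not a port channel.
            echo "independent switch ports, no port channel; same VLANs on every uplink port" ;;
        *)
            echo "unknown teaming policy: $1" >&2
            return 1 ;;
    esac
}
```

For example, `expected_switch_config "Route based on originating virtual port"` reports that both uplink ports should be independent (no port channel) and carry the same VLANs, which is the comparison to make between the port behind vmnic2 and the port behind its Active/Active peer.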

Additional Information

Random network packet loss occurs on virtual machines running on an ESXi host whose uplinks are in an Active/Active configuration.