VMs go into a hung state while backups run, after which packets are dropped intermittently

Article ID: 432168

Updated On:

Products

VMware vSphere ESXi

Issue/Introduction

After the ESXi hosts are upgraded, VMs show the following network behavior:

  • Pings to the default gateway or external hosts show intermittent loss.
  • VMs on the same host and the same VLAN communicate without issues.
  • VMs on different hosts within the same VLAN experience packet loss.
  • Packet captures on physical uplinks (vmnic) show duplicate ICMP replies or unexpected "Inbound" packets that should be "Outbound" (e.g., a VM's own Echo Request returning to the host on the ingress path).

Environment

VMware ESXi

Cause

The issue originates in the physical network layer, specifically in a misconfiguration of the Remote Switched Port Analyzer (RSPAN). The misconfiguration causes unintended packet flow and duplication across the physical uplinks.

Resolution

  • To confirm that the physical network is looping traffic back to the host, run a packet capture on the ESXi host, focusing on the physical uplink used by the affected VM:
    1. Find the vmnic used by the VM with the command: netdbg vswitch instance list | grep -i <vm_name>
    2. Start a continuous ping from the VM to its default gateway.
    3. Run a packet capture on the ESXi host where the VM is running:

pktcap-uw --uplink <vmnicX> --capture UplinkRcvKernel --proto 0x01 --ip <SrcIP or DestIP> -o /vmfs/volumes/<datastore_name>/uplink.pcapng

The resulting file can be opened in any packet analyzer, such as Wireshark.

If the capture file shows ICMP Echo Requests sourced from the VM's IP entering the vmnic as ingress traffic, the physical switch is forwarding the VM's own traffic back to the host.
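
As an alternative to inspecting the capture by eye, the check described above can be sketched in shell. This is a minimal sketch, assuming the capture file has been copied from the datastore to a workstation with tshark (the Wireshark command-line tool) installed. The function names and the example IP address are hypothetical and are not part of any VMware tooling.

```shell
#!/bin/sh
# Hypothetical helper: build a Wireshark display filter that matches
# ICMP Echo Requests (type 8) whose source address is the VM's IP.
looped_echo_filter() {
    # $1 = IP address of the affected VM
    printf 'icmp.type == 8 && ip.src == %s' "$1"
}

# Hypothetical helper: count matching frames in a capture file.
# Because the capture was taken on the uplink receive path, any frame
# counted here is the VM's own request coming back into the host.
count_looped_echo() {
    # $1 = capture file, $2 = IP address of the affected VM
    tshark -r "$1" -Y "$(looped_echo_filter "$2")" | wc -l
}

# Example (assumed file name and address):
# count_looped_echo uplink.pcapng 192.0.2.10
```

A non-zero count suggests that the switch is returning the VM's own Echo Requests to the host. A count of zero does not rule out other mirroring problems.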

  • To resolve this issue, work with the physical network team to audit the RSPAN/mirroring sessions:
    • Apply VLAN Filtering: Modify the RSPAN source to exclude high-bandwidth VLANs (vMotion, Management, and Backup/Storage).
    • Disable RSPAN Temporarily: Turn off the monitor session to verify if pings stabilize immediately.
    • Check RSPAN VLAN: Ensure the dedicated RSPAN VLAN is not accidentally trunked back to the ESXi hosts unless specifically required for a virtual sniffer VM.
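
To judge whether pings stabilize after the monitor session is disabled, it can help to measure loss the same way before and after the change. The following is a minimal sketch, assuming a Linux guest whose ping prints a summary line containing "% packet loss". The function names and the default count of 50 are illustrative assumptions.

```shell
#!/bin/sh
# Hypothetical helper: read ping output on stdin and print only the
# packet-loss percentage from the summary line.
ping_loss_pct() {
    sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}

# Hypothetical wrapper: send a fixed number of pings to the gateway
# and print the measured loss percentage.
measure_loss() {
    # $1 = default gateway IP, $2 = number of pings (assumed default: 50)
    ping -c "${2:-50}" "$1" | ping_loss_pct
}

# Example: run once before and once after the RSPAN change.
# measure_loss <gateway_ip>
```

Comparing the two values gives a quick indication of whether the RSPAN session contributes to the loss. The packet capture remains the more reliable confirmation.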

Additional Information

For more information on how to perform packet captures, see: https://knowledge.broadcom.com/external/article/341568/packet-capture-on-esxi-using-the-pktcapu.html

For more insights on RSPAN, see: RSPAN Cisco