Virtual Machine becomes unresponsive due to TX Hang in Physical NIC

Article ID: 402241

Updated On:

Products

VMware vSphere ESXi

Issue/Introduction

Administrators may observe one or more of the following:

  • Virtual Machines (VMs) on the affected host become inaccessible (ping/RDP fails).
  • vMotion attempts fail intermittently, particularly for active workloads. Log entries similar to the following may appear in the VM's directory (typically in vmware.log):
    YYYY-MM-DDTHH:MM:SSZ In(05) vmx - Received migrate 'from' request for mid id ###################, src ip <###.###.###.###>.
    YYYY-MM-DDTHH:MM:SSZ In(05) vmx - MigrateSetInfo: state=MIGRATE_FROM_VMX_INIT srcIp=<###.###.###.###> dstIp=<###.###.###.###> mid=################### uuid=#######-####-####-####-######### priority=high
    YYYY-MM-DDTHH:MM:SSZ In(05) vmx - MigrateWaitForData: Waited for 120.03 seconds.
    YYYY-MM-DDTHH:MM:SSZ Wa(03) vmx - Migrate: timed out waiting for data from the source.
    YYYY-MM-DDTHH:MM:SSZ In(05) vmx - [msg.checkpoint.migration.noprogress] Timed out waiting for migration data.
    YYYY-MM-DDTHH:MM:SSZ In(05) vmx - [msg.moduletable.powerOnFailed] Module 'Migrate' power on failed.
    YYYY-MM-DDTHH:MM:SSZ In(05) vmx - [msg.vmx.poweron.failed] Failed to start the virtual machine.
  • Host remains reachable via ping and Host Client.
  • NIC queues stop transmitting packets; the driver has outstanding packets in the TX (transmit) queue that have not been acknowledged or completed by the host.
  • Network transmission stalls even though the physical link remains up.
  • Multiple virtual NIC queues (QID) show significant delays and unprocessed packets.
  • The host log (/var/run/log/vmkernel.log) shows the NIC driver attempting to recover by issuing force quiesce commands to the affected ports:
    Vmxnet3: Hang detected, numHangQ: ##
    Vmxnet3: #####: portID:########, QID: 0, next2TX: 457, next2Comp: 459, lastNext2TX: 15, next2Write:2, ringSize: 512 inFlight: 11, delay(ms): 6484,txStopped: 0
    NetSchedHClkPortQuiesce: received a force quiesce for port 0x######
    [msg.checkpoint.migration.noprogress] Timed out waiting for migration data.
    
  • VMware Tools heartbeats are degraded or missing (yellow/red states), and hostd reports reduced heartbeat rates for multiple VMs, indicating disrupted guest-to-host communication:
    YYYY-MM-DDTHH:MM:SSZ In(166) Hostd[######]: [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/affected_vm's_path.vmx] Setting heartbeat to red; Heartbeat (in 30s): expected=30 (yellow<=80%, red<=40%), actual=## (##.####%)
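The thresholds printed in the hostd message above (yellow<=80%, red<=40%) can be applied by hand to check a reported heartbeat count. The following is a minimal sketch of that arithmetic, not hostd's actual implementation; the function name `heartbeat_state` and the "green" label for values above 80% are assumptions made for illustration.

```python
def heartbeat_state(actual, expected=30):
    """Classify a heartbeat count using the thresholds shown in the hostd log line.

    expected=30 mirrors "expected=30" in the sample message (30 heartbeats in 30s).
    Returns a (state, percentage) tuple.
    """
    if expected <= 0:
        raise ValueError("expected must be positive")
    pct = 100.0 * actual / expected
    if pct <= 40.0:      # red<=40% per the log message
        return "red", pct
    if pct <= 80.0:      # yellow<=80% per the log message
        return "yellow", pct
    return "green", pct  # assumed label for a healthy heartbeat rate


if __name__ == "__main__":
    for count in (6, 20, 30):
        print(count, heartbeat_state(count))
```

A VM that logs an actual count of 12 or fewer out of 30 would therefore be classified as red under these thresholds.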

Environment

VMware vSphere ESXi 

Cause

The issue is caused by a TX hang condition in the physical NICs of the affected ESXi host.
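To gauge how widespread the hang is, the per-queue status lines that follow "Vmxnet3: Hang detected" in vmkernel.log can be parsed and filtered by delay. The sketch below is one possible approach: the field names are taken from the sample line in the Issue section, while the regular expression, the function names, and the 1000 ms default threshold are illustrative assumptions rather than documented values.

```python
import re

# Field names (QID, next2TX, next2Comp, ringSize, inFlight, delay(ms), txStopped)
# come from the sample log line; the pattern itself is an assumption about its layout.
QUEUE_LINE = re.compile(
    r"QID:\s*(?P<qid>\d+),\s*next2TX:\s*(?P<next2tx>\d+),\s*next2Comp:\s*(?P<next2comp>\d+)"
    r".*?ringSize:\s*(?P<ring_size>\d+)\s+inFlight:\s*(?P<in_flight>\d+),"
    r"\s*delay\(ms\):\s*(?P<delay_ms>\d+),\s*txStopped:\s*(?P<tx_stopped>\d+)"
)


def parse_queue_line(line):
    """Return the numeric queue fields as a dict, or None if the line does not match."""
    match = QUEUE_LINE.search(line)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


def find_stalled_queues(lines, min_delay_ms=1000):
    """Collect queue records whose reported delay meets a hypothetical threshold."""
    stalled = []
    for line in lines:
        if "Vmxnet3" not in line:
            continue
        info = parse_queue_line(line)
        if info is not None and info["delay_ms"] >= min_delay_ms:
            stalled.append(info)
    return stalled


if __name__ == "__main__":
    # Masked sample line as it appears in this article.
    sample = ("Vmxnet3: #####: portID:########, QID: 0, next2TX: 457, next2Comp: 459, "
              "lastNext2TX: 15, next2Write:2, ringSize: 512 inFlight: 11, "
              "delay(ms): 6484,txStopped: 0")
    print(find_stalled_queues([sample]))
```

The functions accept any iterable of lines, so they can be run against a copy of /var/run/log/vmkernel.log opened with `open(path)`.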

Resolution

Workaround

  • Rebooting the host may temporarily clear the NIC hang condition.

  • Avoid scheduling vMotion-intensive operations until the underlying issue is resolved.

Resolution

Engage the hardware support vendor to investigate NIC behavior on the affected host, including stress testing the network cards that report the "received a force quiesce for port" message.
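Before opening the vendor case, it can help to summarize which ports logged the force quiesce message and how often. The following is a minimal sketch under stated assumptions: the function name `count_force_quiesce` is hypothetical, and the pattern assumes the port identifier is logged as a hexadecimal value after "port", as the masked sample (0x######) suggests.

```python
import re
from collections import Counter

# Assumes the port ID is printed in hexadecimal, e.g. "for port 0x...".
QUIESCE = re.compile(r"received a force quiesce for port (0x[0-9a-fA-F]+)")


def count_force_quiesce(lines):
    """Count force quiesce messages per port ID across the given log lines."""
    counts = Counter()
    for line in lines:
        match = QUIESCE.search(line)
        if match:
            counts[match.group(1).lower()] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python count_quiesce.py <copy of vmkernel.log>
    with open(sys.argv[1], errors="replace") as log:
        for port, hits in count_force_quiesce(log).most_common():
            print(port, hits)
```

The per-port counts give the vendor a starting list of ports to correlate with the physical network cards under test.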

Additional Information

This is a hardware-layer issue that must be remediated by engaging the system hardware vendor.