VMs lose network connectivity on ESXi hosts with Ntg3 driver due to TX hang between Ntg3XmitPktList and Ntg3TxCompletion.

Article ID: 370372

Updated On:

Products

VMware vSphere ESXi

Issue/Introduction

  • Virtual machines (VMs) suddenly lose connectivity to some or all network destinations. Pings to those addresses fail.
  • During VM operation, a vmxnet3 "Hang detected" message and an uplink "stuck" warning appear in /var/run/log/vmkernel.log, similar to the following:
    "Vmxnet3: 21228: vmname,##:##:##:##:##:##, portID(xxxxxxxx): Hang detected,numHangQ: 4, enableGen: 9218"
    "WARNING: Uplink: 2101#: Queue 0 of device vmnicX stuck, resetting the device"
  • Connectivity is restored by migrating the networking of the impacted VMs to another vmnic on the same host or on a different host.
  • Flapping the vmnic link down and up does not restore connectivity.
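
To check a copy of vmkernel.log for the two signatures listed above, a small script along the following lines may help. It is a minimal sketch, not a supported tool: the regular expressions are derived only from the two sample messages in this article, and the function names scan_vmkernel_log and scan_file are hypothetical.

```python
import re

# Patterns derived from the two sample messages in this article (assumption:
# real log lines keep the same wording around the variable fields).
VMXNET3_HANG = re.compile(r"Vmxnet3: .*Hang detected")
UPLINK_STUCK = re.compile(r"Queue (\d+) of device (vmnic\d+) stuck, resetting the device")


def scan_vmkernel_log(lines):
    """Return (vmxnet3_hang_count, {vmnic_name: stuck_count}) for an iterable of log lines."""
    hang_count = 0
    stuck_by_nic = {}
    for line in lines:
        if VMXNET3_HANG.search(line):
            hang_count += 1
        match = UPLINK_STUCK.search(line)
        if match:
            nic = match.group(2)
            stuck_by_nic[nic] = stuck_by_nic.get(nic, 0) + 1
    return hang_count, stuck_by_nic


def scan_file(path):
    """Scan a log file on disk; undecodable bytes are replaced rather than raising."""
    with open(path, errors="replace") as handle:
        return scan_vmkernel_log(handle)
```

Calling scan_file on a copy of /var/run/log/vmkernel.log returns the number of vmxnet3 hang messages and a per-vmnic count of "stuck" resets. A match only indicates that the symptoms described here are present; it does not by itself confirm the ntg3 race.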

Environment

VMware vSphere ESXi 7.0.x 
VMware vSphere ESXi 8.0.x 

Cause

The TX hang appears to be caused by a rare data race in the ntg3 driver between Ntg3XmitPktList and Ntg3TxCompletion.
The race occurs only when Ntg3TxCompletion marks the entire TX queue (TXQ) as completed (for example, taking it from almost full to empty) within the very narrow window in which Ntg3XmitPktList has found the TXQ to be full.
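
The timing described above resembles a classic "lost wakeup". The following toy model is an illustration only and is not the ntg3 source code: the ToyTxQueue class, the stop/wake flag, and the recheck option are all assumptions about how a typical NIC transmit path behaves, introduced here to show why a completion that lands inside the window can leave a queue stopped with nothing left to restart it.

```python
class ToyTxQueue:
    """Toy TX ring (hypothetical; not the driver's data structure)."""

    def __init__(self, size):
        self.size = size
        self.in_flight = 0     # descriptors handed to hardware, not yet completed
        self.stopped = False   # True means no further packets are submitted

    def is_full(self):
        return self.in_flight >= self.size

    def complete_all(self):
        """Completion path: reclaim every descriptor, wake the queue only if it is stopped."""
        self.in_flight = 0
        if self.stopped:
            self.stopped = False


def xmit_when_full(queue, completion_in_window, recheck):
    """Transmit path hitting a full ring, with an optional completion inside the window."""
    saw_full = queue.is_full()            # step 1: transmit path observes a full ring
    if completion_in_window:
        queue.complete_all()              # completion drains the ring; queue not stopped yet, so no wake
    if saw_full:
        queue.stopped = True              # step 2: transmit path stops the queue on stale information
        if recheck and not queue.is_full():
            queue.stopped = False         # generic guard: re-check after stopping
    return queue


def is_hung(queue):
    """Stopped with nothing in flight: no later completion will ever wake the queue."""
    return queue.stopped and queue.in_flight == 0
```

In this model, a full queue passed to xmit_when_full with completion_in_window=True and recheck=False ends up hung, while the same interleaving with recheck=True does not. Re-checking after the stop is one common guard against this pattern in general; this article does not state how the fixed driver addresses the race.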

Resolution

This is a known issue affecting VMware ESXi hosts. The fix is included in the inbox ntg3 driver, version 4.1.15.
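
When auditing several hosts, the installed ntg3 version string (for example, as reported by esxcli software vib list) can be compared against 4.1.15 with a helper like the one below. This is a sketch under stated assumptions: the helper names are hypothetical, and the code assumes the version string begins with dotted numeric components, ignoring any build suffix after them.

```python
import re

FIXED_NTG3 = (4, 1, 15)  # fixed driver version stated in this article


def parse_driver_version(text):
    """Extract the leading dotted numeric components of a version string as a tuple of ints."""
    match = re.match(r"\s*(\d+(?:\.\d+)*)", text)
    if not match:
        raise ValueError(f"no numeric version found in {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def has_ntg3_fix(installed):
    """Return True if the installed version is 4.1.15 or later."""
    version = parse_driver_version(installed)
    # Pad with zeros so that (4, 1, 15) and (4, 1, 15, 0) compare as equal.
    width = max(len(version), len(FIXED_NTG3))
    padded_installed = version + (0,) * (width - len(version))
    padded_fixed = FIXED_NTG3 + (0,) * (width - len(FIXED_NTG3))
    return padded_installed >= padded_fixed
```

Components are compared numerically rather than as text, so 4.1.9 is correctly treated as older than 4.1.15.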

This issue is resolved in VMware vSphere ESXi 7.0 Update 3v and ESXi 8.0 Update 3e, available at Broadcom downloads.

If you are having difficulty finding and downloading software, please review the Download Broadcom products and software KB.