VMs lose network connectivity on ESXi hosts with ntg3 driver (4.1.9.0) due to a TX hang between Ntg3XmitPktList and Ntg3TxCompletion.
Article ID: 370372
Updated On: 04-24-2025
Products
VMware vSphere ESXi
Issue/Introduction
Virtual machines (VMs) suddenly lose connectivity to some or all network destinations, and pings to those addresses fail.
During VM operation, the vmxnet3 vNIC logs a "Hang detected" message in the ESXi kernel log, similar to the following:
"Vmxnet3: 21228: vmname,##:##:##:##:##:##, portID(1341010101): Hang detected,numHangQ: 4, enableGen: 9218"
The following warning is logged in vmkernel.log:
"WARNING: Uplink: 21014: Queue 0 of device vmnicX stuck, resetting the device"
Connectivity is restored by migrating the network of the impacted VMs to another vmnic on the same or a different host.
Flapping the vmnic link up and down does not help.
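To check whether a host shows both symptoms, the kernel log can be searched for the two messages quoted above. The following is a minimal sketch, not a supported tool: the patterns are derived only from the sample lines in this article, the function name scan_vmkernel_log is hypothetical, and the numeric IDs after the module name are assumed to vary between builds.

```python
import re

# Patterns built from the two sample messages in this article. The numeric
# ID after the module name is matched loosely because it is assumed to vary.
VNIC_HANG = re.compile(r"Vmxnet3: \d+: .*Hang detected,\s*numHangQ: (\d+)")
UPLINK_STUCK = re.compile(
    r"Uplink: \d+: Queue (\d+) of device (vmnic\d+) stuck, resetting the device"
)


def scan_vmkernel_log(lines):
    """Return (vnic_hang_count, {vmnic_name: stuck_reset_count})."""
    hangs = 0
    stuck = {}
    for line in lines:
        if VNIC_HANG.search(line):
            hangs += 1
        match = UPLINK_STUCK.search(line)
        if match:
            nic = match.group(2)
            stuck[nic] = stuck.get(nic, 0) + 1
    return hangs, stuck
```

The function accepts any iterable of lines, so it can be fed an open file handle for a copy of vmkernel.log taken from the host or from a support bundle. A non-zero hang count together with stuck-queue resets on a vmnic is consistent with the symptoms described here, but is not proof of this specific issue.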
Environment
VMware vSphere ESXi 7.0.x
ntg3 driver version 4.1.9.0
Cause
The TX hang appears to be caused by a rare data race in the ntg3 driver between Ntg3XmitPktList and Ntg3TxCompletion. The race is triggered only when Ntg3TxCompletion marks the entire TX queue (TXQ) as completed (for example, taking it from almost full to empty) within the very narrow window in which Ntg3XmitPktList finds the TXQ full.
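The timing described above resembles a classic "lost wakeup". The sketch below is a toy, single-threaded model of that pattern and is not the ntg3 driver's code: the TxQueue class, its fields, and both functions are hypothetical, and the exact driver internals are an assumption. It replays one interleaving in which the completion path empties the ring after the transmit path has observed "full" but before it has paused the queue.

```python
class TxQueue:
    """Toy model of a TX ring; NOT the ntg3 driver's real data structures."""

    def __init__(self, size):
        self.size = size
        self.in_flight = 0    # descriptors handed to the NIC, not yet completed
        self.stopped = False  # True once the transmit path has paused the queue

    def is_full(self):
        return self.in_flight >= self.size

    def complete_all(self):
        """Completion path: reclaim every descriptor, wake the queue if paused."""
        self.in_flight = 0
        if self.stopped:
            self.stopped = False


def racy_interleaving():
    q = TxQueue(size=4)
    q.in_flight = 4          # ring is full
    saw_full = q.is_full()   # transmit path, step 1: observes "full"
    q.complete_all()         # completion runs in the window; queue is not
                             # paused yet, so there is nothing to wake
    if saw_full:
        q.stopped = True     # transmit path, step 2: acts on a stale check
    return q


def recheck_interleaving():
    q = TxQueue(size=4)
    q.in_flight = 4
    saw_full = q.is_full()
    q.complete_all()
    if saw_full:
        q.stopped = True
        if not q.is_full():  # re-check after pausing closes the window
            q.stopped = False
    return q
```

In the racy interleaving the queue ends up paused with nothing in flight, so no later completion can ever wake it, which matches a stuck TX queue. Re-checking the ring after pausing is a generic way to close such a window; it is shown only for contrast and is not a statement about how the vendor fix works.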
Resolution
Contact the hardware vendor. The fix will be included in the next ntg3 driver version.