VMs lose network connectivity on ESXi hosts with the Ntg3 driver (4.1.9.0) due to a TX hang between Ntg3XmitPktList and Ntg3TxCompletion

Article ID: 370372

Updated On: 04-24-2025

Products

VMware vSphere ESXi

Issue/Introduction

Virtual machines (VMs) suddenly lose connectivity to some or all network destinations, and pings to those addresses fail.
While the VM is running, the vmxnet3 vNIC logs a "Hang detected" message in the ESXi kernel log, similar to the following: "Vmxnet3: 21228: vmname,##:##:##:##:##:##, portID(1341010101): Hang detected,numHangQ: 4, enableGen: 9218"
The following warning is logged in vmkernel.log: "WARNING: Uplink: 21014: Queue 0 of device vmnicX stuck, resetting the device"
Connectivity is restored by moving the networking of the impacted VMs to another vmnic on the same host or on a different host.
Flapping the vmnic link down and up does not help.
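
To check whether a host shows both log signatures, the kernel log can be scanned for them. The following is a minimal Python sketch, not a VMware tool: the function name and the regular expressions are assumptions derived only from the two sample messages quoted in this article, and the exact message layout may differ between builds.

```python
import re
from collections import Counter

# Patterns inferred from the sample messages in this article (assumption:
# other ESXi builds may format these lines slightly differently).
VNIC_HANG = re.compile(r"Vmxnet3: \d+: .*Hang detected")
UPLINK_STUCK = re.compile(
    r"Uplink: \d+: Queue (\d+) of device (\S+) stuck, resetting the device"
)


def scan_vmkernel_log(lines):
    """Count vmxnet3 hang messages and stuck-queue resets per (vmnic, queue)."""
    vnic_hangs = 0
    stuck_queues = Counter()
    for line in lines:
        if VNIC_HANG.search(line):
            vnic_hangs += 1
        match = UPLINK_STUCK.search(line)
        if match:
            queue, device = int(match.group(1)), match.group(2)
            stuck_queues[(device, queue)] += 1
    return vnic_hangs, dict(stuck_queues)
```

Any iterable of lines can be passed in, for example an open file handle on a copy of vmkernel.log. A host that reports both a non-zero hang count and stuck-queue resets on an ntg3 vmnic matches the symptoms described above.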

Environment

VMware vSphere ESXi 7.0.x
Ntg3 driver version - 4.1.9.0

Cause

The TX hang appears to be caused by a rare data race in the ntg3 driver between Ntg3XmitPktList and Ntg3TxCompletion.
To trigger it, Ntg3TxCompletion must complete the entire TX queue (TXQ), for example taking it from almost full to empty, within the very narrow window in which Ntg3XmitPktList finds that the TXQ is full.
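
The window described above resembles a generic "lost wakeup" between a transmit path and a completion path. The sketch below is a simplified Python model of that general pattern, written under assumptions: it is not the ntg3 source, every name in it is hypothetical, and the re-check shown in the second function is a common remedy for this class of race, not necessarily the vendor's fix.

```python
from dataclasses import dataclass


@dataclass
class TxQueue:
    """Hypothetical model of a TX ring; not the real ntg3 data structure."""
    size: int
    in_flight: int = 0     # descriptors handed to hardware, not yet completed
    stopped: bool = False  # True means the stack must not submit more packets


def xmit_sees_full(q: TxQueue) -> bool:
    return q.in_flight >= q.size


def tx_completion(q: TxQueue) -> None:
    """Completion path: reclaim everything, wake the queue only if stopped."""
    q.in_flight = 0
    if q.stopped:
        q.stopped = False


def is_hung(q: TxQueue) -> bool:
    # Stopped with nothing in flight: no later completion will ever wake it.
    return q.stopped and q.in_flight == 0


def racy_xmit_interleaving() -> TxQueue:
    q = TxQueue(size=4, in_flight=4)
    saw_full = xmit_sees_full(q)  # 1. transmit path observes a full ring
    tx_completion(q)              # 2. completion drains the whole ring; the
                                  #    queue is not stopped yet, so no wake
    if saw_full:
        q.stopped = True          # 3. transmit path stops on a stale observation
    return q


def rechecking_xmit_interleaving() -> TxQueue:
    q = TxQueue(size=4, in_flight=4)
    saw_full = xmit_sees_full(q)
    tx_completion(q)
    if saw_full:
        q.stopped = True
        if not xmit_sees_full(q):  # re-check after stopping closes the window
            q.stopped = False
    return q
```

In this model, `is_hung(racy_xmit_interleaving())` is true because the queue ends up stopped while empty, whereas the re-checking variant leaves the queue running. A queue stuck in that state would also be consistent with the "Queue 0 of device vmnicX stuck" warning, though that link is an inference rather than something the article states.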

Resolution

Contact the hardware vendor. The fix will be included in the next ntg3 driver release.
