VMs experience full or partial loss of network connectivity on ESXi hosts using certain versions of bnxtnet drivers

Article ID: 338064

Products

VMware vSphere ESXi

Issue/Introduction

Symptoms:

  • Virtual machines (VMs) suddenly lose connectivity to some or all network destinations. Pings to those addresses fail.
  • Connectivity is restored by disconnecting and reconnecting the vNIC, or by migrating the VM to another ESXi host. During these operations, the vmxnet3 vNIC logs a "Hang detected" message in the ESXi VMkernel logs, similar to the following:
    "Vmxnet3: 21100: vmname.eth0,xx:xx:xx:xx:xx:xx, portID(67101010): Hang detected,numHangQ: 1, enableGen: 1011"
  • The host uses the bnxtnet "async" driver, version 224.0.x.x or later, for the uplinks of the affected VMs.
  • The ESXi host may also report a PSOD, with the following events observed in the VMkernel logs:
    WARNING: CACHE_SLOW ent: #Rder:1, #Pin:0, rwOwner:2165751, roOwner:2165744, self:0x210bf7, entry:0x476bcc187660
    WARNING: ZDOMBLKCACHE: CACHE_SLOW: X Block {9fxxxxx-xxxx-xxxx-xxxx-xx44, 166, 1459309}, type: MiddleTree blocked 27280.4 sec.
    WARNING: CACHE_SLOW ent: ref:0, dirty:1, skipRd:0, hasWt:0, inIO:0, inSet:1, del:0, WriterWait:1, flush:1
    WARNING: CACHE_SLOW ent: #Rder:1, #Pin:0, rwOwner:2165751, roOwner:2165744, self:0x210bf7, entry:0x476bcc187660
    WARNING: ZDOMBLKCACHE: CACHE_SLOW: X Block {9fxxxxx-xxxx-xxxx-xxxx-xx44, 166, 1459309}, type: MiddleTree blocked 27283.4 sec.
    WARNING: CACHE_SLOW ent: ref:0, dirty:1, skipRd:0, hasWt:0, inIO:0, inSet:1, del:0, WriterWait:1, flush:1
    WARNING: CACHE_SLOW ent: #Rder:1, #Pin:0, rwOwner:2165751, roOwner:2165744, self:0x210bf7, entry:0x476bcc187660

Note: If a PSOD occurs, a stack similar to the one below is reported in the logs; the exact logging may vary.

  • In this case, the task that unsubscribes the objects triggered the PSOD:
    2024-06-10T10:51:32.574Z cpu47:2099455)DOM: DOMOwnerUnsubscribeClusterEncrState:5934: DOM Owner on 21950e66-6c6c-8693-3e0e-bc97e1055ba0 unsubscribed cluster encryption state
    2024-06-10T10:51:32.722Z cpu20:2099455)DOM: DOMOwnerUnsubscribeClusterEncrState:5925: DOM Owner on 21950e66-6c6c-8693-3e0e-bc97e1055ba0 received premature cluster encryption state unsubscription
  • PSOD Stack: 
    cpu28:2099455)@BlueScreen: 05915d66-80b3-5c28-b78e-bc97e1055ba0: Failed to wait for object exit.
    cpu28:2099455)Code start: 0x420000a00000 VMK uptime: 0:09:29:58.854
    cpu28:2099455)0x453ab6d9b920:[0x420000b19b5a]PanicvPanicInt@vmkernel#nover+0x202 stack: 0x420042801100
    cpu28:2099455)0x453ab6d9b9d0:[0x420000b1a47c]Panic_vPanic@vmkernel#nover+0x25 stack: 0x0
    cpu28:2099455)0x453ab6d9b9f0:[0x420000b32560]vmk_PanicWithModuleID@vmkernel#nover+0x41 stack: 0x453ab6d9ba50
    cpu28:2099455)0x453ab6d9ba50:[0x4200030dd63d][email protected]#0.0.0.1+0xf36 stack: 0x5c
    cpu28:2099455)0x453ab6d9bec0:[0x420003190c61][email protected]#0.0.0.1+0x12 stack: 0x3e0fb637fc74
    cpu28:2099455)0x453ab6d9bee0:[0x420002a1ae97][email protected]#0.0.0.1+0x230 stack: 0x3e0fb63873be
    cpu28:2099455)0x453ab6d9bfa0:[0x420000b3a234]vmkWorldFunc@vmkernel#nover+0x31 stack: 0x420000b3a230
    cpu28:2099455)0x453ab6d9bfe0:[0x420000e2c015]CpuSched_StartWorld@vmkernel#nover+0xe2 stack: 0x0
    cpu28:2099455)0x453ab6d9c000:[0x420000adbdff]Debug_IsInitialized@vmkernel#nover+0xc stack: 0x0
    cpu28:2099455)base fs=0x0 gs=0x420047000000 Kgs=0x0


The PSOD and the TX hang share the same trigger, but either can occur without the other.
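To check a saved VMkernel log for the vmxnet3 hang message described in the symptoms, a short script can help. The following is a minimal Python sketch, not an official tool. The regular expression is modeled only on the sample line quoted above, so real log lines may need a looser pattern; the names `HANG_RE` and `find_vnic_hangs` are hypothetical.

```python
import re

# Assumption: the pattern mirrors the sample "Hang detected" line in this
# article. Actual log lines may vary in spacing or field order.
HANG_RE = re.compile(
    r"Vmxnet3: \d+: (?P<vnic>[^,]+),(?P<mac>[^,]+), "
    r"portID\((?P<port_id>\d+)\): Hang detected"
)

def find_vnic_hangs(lines):
    """Return one dict (vnic, mac, port_id) per line with a hang message."""
    hits = []
    for line in lines:
        match = HANG_RE.search(line)
        if match:
            hits.append(match.groupdict())
    return hits

# Sample input: the line quoted in the symptoms plus an unrelated line.
sample = [
    "Vmxnet3: 21100: vmname.eth0,xx:xx:xx:xx:xx:xx, portID(67101010): "
    "Hang detected,numHangQ: 1, enableGen: 1011",
    "some unrelated log line",
]

for hit in find_vnic_hangs(sample):
    print(hit["vnic"], hit["port_id"])
```

In practice, `lines` could be an open file handle for a copy of the VMkernel log, since iterating over a file yields its lines.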

Environment

VMware vSphere ESXi 7.0
VMware vSphere ESXi 8.0

Cause

The Broadcom bnxtnet async driver, version 224.0.x.x or later, has an issue that can cause it to miss TX packet completions under certain circumstances. This can block the TX queues of the VM's vNIC, and therefore block some or all packets leaving the vNIC.

Resolution

Broadcom has released new versions of the bnxtnet and bnxtroce drivers that contain the fix, starting with version 226.0.145.4-1.
Consult the VMware Compatibility Guide (VCG/HCL) or your OEM for the driver and firmware versions that match your specific NIC model.
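When auditing several hosts, it can be useful to classify a reported driver version string against the versions named in this article. The following Python sketch assumes that every version from 224.0.x.x up to, but not including, 226.0.145.4 is affected; that range is an inference from the Cause and Resolution text, not a statement from Broadcom. It also assumes the version string is dot-separated numbers followed by an optional suffix after a dash. The function names are hypothetical.

```python
# Assumption: affected range inferred from "224.0.x.x or later" (Cause)
# and "starting with version 226.0.145.4-1" (Resolution).
FIRST_AFFECTED = (224, 0)
FIRST_FIXED = (226, 0, 145, 4)

def parse_version(text):
    """Convert '226.0.145.4-1' to (226, 0, 145, 4), ignoring the '-' suffix."""
    head = text.split("-", 1)[0]
    return tuple(int(part) for part in head.split("."))

def is_affected(text):
    """True if the version falls in the assumed affected range."""
    version = parse_version(text)
    return FIRST_AFFECTED <= version < FIRST_FIXED

# Hypothetical version strings for illustration only.
for candidate in ("223.0.0.0-1", "224.0.1.2-1", "226.0.145.4-1"):
    print(candidate, is_affected(candidate))
```

The check relies on Python's element-wise tuple comparison, so a version with fewer or more components still compares sensibly. Always confirm the result against the compatibility guide or your OEM before acting on it.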

Additional Information