ESXi host with Broadcom Thor NICs might encounter PSOD with "PF Exception"



Article ID: 418483



Products

VMware vSphere ESXi

Issue/Introduction

  • An ESXi host running a bnxtnet driver earlier than 234.x can fail with a PSOD (purple diagnostic screen) that reports a "PF Exception 14" error.

  • Before the panic, the ESXi host's vmkernel.log shows entries similar to the following:

/var/run/log/vmkernel.log

YYYY-MM-DDThh:mm:ss In(182) vmkernel: cpu11:2099763)Performing Live coredump: vmxnet3-initiated
YYYY-MM-DDThh:mm:ss In(182) vmkernel: cpu11:2099763)No disk partition configured to dump data.
YYYY-MM-DDThh:mm:ss In(182) vmkernel: Coredump to file: /vmfs/volumes/<datastore>/vmkdump/<dump_file_name>.dumpfile.
YYYY-MM-DDThh:mm:ss In(182) vmkernel: cpu11:2099763)Dump: 291#: Using dump buffer size 98304
YYYY-MM-DDThh:mm:ss In(182) vmkernel: cpu0:2099763)Dump: 198#: DumpProgress: Faulting world regs ##
YYYY-MM-DDThh:mm:ss In(182) vmkernel: cpu2:2099763)Dump: 198#: DumpProgress: Vmm code/data ##
YYYY-MM-DDThh:mm:ss In(182) vmkernel: cpu2:2099763)Dump: 198#: DumpProgress: Vmk code/rodata/stack ##
YYYY-MM-DDThh:mm:ss In(182) vmkernel: cpu272:2097675)Vmxnet3: 188##: Tx completion timeout exceeded for tq 0
YYYY-MM-DDThh:mm:ss In(182) vmkernel: cpu272:2097675)Vmxnet3: 188##: Tx completion timeout exceeded for tq 1
YYYY-MM-DDThh:mm:ss In(182) vmkernel: cpu272:2097675)Vmxnet3: 188##: Tx completion timeout exceeded for tq 2
YYYY-MM-DDThh:mm:ss In(182) vmkernel: cpu272:2097675)Vmxnet3: 188##: Tx completion timeout exceeded for tq 3
YYYY-MM-DDThh:mm:ss In(182) vmkernel: cpu2:2099763)Dump: 198#: DumpProgress: Vmk data/heap 14
YYYY-MM-DDThh:mm:ss Wa(180) vmkwarning: cpu261:2097678)WARNING: Uplink: 220##: Queue 3 of device vmnic2 stuck, resetting the device
YYYY-MM-DDThh:mm:ss Wa(180) vmkwarning: cpu8:2099245)WARNING: bnxtnet: bnxtnet_uplink_reset:908#: [vmnic2 : 0x4522c##d0000] TX timeout!
YYYY-MM-DDThh:mm:ss In(182) vmkernel: cpu4:29505145)Vmxnet3: 19##4: <vm_name>.eth0,00:##:##:##:##:ed, portID(67####87): Hang detected,numHangQ: 8, enableGen: 75
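The symptom lines in the excerpt above can be searched for directly. The following is a minimal sketch, not an official diagnostic: the function name match_thor_symptoms is hypothetical, and the patterns are taken only from the sample entries shown above, so they may also match unrelated TX timeouts on other NICs.

```shell
# Hypothetical helper: keep only log lines that resemble the symptoms in the excerpt.
# Patterns are copied from the sample entries; they are not an exhaustive signature.
match_thor_symptoms() {
    grep -E 'Tx completion timeout exceeded|stuck, resetting the device|bnxtnet_uplink_reset.*TX timeout|Hang detected'
}

# On an ESXi host, scan the live vmkernel log if it is readable.
if [ -r /var/run/log/vmkernel.log ]; then
    match_thor_symptoms < /var/run/log/vmkernel.log || echo "no matching entries"
fi
```

A match only shows that a similar TX hang was logged. Confirm the NIC model and driver version before attributing it to this issue.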

  • The issue is intermittent and is only encountered on ESXi hosts equipped with Broadcom bnxtnet Thor 200G and 400G NICs. These specific devices can be identified by the following PCI IDs:

    • Vendor ID (VID): 14e4

    • Device ID (DID): 175# or 176# (e.g., 1750, 1761)

To check the VID and DID, run the following command:

# vmkchdev -l | grep vmnic
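To narrow the output to the affected devices only, the listing can be filtered by PCI ID. This is a sketch under one assumption: that the second whitespace-separated field of each vmkchdev line is the VID:DID pair (for example 14e4:1750) and the last field is the vmnic name. The function name filter_thor_nics is hypothetical.

```shell
# Hypothetical helper: print "<vmnic> <VID:DID>" for devices with VID 14e4
# and a DID of the form 175x or 176x.
# Assumes field 2 is VID:DID and the last field is the vmnic alias.
filter_thor_nics() {
    awk 'tolower($2) ~ /^14e4:17[56][0-9a-f]$/ { print $NF, $2 }'
}

# On an ESXi host, apply the filter to the real device list.
if command -v vmkchdev >/dev/null 2>&1; then
    vmkchdev -l | grep vmnic | filter_thor_nics
fi
```

An empty result suggests that no NIC on the host matches the listed PCI IDs.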

For more information on identifying the driver and firmware versions, refer to Determining Network/Storage firmware and driver version in ESXi.

Environment

vSphere ESXi

Cause

This is caused by a known firmware issue on Broadcom Thor NICs that use a bnxtnet driver version earlier than 234.x.

Resolution

This issue is resolved in bnxtnet firmware version 234.1.128.0, which ships with driver version 234.0.159.1.

Update the bnxtnet driver to 234.0.159.1 or later on the affected ESXi hosts. To download it, refer to the Broadcom Compatibility Guide.
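Whether a host still needs the update can be checked by comparing its installed driver version with the fixed version. The sketch below makes several assumptions: the helper version_lt is hypothetical, vmnic2 is only an example name taken from the log excerpt, and the parsing assumes that esxcli network nic get prints the driver version on a line beginning with "Version:".

```shell
# Hypothetical helper: succeed (exit 0) if dotted version $1 is lower than $2.
# Fields are compared numerically; any suffix after the digits is ignored.
version_lt() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, "."); nb = split(b, y, ".")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            xi = x[i] + 0; yi = y[i] + 0   # missing fields count as 0
            if (xi < yi) exit 0
            if (xi > yi) exit 1
        }
        exit 1                             # equal versions are not "lower"
    }'
}

FIXED_DRIVER="234.0.159.1"   # fixed driver version named in this article
NIC="vmnic2"                 # example name only; substitute the affected vmnic

if command -v esxcli >/dev/null 2>&1; then
    # Assumed output layout: a "Version: <x.y.z.w>" line under "Driver Info".
    current=$(esxcli network nic get -n "$NIC" | awk '/^ *Version:/ { print $2; exit }')
    if [ -z "$current" ]; then
        echo "$NIC: could not read the driver version; check it manually"
    elif version_lt "$current" "$FIXED_DRIVER"; then
        echo "$NIC: driver $current is older than $FIXED_DRIVER; update recommended"
    else
        echo "$NIC: driver $current is at or above $FIXED_DRIVER"
    fi
fi
```

The comparison covers the driver version only. The firmware version should still be checked against 234.1.128.0 separately.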