ESXi Purple Screen of Death (PSOD) on Broadcom NICs due to bnxtnet driver TX timeout

Article ID: 427648

Products

VMware vSphere ESXi

Issue/Introduction

  • ESXi hosts experience intermittent Purple Screen of Death (PSOD) events across multiple hosts.
  • The issue occurs on hosts that use Broadcom network adapters with the bnxtnet driver, and is characterized by transmit (TX) completion timeouts and uplink resets.
  • In the VMkernel logs, you may see traces similar to the following prior to the panic:

    <date> In(182) vmkernel: cpu2:2097842)Performing Live coredump: vmxnet3-initiated
    <date> In(182) vmkernel: cpu2:2097842)No disk partition configured to dump data.
    <date> In(182) vmkernel: Coredump to file: /vmfs/volumes/###############/vmkdump/#########.dumpfile.
    <date> In(182) vmkernel: cpu2:2097842)Dump: 2917: Using dump buffer size 98304
    <date> In(182) vmkernel: cpu2:2097842)Dump: 1985: DumpProgress: Faulting world regs 17
    <date> In(182) vmkernel: cpu2:2097842)Dump: 1985: DumpProgress: Vmm code/data 16
    <date> In(182) vmkernel: cpu2:2097842)Dump: 1985: DumpProgress: Vmk code/rodata/stack 15
    <date> In(182) vmkernel: cpu65:2097291)Vmxnet3: 18934: Tx completion timeout exceeded for tq 6
    <date> In(182) vmkernel: cpu68:2097291)Vmxnet3: 18934: Tx completion timeout exceeded for tq 6
    <date> In(182) vmkernel: cpu91:6559268)Vmxnet3: 18934: Tx completion timeout exceeded for tq 6
    <date> In(182) vmkernel: cpu2:2097842)Dump: 1985: DumpProgress: Vmk data/heap 14
    <date> Wa(180) vmkwarning: cpu0:2097294)WARNING: Uplink: 22063: Queue 1 of device vmnicX stuck, resetting the device
    <date> Wa(180) vmkwarning: cpu66:2097709)WARNING: bnxtnet: bnxtnet_uplink_reset:9083: [vmnicX : 0x4525003fe000] TX timeout!
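To check whether a host has logged these signatures, the log can be searched for the three messages quoted above. The following is a minimal sketch, not an official diagnostic: the function name scan_bnxtnet_timeouts is hypothetical, and the default path /var/log/vmkernel.log is an assumption about where the VMkernel log is stored on the host.

```shell
#!/bin/sh
# Sketch: print VMkernel log lines that match the signatures shown in this article.
# Assumptions: scan_bnxtnet_timeouts is a hypothetical helper name, and
# /var/log/vmkernel.log is assumed to be the log location; pass another path as $1.

scan_bnxtnet_timeouts() {
    log_file="${1:-/var/log/vmkernel.log}"
    # Patterns copied from the traces above:
    #   - Vmxnet3 TX completion timeout
    #   - uplink queue stuck, device reset
    #   - bnxtnet uplink reset caused by a TX timeout
    grep -E 'Tx completion timeout exceeded|Queue [0-9]+ of device vmnic[0-9]+ stuck, resetting the device|bnxtnet_uplink_reset:[0-9]+: .*TX timeout' "$log_file"
}

# Example usage (run manually on the host):
#   scan_bnxtnet_timeouts                      # scan the assumed default log
#   scan_bnxtnet_timeouts /path/to/other.log   # scan a copied or rotated log
```

Matching lines alone do not prove this issue is the cause of a PSOD; compare the driver and firmware versions as described in the Resolution section.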

Environment

VMware ESXi 8.x

Cause

The issue is caused by a firmware bug on Broadcom adapters that is exposed when the bnxtnet driver is older than version 234.x. This combination of an older driver and newer firmware leads to TX completion timeouts, which trigger a device reset and a subsequent host failure (PSOD).

Resolution

This issue is fixed in bnxtnet firmware version 234.1.128.0, which ships with driver version 234.0.159 and later.

  1. Verify the current driver and firmware versions on your ESXi host using the following CLI commands:

    • For driver: esxcli software vib list | grep bnxtnet

    • For firmware: esxcli network nic get -n vmnicX (replace X with the appropriate NIC number).

  2. Install the updated driver/firmware on all nodes in the cluster.

  3. Reboot the ESXi hosts to apply the changes.
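Once the installed driver version is known from step 1, it can be compared against the fixed version named above. The following is a minimal sketch of a dotted-version comparison in plain POSIX shell: version_lt is a hypothetical helper, the sample version string in the usage comment is made up for illustration, and the function assumes that every field before the first "-" is numeric.

```shell
#!/bin/sh
# Sketch: version_lt A B succeeds (exit status 0) when dotted version A is lower than B.
# Assumptions: hypothetical helper; anything after the first "-" (for example a
# build suffix) is ignored; all remaining fields are plain decimal numbers.

version_lt() {
    a="${1%%-*}"
    b="${2%%-*}"
    while [ -n "$a" ] || [ -n "$b" ]; do
        # Take the leading field of each version.
        x="${a%%.*}"
        y="${b%%.*}"
        # Drop the field just consumed; empty string when no fields remain.
        case "$a" in *.*) a="${a#*.}" ;; *) a="" ;; esac
        case "$b" in *.*) b="${b#*.}" ;; *) b="" ;; esac
        # Missing fields count as 0, so 234.0.159 equals 234.0.159.0.
        x="${x:-0}"
        y="${y:-0}"
        if [ "$x" -lt "$y" ]; then return 0; fi
        if [ "$x" -gt "$y" ]; then return 1; fi
    done
    # All fields equal: A is not lower than B.
    return 1
}

# Example usage (the installed value is a made-up illustration):
#   installed="233.0.195.0-1OEM.800.1.0.20613240"
#   if version_lt "$installed" "234.0.159"; then
#       echo "bnxtnet driver is older than 234.0.159"
#   fi
```

The comparison is written without sort -V because that option may not be available in the host's shell environment; this is a design assumption, not a documented limitation.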

Additional Information

Always refer to the VMware Compatibility Guide (VCG) before performing driver or firmware updates to confirm that the chosen versions are supported for your specific hardware model.

Similar issue:
ESXi host with Broadcom Thor NICs might encounter PSOD with "PF Exception"