Broadcom NIC Firmware Crash resulting in Link Down (Error 0x89021)

Article ID: 426419

Products

VMware vSphere ESXi

Issue/Introduction

This article describes a scenario where a host experiences a loss of network connectivity on specific interfaces (e.g., vmnic#). The network adapter becomes unresponsive, causing the driver to mark the links as "down."

During this event, kernel logs typically indicate that the driver stopped receiving responses from the firmware. Initially, commands used to gather port statistics (HWRM_FUNC_QSTATS) will time out:

####-##-##T##:##:##.###Z Wa(###) vmkwarning: cpu##:#######)WARNING: bnxtnet: hwrm_send_msg:###: [vmnic# : 0x############] HWRM cmd resp_len timeout, cmd_type 0x##(HWRM_FUNC_QSTATS) seq #####

When the driver subsequently attempts to probe the firmware status, it receives a specific error code (0x89021), confirming that the device firmware has crashed:

####-##-##T##:##:##.###Z Wa(###) vmkwarning: cpu##:#######)WARNING: bnxtnet: hwrm_get_version:####: [vmnic# : 0x############] VER_GET failed- FW_STATUS_REG: 0x89021
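The two signatures above can be searched for programmatically. The following is a minimal Python sketch, assuming the log lines follow exactly the layout shown in the excerpts; the function names classify_line and summarize are hypothetical and not part of any VMware or Broadcom tooling.

```python
import re

# Patterns derived from the two log excerpts in this article. The field
# layout (function name, line number, "[vmnic# : 0x...]") is assumed to
# match those excerpts exactly.
QSTATS_TIMEOUT = re.compile(
    r"bnxtnet: hwrm_send_msg:\d+: \[(?P<nic>vmnic\d+) : 0x[0-9a-fA-F]+\] "
    r"HWRM cmd resp_len timeout, cmd_type 0x[0-9a-fA-F]+\(HWRM_FUNC_QSTATS\)"
)
FW_CRASH = re.compile(
    r"bnxtnet: hwrm_get_version:\d+: \[(?P<nic>vmnic\d+) : 0x[0-9a-fA-F]+\] "
    r"VER_GET failed- FW_STATUS_REG: 0x89021\b"
)


def classify_line(line):
    """Return (event, nic) if the line matches a known signature, else None."""
    match = QSTATS_TIMEOUT.search(line)
    if match:
        return ("qstats_timeout", match.group("nic"))
    match = FW_CRASH.search(line)
    if match:
        return ("fw_crash", match.group("nic"))
    return None


def summarize(lines):
    """Map each NIC name to the set of event types seen for it."""
    seen = {}
    for line in lines:
        hit = classify_line(line)
        if hit:
            event, nic = hit
            seen.setdefault(nic, set()).add(event)
    return seen
```

A NIC that shows both "qstats_timeout" and "fw_crash" matches the sequence described in this article. As a usage example, summarize() can be passed an open log file object; the file location (for instance /var/log/vmkwarning.log on the host) is an assumption and may differ in a collected support bundle.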

Environment

VMware vSphere ESXi

Cause

The root cause is identified as a TCAM parity error occurring within the RE CFA (Complex Flow Accelerator) of the network card's firmware. This specific condition is confirmed by firmware trace dumps, which log a "CRT FATAL ERROR" alongside the crash event:

####-##-##T##:##:##Z In(###) vmkernel: ####.#:D:Register re_cfa_int_sts_0:0x########: 0x9021
####-##-##T##:##:##Z In(###) vmkernel: ####.#:D:CRT FATAL ERROR: 0x9021
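The firmware trace lines can be checked in the same way. The sketch below, in Python, assumes the trace lines keep the "Register name:address: value" and "CRT FATAL ERROR: code" layout shown above. The names parse_trace and has_re_cfa_fatal are hypothetical, and the code only detects the presence of these lines; it does not decode individual register bits.

```python
import re

# Layout assumed from the trace excerpt in this article:
#   ...:D:Register <name>:0x<address>: 0x<value>
#   ...:D:CRT FATAL ERROR: 0x<code>
REGISTER = re.compile(
    r"Register (?P<name>\w+):0x[0-9a-fA-F]+: (?P<value>0x[0-9a-fA-F]+)"
)
FATAL = re.compile(r"CRT FATAL ERROR: (?P<code>0x[0-9a-fA-F]+)")


def parse_trace(lines):
    """Collect register values and fatal error codes from trace lines."""
    registers = {}
    fatal_codes = []
    for line in lines:
        reg = REGISTER.search(line)
        if reg:
            registers[reg.group("name")] = reg.group("value")
        fatal = FATAL.search(line)
        if fatal:
            fatal_codes.append(fatal.group("code"))
    return {"registers": registers, "fatal_codes": fatal_codes}


def has_re_cfa_fatal(parsed):
    """True if the RE CFA status register and a CRT FATAL ERROR both appear."""
    return "re_cfa_int_sts_0" in parsed["registers"] and bool(parsed["fatal_codes"])
```

If has_re_cfa_fatal() returns True for a trace dump, the dump contains the same pair of lines that this article associates with the TCAM parity error.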

This error is typically an intermittent "soft error" caused by environmental factors (such as a random alpha particle flipping a bit in the TCAM memory). In rare cases, when the issue occurs repeatedly, it can indicate a physical hardware defect.

Resolution

To restore connectivity, perform the following steps:

  • Perform a Cold Reboot: Initiate a full host power cycle (shutdown and then power on). A simple restart of the driver is often insufficient; a full power cycle is required to restart the NIC firmware completely and clear the error bits.

  • Monitor for Recurrence: Once the host is back online, monitor the affected interfaces.
    • If the issue is resolved: No further action is required, as the error was likely a transient soft error.
    • If the issue persists: If the crash recurs shortly after the reboot, it indicates that the hardware is defective and the network device must be replaced.
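
The monitoring decision above can be expressed as a small check over crash timestamps taken from the logs. This is a minimal Python sketch, not an official procedure: the article does not define how soon "shortly after the reboot" is, so the seven-day window below is a purely hypothetical default, and the names parse_ts and assess_recurrence are illustrative.

```python
from datetime import datetime, timedelta


def parse_ts(text):
    """Parse a log timestamp such as 2024-01-02T03:04:05.678Z."""
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%S.%fZ")


def assess_recurrence(reboot_time, crash_times, window=timedelta(days=7)):
    """Classify the outcome after a cold reboot.

    Returns "replace_nic" if any firmware crash falls within `window`
    after the reboot, otherwise "monitor". The window length is a
    hypothetical value; choose one that suits the environment.
    """
    for crash in crash_times:
        if reboot_time < crash <= reboot_time + window:
            return "replace_nic"
    return "monitor"
```

Crashes logged before the cold reboot are ignored, since they belong to the event that the reboot was meant to clear.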

Additional Information

Japanese KB: Broadcom NIC ファームウェアのクラッシュによるリンクダウン(エラー 0x89021)