Random NICs on random servers become unresponsive due to NIC firmware crashes

Article ID: 421287

Updated On:

Products

VMware vSphere ESXi

Issue/Introduction

  • ESXi hosts frequently enter the 'Not Responding' state and report NIC firmware crashes.

  • The affected NICs use the bnxtnet driver and firmware.
  • The following log entries are from one of the affected hosts:

    <Date> Wa(180) vmkwarning: cpu##:20###43)WARNING: bnxtnet: hwrm_send_msg:388: [vmnicX : 0x452#####4000] HWRM cmd resp_len timeout, cmd_type 0x0(HWRM_VER_GET) seq 25146
    <Date> Wa(180) vmkwarning: cpu##:20###43)WARNING: bnxtnet: hwrm_get_version:3140: [vmnicX : 0x452#####4000] VER_GET failed- FW_STATUS_REG: 0x88901
    <Date> Wa(180) vmkwarning: cpu##:20###43)WARNING: bnxtnet: hwrm_snd_fw_msg:585: [vmnicX : 0x452#####4000] Looks like FW is crashed/non-responsive.
    <Date> Wa(180) vmkwarning: cpu##:20###43)WARNING: bnxtnet: hwrm_snd_fw_msg:587: [vmnicX : 0x452#####4000] Dumping FW trace and reporting link down to OS

  • You may also observe "LRO aborts rx" recorded on the vmnics.
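To check whether a host is hitting this condition, the logs can be searched for the crash signature. The following is a minimal Python sketch, assuming you have a copy of the host's vmkwarning log (or any log containing the bnxtnet warnings quoted above). The script and function names are hypothetical; only the message text is taken from this article.

```python
import re
import sys
from collections import Counter

# Message text copied from the bnxtnet warnings quoted in this article.
FW_CRASH_SIGNATURE = "Looks like FW is crashed/non-responsive"

# Extracts the vmnic name from the "[vmnicN : 0x...]" prefix of a bnxtnet message.
VMNIC_PATTERN = re.compile(r"bnxtnet: \S+ \[(vmnic\w+) :")


def count_fw_crashes(lines):
    """Return a Counter mapping each vmnic to its number of FW crash warnings."""
    crashes = Counter()
    for line in lines:
        if "bnxtnet" not in line or FW_CRASH_SIGNATURE not in line:
            continue
        match = VMNIC_PATTERN.search(line)
        crashes[match.group(1) if match else "unknown"] += 1
    return crashes


def main(path):
    # errors="replace" keeps the scan going if the log contains stray bytes.
    with open(path, encoding="utf-8", errors="replace") as log:
        crashes = count_fw_crashes(log)
    if not crashes:
        print("No bnxtnet firmware crash warnings found.")
    for vmnic, count in sorted(crashes.items()):
        print(f"{vmnic}: {count} firmware crash warning(s)")


# Only scan when a log path is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

For example: `python3 scan_bnxtnet.py /var/log/vmkwarning.log`. The path is an assumption; point the script at wherever the bnxtnet warnings are logged on your host, or at a log bundle copied off the host.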

Environment

VMware vSphere ESXi

Cause

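When the NIC firmware crashes, the bnxtnet driver reports the link as down to the OS (as shown in the last log entry above), and the ESXi host may experience a temporary halt in network traffic on the affected uplinks. Depending on the setup, the management VMkernel interface can be left without any operational uplinks, which may result in an outage and cause the host to show as 'Not Responding'.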
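Whether the management interface loses connectivity depends on which uplinks its port group can fail over to. The following simplified Python sketch models that failover order; it is not how ESXi evaluates teaming internally, and `TeamingPolicy`, `operational_uplinks`, and the vmnic names are hypothetical. It treats an uplink as usable only when it is listed as active or standby and its link is up.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TeamingPolicy:
    """Hypothetical model of a port group's failover order.

    Uplinks set to "unused" are simply left out, since they never carry traffic.
    """
    active: List[str]
    standby: List[str] = field(default_factory=list)


def operational_uplinks(policy: TeamingPolicy, link_up: Dict[str, bool]) -> List[str]:
    """Return the uplinks in failover order (active, then standby) whose link is up."""
    return [nic for nic in policy.active + policy.standby if link_up.get(nic, False)]


# Example: the management port group uses vmnic0 (active) and vmnic1 (standby),
# both ports whose firmware has crashed; vmnic2 is up but is not in the policy.
mgmt = TeamingPolicy(active=["vmnic0"], standby=["vmnic1"])
link_up = {"vmnic0": False, "vmnic1": False, "vmnic2": True}
print(operational_uplinks(mgmt, link_up))  # prints [] -> management VMkernel has no uplink
```

In this model, the management interface stays reachable only if at least one of its active or standby uplinks is still up. With the same link states, adding `vmnic2` as a second standby uplink would return `['vmnic2']`.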

Resolution

Rebooting the host may restore it to a normal state.

Engage the hardware vendor (in this case, the Broadcom hardware team) for further troubleshooting and investigation of the network interface card failure.
If you need assistance implementing changes on ESXi based on the hardware vendor's recommendations, open a case with Broadcom Software/VMware Support.

Note: It is best practice to keep network interface card firmware and driver versions up to date. Use the Hardware Compatibility Guide to check the supportability and availability of firmware and drivers for the network interface card.

Additional Information

Similar issues:

Warning: Looks like FW is crashed/non-responsive
Performance drops on BCM5741x NICs with GENEVE traffic
Wrong GENEVE inner checksum from NIC firmware on ESXi host with BCM5741x / Broadcom 5741x NIC.