All storage paths fail on a Cisco ESXi host due to an fnic Firmware/Adapter issue, resulting in an HA VM failover

Article ID: 435482

Updated On:

Products

VMware vSphere ESXi
VMware vSphere ESX 8.x

Issue/Introduction

  • The ESXi host unexpectedly loses access to all paths connecting to SAN-attached storage devices.

  • vSphere High Availability (HA) restarts the virtual machines residing on the affected host on other healthy hosts in the cluster.

  • "Lost connectivity to storage device" and All Paths Down (APD) events are logged for multiple datastores on a single host.

  • The issue is isolated to storage traffic passing through the Cisco CNA utilizing the nfnic driver.

  • Storage paths to SAN-attached datastores recover after a few reboots of the server.

Environment

VMware vSphere ESXi 8.x

Cause

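The path failures occur when the nfnic driver receives a hang notification from the firmware of the Cisco adapter (fnic). After the notification is logged, NMP fails the outstanding commands on the affected paths with host status H:0x1 and initiates a failover, as the following excerpt shows.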

ESXi:
/var/log/vmkernel.log 
####-##-##T##:##:##.###Z Wa(180) vmkwarning: cpu0:2098496)WARNING: nfnic.wq_hang2: <2>: fnic_fcpio_hang_notify_handler: 1086: Received hang notify
####-##-##T##:##:##.###Z In(182) vmkernel: cpu0:2098496)nfnic.wq_hang2: <2>: INFO: fnic_log_cp_wq_indices_credits: 2688: cp_wq posted index: 9 fetch index: 9
####-##-##T##:##:##.###Z In(182) vmkernel: cpu0:2098496)nfnic.wq_hang2: <2>: INFO: fnic_log_cp_wq_indices_credits: 2691: cp_wq intr credits[0] 0
####-##-##T##:##:##.###Z In(182) vmkernel: cpu0:2098496)nfnic.wq_hang2: <2>: INFO: fnic_log_cp_wq_indices_credits: 2691: cp_wq intr credits[1] 0
####-##-##T##:##:##.###Z In(182) vmkernel: cpu0:2098496)nfnic.wq_hang2: <2>: INFO: fnic_log_cp_wq_indices_credits: 2691: cp_wq intr credits[2] 1
####-##-##T##:##:##.###Z In(182) vmkernel: cpu0:2098496)nfnic.wq_hang2: <2>: INFO: fnic_log_cp_wq_indices_credits: 2691: cp_wq intr credits[3] 0
####-##-##T##:##:##.###Z In(182) vmkernel: cpu0:2098496)nfnic.wq_hang2: <2>: INFO: fnic_log_raw_wq_indices: 2676: raw wq posted index: 25 fetch index: 25
####-##-##T##:##:##.###Z Wa(180) vmkwarning: cpu0:2098496)WARNING: nfnic.wq_hang2: <2>: fnic_fcpio_cmpl_handler: 2757: received hang notify from firmware
####-##-##T##:##:##.###Z In(182) vmkernel: cpu0:2099194)NMP: nmp_ThrottleLogForDevice:3893: Cmd 0x8a (0x45bf300d63c0, 2230569) to dev "naa.########################" on path "vmhba1:C0:T#:L#" Failed:
####-##-##T##:##:##.###Z In(182) vmkernel: cpu0:2099194)NMP: nmp_ThrottleLogForDevice:3898: H:0x1 D:0x0 P:0x0 . Act:FAILOVER. cmdId.initiator=0x43152ba131c0 CmdSN 0x38e
####-##-##T##:##:##.###Z Wa(180) vmkwarning: cpu0:2099194)WARNING: NMP: nmp_DeviceRetryCommand:130: Device "naa.########################": awaiting fast path state update for failover with I/O blocked
####-##-##T##:##:##.###Zxists on the device.
####-##-##T##:##:##.###Z In(182) vmkernel: cpu0:2099194)NMP: nmp_ThrottleLogForDevice:3825: last error status from device naa.######################## repeated 10 times
####-##-##T##:##:##.###Z In(182) vmkernel: cpu0:2099194)NMP: nmp_ThrottleLogForDevice:3893: Cmd 0x2a (0x45bf30026bc0, 2230569) to dev "naa.########################" on path "vmhba1:C0:T#:L#" Failed:
####-##-##T##:##:##.###Z In(182) vmkernel: cpu0:2099194)NMP: nmp_ThrottleLogForDevice:3898: H:0x1 D:0x0 P:0x0 . Act:FAILOVER. cmdId.initiator=0x43152ba7d0c0 CmdSN 0x34b
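
To check whether a host shows the same signature, the log can be searched for the two message types in the excerpt above. The following is a minimal Python sketch, not a supported tool: the function name scan_vmkernel_log and the sample list are illustrative assumptions, and the patterns are derived only from the lines shown in this article.

```python
import re

# Both nfnic warnings in the excerpt contain the phrase "hang notify".
HANG_RE = re.compile(r"nfnic.*hang notify", re.IGNORECASE)
# NMP status triplet: host (H), device (D) and plugin (P) status codes.
NMP_STATUS_RE = re.compile(r"H:(0x[0-9a-fA-F]+) D:(0x[0-9a-fA-F]+) P:(0x[0-9a-fA-F]+)")


def scan_vmkernel_log(lines):
    """Return (hang_lines, failover_statuses) for an iterable of log lines."""
    hang_lines = []
    statuses = []
    for line in lines:
        if HANG_RE.search(line):
            hang_lines.append(line.rstrip("\n"))
        match = NMP_STATUS_RE.search(line)
        if match and "Act:FAILOVER" in line:
            # Convert the hex strings to integers: (host, device, plugin).
            statuses.append(tuple(int(code, 16) for code in match.groups()))
    return hang_lines, statuses


# Sample lines copied from the excerpt above (timestamps masked as in the article).
SAMPLE = [
    "####-##-##T##:##:##.###Z Wa(180) vmkwarning: cpu0:2098496)WARNING: nfnic.wq_hang2: <2>: fnic_fcpio_hang_notify_handler: 1086: Received hang notify",
    "####-##-##T##:##:##.###Z Wa(180) vmkwarning: cpu0:2098496)WARNING: nfnic.wq_hang2: <2>: fnic_fcpio_cmpl_handler: 2757: received hang notify from firmware",
    "####-##-##T##:##:##.###Z In(182) vmkernel: cpu0:2099194)NMP: nmp_ThrottleLogForDevice:3898: H:0x1 D:0x0 P:0x0 . Act:FAILOVER. cmdId.initiator=0x43152ba131c0 CmdSN 0x38e",
]

hangs, statuses = scan_vmkernel_log(SAMPLE)
print(len(hangs), statuses)
```

On a real host, the same function can be given an open file handle for /var/log/vmkernel.log instead of the sample list. A result that contains hang notifications followed by failovers with host status 0x1 (the host status VMware documents as NO_CONNECT) matches the pattern described in this article.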

Resolution

  • Verify that the nfnic driver and adapter firmware versions installed on the host are supported according to the Broadcom Compatibility Guide.
  • Contact Cisco to investigate the hardware and the cause of the firmware hang.
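
For the driver half of the first check, the installed nfnic version can be read from the output of esxcli software vib list and then compared against the Compatibility Guide. The following Python sketch only parses text that has already been captured from that command. The function name find_vib_versions is an assumption, and every value in SAMPLE_OUTPUT is a made-up placeholder, not a real or recommended version. The adapter firmware version is not part of this output and has to be obtained from the Cisco management tooling.

```python
def find_vib_versions(vib_list_output, name_fragment="nfnic"):
    """Return (name, version) pairs for VIBs whose name contains name_fragment.

    Expects the plain-text table printed by `esxcli software vib list`,
    where the first two whitespace-separated columns are Name and Version.
    """
    matches = []
    for line in vib_list_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and name_fragment in fields[0].lower():
            matches.append((fields[0], fields[1]))
    return matches


# Hypothetical output: the versions, vendor and dates are placeholders.
SAMPLE_OUTPUT = """\
Name   Version          Vendor  Acceptance Level  Install Date
-----  ---------------  ------  ----------------  ------------
nfnic  0.0.0.0-example  VENDOR  VMwareCertified   ####-##-##
other  1.2.3-example    VENDOR  VMwareCertified   ####-##-##
"""

driver_vibs = find_vib_versions(SAMPLE_OUTPUT)
print(driver_vibs)
```

The returned version string is what would be looked up, together with the firmware version reported by Cisco, in the Compatibility Guide entry for the adapter.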