ESXi hosts running on Cisco UCS become unresponsive and the VMs become unmanageable

Article ID: 380162

Products

VMware vSphere ESXi

Issue/Introduction

  • ESXi hosts running on Cisco UCS hardware become unresponsive.
  • Virtual machines are either hung or unmanageable.
  • The ESXi host must be rebooted to restore it to a working state.
  • The vmkernel logs show commands being cancelled and aborted by the nfnic driver.

vmkernel: cpu38:2097280)ScsiDeviceIO: 4656: Cmd(0x45b9a87d7d80) 0x28, cmdId.initiator=0x43094bdb6a80 CmdSN 0xac0e3 from world 0 to dev "naa.xxxxxxxxxxxxxxxxxxxxx" failed H:0x5 D:0x0 P:0x0 Cancelled from device layer. Cmd

  • The log excerpts below indicate congestion or frame drops at the fabric layer.

vmkwarning: cpu0:2097683)WARNING: nfnic: <4>: fnic_abort_cmd: 3869: Abort for cmd tag: 0x6c7 completed
vmkernel: cpu0:2097683)nfnic: <4>: INFO: fnic_abort_cmd: 3862: Abort cmd called for Tag: 0x6c8  issued time: 468575472 ms CMD_STATE: FNIC_IOREQ_ABTS_COMPLETE CDB Opcode: 0x2a  sc:0x45b9cc4b2300 flags: 0x273 lun: 0 target: 0x150440
vmkwarning: cpu0:2097683)WARNING: nfnic: <4>: fnic_abort_cmd: 3869: Abort for cmd tag: 0x6c8 completed
vmkernel: cpu0:2097683)nfnic: <4>: INFO: fnic_taskMgmt: 2227: TaskMgmt: virt reset for CmdInitiator: 0x430e059d8500 Aborted :2 cmds
vmkernel: cpu0:2097683)nfnic: <4>: INFO: fnic_abort_cmd: 3862: Abort cmd called for Tag: 0x6c7  issued time: 468575472 ms CMD_STATE: FNIC_IOREQ_ABTS_COMPLETE CDB Opcode: 0x2a  sc:0x45b9a868f580 flags: 0x273 lun: 0 target: 0x150420
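When checking a host for these symptoms, it can help to tally the relevant entries rather than read the log line by line. The following is a minimal sketch, not a supported tool: it assumes the log lines follow the same layout as the excerpts above, and the function name `summarize`, the sample list `SAMPLE_LINES`, and both regular expressions are illustrative choices made for this example. It counts the host status codes reported by ScsiDeviceIO failures and the number of nfnic abort requests per target.

```python
import re
from collections import Counter

# Status triplet in a ScsiDeviceIO failure line, e.g. "failed H:0x5 D:0x0 P:0x0".
# Pattern is an assumption based on the excerpt in this article.
SCSI_STATUS_RE = re.compile(
    r"failed H:(0x[0-9a-fA-F]+) D:(0x[0-9a-fA-F]+) P:(0x[0-9a-fA-F]+)"
)

# nfnic "Abort cmd called" line; captures the command tag and the target ID.
# Pattern is an assumption based on the excerpt in this article.
NFNIC_ABORT_RE = re.compile(
    r"nfnic: .*fnic_abort_cmd: \d+: Abort cmd called for Tag: (0x[0-9a-fA-F]+)"
    r".*target: (0x[0-9a-fA-F]+)"
)

# Shortened copies of the log lines quoted above, used only for demonstration.
SAMPLE_LINES = [
    'vmkernel: cpu38:2097280)ScsiDeviceIO: 4656: Cmd(0x45b9a87d7d80) 0x28, '
    'CmdSN 0xac0e3 from world 0 to dev "naa.x" failed H:0x5 D:0x0 P:0x0 '
    'Cancelled from device layer.',
    'vmkwarning: cpu0:2097683)WARNING: nfnic: <4>: fnic_abort_cmd: 3869: '
    'Abort for cmd tag: 0x6c7 completed',
    'vmkernel: cpu0:2097683)nfnic: <4>: INFO: fnic_abort_cmd: 3862: Abort cmd '
    'called for Tag: 0x6c8  issued time: 468575472 ms CMD_STATE: '
    'FNIC_IOREQ_ABTS_COMPLETE CDB Opcode: 0x2a  sc:0x45b9cc4b2300 flags: 0x273 '
    'lun: 0 target: 0x150440',
    'vmkernel: cpu0:2097683)nfnic: <4>: INFO: fnic_abort_cmd: 3862: Abort cmd '
    'called for Tag: 0x6c7  issued time: 468575472 ms CMD_STATE: '
    'FNIC_IOREQ_ABTS_COMPLETE CDB Opcode: 0x2a  sc:0x45b9a868f580 flags: 0x273 '
    'lun: 0 target: 0x150420',
]


def summarize(lines):
    """Return (host_status_counts, aborts_per_target) for an iterable of log lines."""
    host_status = Counter()
    aborts_per_target = Counter()
    for line in lines:
        status = SCSI_STATUS_RE.search(line)
        if status:
            # Group 1 is the host (H:) status code.
            host_status[status.group(1).lower()] += 1
        abort = NFNIC_ABORT_RE.search(line)
        if abort:
            # Group 2 is the target ID the abort was issued against.
            aborts_per_target[abort.group(2).lower()] += 1
    return host_status, aborts_per_target


if __name__ == "__main__":
    hosts, aborts = summarize(SAMPLE_LINES)
    print("Host status codes:", dict(hosts))
    print("Abort requests per target:", dict(aborts))
```

To run it against a real log, pass an open file object (for example, a copy of the host's vmkernel log) to `summarize` in place of `SAMPLE_LINES`. A high abort count spread across many targets is consistent with the fabric-level congestion described above, but the counts alone do not confirm the cause.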

Environment

VMware vSphere ESXi 7.x

VMware vSphere ESXi 8.x

Cause

This issue occurs when "slow-drain" and "PFC Watchdog" are both enabled at the same time in the fabric.

Note: Ensure there is no fault at the physical layer. 

Resolution

Validate there is no fault at the physical layer. 

Per Cisco bug CSCwh75018, enabling both slow-drain and PFC Watchdog at the same time is not supported.

Note: The "slow-drain" feature has been replaced by the PFC Watchdog feature in 4.2(1) and higher.