Host freezes after logging fnic driver failing with abort commands


Article ID: 308284


Products

VMware vSphere ESXi

Issue/Introduction

Symptoms:

  • Operating system logs contain a large number of persistent SCSI abort commands for FCID 0xffffffff
  • Operating systems that use remote FC / FCoE storage may hang



Environment

VMware vSphere ESXi 6.x

Cause

Any of the following adapters:
Cisco UCS VIC 1340 modular LOM (UCSB-MLOM-40G-031)
Cisco UCS VIC 1380 mezzanine adapter (UCS-VIC-M83-8P)
Cisco UCS VIC 1385 Dual Port 40Gb QSFP+ CNA (UCSC-PCIE-C40Q-03)
Cisco UCS VIC 1387 Dual Port 40Gb QSFP CNA MLOM (UCSC-MLOM-C40Q-03)

FC/FCoE storage with significant traffic on FC LUNs

VIC adapter firmware 4.2(3a)
4.2(3a) VIC firmware is included with the 3.2(3a) UCS B-Series and C-Series software bundles.
4.2(3a) VIC firmware is included with the 3.1(3a) and 3.1(3b) UCS Standalone C-Series bundles.

Before the hang and the aborts begin, the VIC logs show an entry that removes an FC MAC address, such as:
mcp.vnic_dev vnic17 vnic_dev_addr_del <mac>

NO ABORTS are seen at the VIC level.
At the OS level, the vmkernel logs show repeated aborts. Every abort logged while the host is in this condition shows an FCID of 0xffffffff:
vmkernel: cpu33:397265)<7>fnic : 2 :: Abort Cmd called FCID 0xffffffff, LUN 0x1 TAG 5a flags 843
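
To confirm the symptom on a suspect host, the vmkernel log can be scanned for abort messages carrying the invalid FCID. The following Python sketch is one possible way to do that. The regular expression is modeled only on the single sample line shown above, so other fnic driver builds may format the message differently, and the function name is hypothetical.

```python
import re
from collections import Counter

# Pattern derived from the one sample line in this article (assumption:
# other fnic driver versions may word or space the message differently).
ABORT_RE = re.compile(
    r"fnic\s*:\s*\d+\s*::\s*Abort Cmd called FCID (?P<fcid>0x[0-9a-fA-F]+),"
    r"\s*LUN (?P<lun>0x[0-9a-fA-F]+)"
)

def count_invalid_fcid_aborts(lines):
    """Count fnic abort messages with FCID 0xffffffff, grouped by LUN."""
    counts = Counter()
    for line in lines:
        match = ABORT_RE.search(line)
        if match and match.group("fcid").lower() == "0xffffffff":
            counts[match.group("lun")] += 1
    return counts

# The sample line from this article, used as input.
sample = [
    "vmkernel: cpu33:397265)<7>fnic : 2 :: Abort Cmd called FCID 0xffffffff, "
    "LUN 0x1 TAG 5a flags 843",
]
print(count_invalid_fcid_aborts(sample))
```

In practice the lines would be read from a copy of the host's vmkernel log. A steadily growing count for FCID 0xffffffff, together with the hardware and firmware listed above, is consistent with this issue.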

Resolution

Currently there is no resolution.

Workaround:
To work around the issue, downgrade the UCS VIC firmware.

For Cisco UCS B-Series Servers and Integrated C-Series Rack-Mount Servers, downgrade to a release prior to 3.2(3a)

For Cisco UCS C-Series Standalone Rack-Mount Servers, downgrade to a release prior to 3.1(3a)
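
When choosing a downgrade target, the "prior to" comparison can be checked programmatically. The Python sketch below assumes release strings follow the `major.minor(patch+letter)` pattern used in this article, such as 3.2(3a). The deployment labels `managed` and `standalone` and the function names are hypothetical, and the sample versions passed in are illustrative inputs only. The code implements only the comparison stated above. It does not indicate whether releases later than the thresholds are affected.

```python
import re

# Matches release strings of the form used in this article, e.g. "3.2(3a)".
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\((\d+)([a-z]*)\)$")

# Thresholds taken from the workaround text; the keys are hypothetical labels.
THRESHOLDS = {
    "managed": "3.2(3a)",     # B-Series and integrated C-Series
    "standalone": "3.1(3a)",  # standalone C-Series
}

def parse_ucs_version(text):
    """Turn '3.2(3a)' into a comparable tuple (3, 2, 3, 'a')."""
    match = _VERSION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized release string: {text!r}")
    major, minor, patch, letter = match.groups()
    return (int(major), int(minor), int(patch), letter)

def is_prior_to_threshold(version, deployment):
    """Return True if the release is earlier than the threshold for the deployment."""
    return parse_ucs_version(version) < parse_ucs_version(THRESHOLDS[deployment])

print(is_prior_to_threshold("3.2(2a)", "managed"))
print(is_prior_to_threshold("3.1(3b)", "standalone"))
```

Comparing parsed tuples rather than raw strings avoids ordering mistakes such as treating 3.10 as earlier than 3.2.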

Additional Information

Impact/Risks:
The host may hang and not recover without intervention. ESXi does NOT generate a PSOD.