Cisco UCS servers log FNIC aborts in ESXi

Article ID: 381648

Updated On:

Products

VMware vSphere ESXi

Issue/Introduction

Symptoms:

FNIC aborts on a Cisco UCS server typically indicate an issue with Fibre Channel (FC) or Fibre Channel over Ethernet (FCoE) connectivity.

Ethernet NIC (eNIC) is the driver for the Cisco UCS VIC Ethernet NIC, and Fibre Channel NIC (fNIC) is the driver for the Cisco UCS VIC Fibre Channel or Fibre Channel over Ethernet HBA. ESXi uses the enic/fnic drivers to communicate with the physical HBA/Converged Network Adapter (CNA) installed in the server. These drivers enable FC or FCoE communication between the ESXi host and the target array.
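
To see which driver each storage adapter is using, the adapter list can be checked from the ESXi shell; a minimal sketch (output varies by host):

# List storage adapters and the driver each vmhba is bound to (look for nfnic)
esxcli storage core adapter list

# List Fibre Channel adapter port state, speed, and WWNs as seen by ESXi
esxcli storage san fc list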

Issue Validation:

  • The following errors are seen in /var/log/vmkwarning.log:

YYYY-MM-DDTHH:MM Wa(10) vmkwarning: cpu32:2097613)WARNING: ScsiPath: 9263: Failed TaskMgmt abort for cmdId.initiator=0x4309c5e9d00 CmdSN 0x1f50, status Status pending, path vmhba0:C0:T2:L1
YYYY-MM-DDTHH:MM Wa(10) vmkwarning: cpu32:2097613)WARNING: ScsiPath: 9263: Failed TaskMgmt abort for cmdId.initiator=0x4309c918d40 CmdSN 0x266, status Status pending, path vmhba0:C0:T4:L3
YYYY-MM-DDTHH:MM Wa(10) vmkwarning: cpu32:2097613)WARNING: ScsiPath: 9263: Failed TaskMgmt abort for cmdId.initiator=0x4309c5e9d00 CmdSN 0x20e4, status Status pending, path vmhba0:C0:T3:L1
YYYY-MM-DDTHH:MM Wa(10) vmkwarning: cpu32:2097613)WARNING: ScsiPath: 9263: Failed TaskMgmt abort for cmdId.initiator=0x4309cb4c9c0 CmdSN 0x51, status Status pending, path vmhba0:C0:T3:L8

YYYY-MM-DDTHH:MM cpu34:2097605)WARNING: nfnic: <1>: fnic_abort_cmd: 3826: Abort for cmd tag: 0x2db in pending state
YYYY-MM-DDTHH:MM cpu34:2097605)WARNING: nfnic: <1>: fnic_abort_cmd: 3826: Abort for cmd tag: 0x2f6 in pending state
YYYY-MM-DDTHH:MM cpu49:2117608)WARNING: nfnic: <1>: fnic_abort_cmd: 3826: Abort for cmd tag: 0x224 in pending state
YYYY-MM-DDTHH:MM cpu32:2097604)WARNING: nfnic: <1>: fnic_abort_cmd: 3826: Abort for cmd tag: 0x370 in pending state
YYYY-MM-DDTHH:MM cpu34:2097605)WARNING: nfnic: <1>: fnic_abort_cmd: 3826: Abort for cmd tag: 0x3c4 in pending state

YYYY-MM-DDTHH:MM cpu32:2097604)WARNING: nfnic: <1>: fnic_abort_cmd: 3822: Abort for cmd tag: 0x3c4 already issued
YYYY-MM-DDTHH:MM cpu32:2097604)WARNING: nfnic: <1>: fnic_abort_cmd: 3822: Abort for cmd tag: 0x3c4 already issued
YYYY-MM-DDTHH:MM cpu32:2097604)WARNING: nfnic: <1>: fnic_abort_cmd: 3822: Abort for cmd tag: 0x3c4 already issued
YYYY-MM-DDTHH:MM cpu32:2097604)WARNING: nfnic: <1>: fnic_abort_cmd: 3822: Abort for cmd tag: 0x3c4 already issued

  • Additionally, the following errors are seen in /var/log/vmkernel.log:

YYYY-MM-DDTHH:MM Wa(180) vmkwarning: cpu47:2097613)WARNING: nfnic: <1>: fnic_abort_cmd: 3818: Abort for cmd tag: 0x36e in pending state
YYYY-MM-DDTHH:MM In(182) vmkernel: cpu53:2097612)nfnic: <1>: INFO: fnic_taskMgmt: 2128: TaskMgmt abort sc->cdb: 0xf1
YYYY-MM-DDTHH:MM In(182) vmkernel: cpu53:2097612)nfnic: <1>: INFO: fnic_abort_cmd: 3803: Abort cmd called for Tag: 0x36e  issued time: 13003 ms CMD_STATE: FNIC_IOREQ_ABTS_PENDING CDB Opcode: 0xf1  sc:0x45d993bf2cc0 flags: 0x43 lun: 1 target: 0x20b60
YYYY-MM-DDTHH:MM Wa(180) vmkwarning: cpu53:2097612)WARNING: nfnic: <1>: fnic_abort_cmd: 3814: Abort for cmd tag: 0x36e already issued
YYYY-MM-DDTHH:MM In(182) vmkernel: cpu53:2097612)nfnic: <1>: INFO: fnic_taskMgmt: 2128: TaskMgmt abort sc->cdb: 0xf1
YYYY-MM-DDTHH:MM In(182) vmkernel: cpu53:2097612)nfnic: <1>: INFO: fnic_abort_cmd: 3803: Abort cmd called for Tag: 0x36e  issued time: 13003 ms CMD_STATE: FNIC_IOREQ_ABTS_PENDING CDB Opcode: 0xf1  sc:0x45d993bf2cc0 flags: 0x43 lun: 1 target: 0x20b60
YYYY-MM-DDTHH:MM Wa(180) vmkwarning: cpu53:2097612)WARNING: nfnic: <1>: fnic_abort_cmd: 3814: Abort for cmd tag: 0x36e already issued
YYYY-MM-DDTHH:MM In(182) vmkernel: cpu53:2097612)nfnic: <1>: INFO: fnic_taskMgmt: 2128: TaskMgmt abort sc->cdb: 0xf1
YYYY-MM-DDTHH:MM In(182) vmkernel: cpu53:2097612)nfnic: <1>: INFO: fnic_abort_cmd: 3803: Abort cmd called for Tag: 0x36e  issued time: 13003 ms CMD_STATE: FNIC_IOREQ_ABTS_PENDING CDB Opcode: 0xf1  sc:0x45d993bf2cc0 flags: 0x43 lun: 1 target: 0x20b60
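
To quickly check whether a host is currently logging these aborts, the logs can be searched directly from the ESXi shell, for example:

# Count fnic abort messages per log file
grep -c "fnic_abort_cmd" /var/log/vmkernel.log /var/log/vmkwarning.log

# Show the most recent occurrences
grep "fnic_abort_cmd" /var/log/vmkernel.log | tail -20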

Environment

VMware vSphere ESXi 7.x

VMware vSphere ESXi 8.x

Cause


1. FC/FCoE connectivity issues

2. Faulty or loose FC/FCoE cables

3. Incompatible or outdated HBA (Host Bus Adapter) driver/firmware

4. Server or switch configuration errors

5. Over-subscription or congestion on the FC/FCoE network

6. Hardware failures (e.g., FC/FCoE adapter, switch, or storage array)

Resolution


1. Upload ESXi support bundles from all hosts in the cluster to the support case so that support can conduct a log review, and mention in the ticket the host name(s) of the server(s) on which you are seeing these errors (a sketch for generating a bundle from the ESXi shell follows this list). Support will analyze the logs on all hosts to determine whether the fnic aborts appear on a single host or on all of them; this is key to isolating fnic abort scenarios. Extensive log reviews across multiple hosts are time consuming, so please allow a minimum of 48 business hours for thorough analysis before a log report is sent.

2. Check the HBA driver/firmware versions on all hosts in the cluster (a version-check sketch follows this list). It is strongly recommended to run the same driver/firmware version on all HBAs in a cluster; even one host running a different driver/firmware version than the others can lead to performance anomalies and instability in the cluster.

3. Review the switch, Fabric Interconnect, and storage array logs for any significant errors related to the aborts.

4. Analyze the FC/FCoE network topology to determine whether the usual performance parameters have changed (see the esxtop sketch after this list).

5. Run diagnostic tests on the Cisco UCS Virtual Interface Card (VIC) to check for hardware failure.

6. Run diagnostic tests on the switches to check for port failures, and perform a general health check to see whether any ports are misbehaving.
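
For step 1, a support bundle can be generated directly from the ESXi shell with the vm-support utility (bundles can also be exported from the vSphere Client); the datastore path below is only an example:

# Generate a full support bundle; the path of the resulting .tgz is printed on completion
vm-support

# Optionally write the bundle to a location with more free space (example path)
vm-support -w /vmfs/volumes/datastore1/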
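
For step 2, the installed nfnic driver version can be checked on each host from the ESXi shell; adapter firmware versions are typically verified from Cisco UCS Manager or the CIMC rather than from ESXi:

# Show the loaded nfnic module version and build details
esxcli system module get -m nfnic

# Show the installed nfnic driver VIB and its version
esxcli software vib list | grep -i nfnic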
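
For step 4, per-adapter latency and outstanding I/O can be observed from the ESXi side with esxtop: press d for the disk adapter view, where the DAVG/cmd and KAVG/cmd columns show device and kernel latency per command. A batch capture can also be taken for offline review:

# Capture 10 esxtop samples at 5-second intervals for offline analysis
esxtop -b -d 5 -n 10 > /tmp/esxtop-capture.csv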

A Cisco fnic abort issue needs to be triaged from multiple perspectives (vendors), including the switches, Fabric Interconnects, storage array, and peripherals such as cables and other devices and their configurations, to understand where the problem lies. From ESXi's standpoint there is very limited visibility into what is happening on the other devices that cause ESXi to generate these responses; therefore, finding the root cause requires thorough analysis and investigation from multiple angles.

Hardware manufacturers such as QLogic, Emulex, Brocade, and Broadcom test their products (NIC/HBA/CNA) with specific driver and firmware versions that are compatible with each other. When patching, updating, or upgrading an ESXi host, it is easy to overlook this vital detail and hit a roadblock after the upgrade is complete.

A driver is software that enables your hardware devices to communicate with your operating system and applications. If your drivers are not compatible with your current system, you might experience errors, crashes, or reduced functionality.

Firmware is software that controls the basic functions of your hardware devices, such as booting, loading, and other low-level operations.

If your driver or firmware is incompatible or outdated, you may encounter problems with the hardware or within the operating system (ESXi). Because these devices connect to your network or storage, they can have a significant impact on the overall performance of your virtual machines. It is therefore critical to verify the following before patching your hosts:

1. Host hardware compatibility with the ESXi version. Contact your server vendor or look up this information on their website.

2. Driver and firmware compatibility with the ESXi version. Consult the Broadcom Compatibility Guide for this information; the sketch below shows how to gather the PCI identifiers used to search it.
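
The PCI identifiers (VID, DID, SVID, SSID) needed to search the Broadcom Compatibility Guide for a given adapter can be read directly on the host; a sketch:

# Print vendor:device and sub-vendor:sub-device IDs for each storage adapter
vmkchdev -l | grep vmhba

# Alternative: full PCI inventory, including vendor and device names
esxcli hardware pci list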

It is also generally safe to upgrade to the latest firmware available for your adapters, as long as you apply the change to a single server first and monitor it for a few days before rolling it out to the rest of the cluster.
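
When testing a firmware update on a single host first, place that host in maintenance mode so the change is applied without workloads running on it (VMs must be migrated or powered off first; with DRS this happens automatically when maintenance mode is requested from the vSphere Client):

# Enter maintenance mode before the firmware update
esxcli system maintenanceMode set --enable true

# Exit maintenance mode after the update and monitoring checks
esxcli system maintenanceMode set --enable false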

CAUTION: Flashing the BIOS on an HBA can sometimes lead to unknown hardware problems with the adapter if it is very old, or even require a replacement if the update renders the hardware unusable. To be safe, perform this update under the supervision of your server manufacturer.

Additional Information