SCSI Command timeouts occur from every HBA to every Infinidat array port cluster-wide on Cisco UCS servers, eventually leading to severe performance issues or an outage

Products

VMware vSphere ESXi
VMware vSphere ESXi 8.0
VMware vSphere ESX 8.x

Issue/Introduction

A VCF administrator will observe SCSI command timeouts reported by the Cisco NFNIC driver, followed by aborts for the timed-out IOs:

2026-02-05T09:47:58.929Z In(182) vmkernel: cpu38:2103857)nfnic: <2>: INFO: fnic_taskMgmt: 2128: TaskMgmt abort sc->cdb: 0x88
2026-02-05T09:47:58.929Z In(182) vmkernel: cpu38:2103857)nfnic: <2>: INFO: fnic_abort_cmd: 3803: Abort cmd called for Tag: 0x1a7 issued time: 40001 ms CMD_STATE: FNIC_IOREQ_CMD_PENDING CDB Opcode: 0x88 sc:0x45d9fa338dc0 flags: 0x3 lun: 51 target: 0x511c0

Many (if not all) of those abort requests are then rejected by the Infinidat array:

2026-02-05T09:47:58.931Z In(182) vmkernel: cpu0:2098289)nfnic: <2>: INFO: fnic_fcpio_itmf_cmpl_handler: 2327: fcpio hdr status: FCPIO_ITMF_REJECTED
2026-02-05T09:48:10.395Z In(182) vmkernel: cpu50:2101614)nfnic: <2>: INFO: fnic_fcpio_itmf_cmpl_handler: 2327: fcpio hdr status: FCPIO_ITMF_REJECTED
2026-02-05T09:48:10.395Z In(182) vmkernel: cpu50:2101614)nfnic: <2>: INFO: fnic_fcpio_itmf_cmpl_handler: 2327: fcpio hdr status: FCPIO_ITMF_REJECTED
2026-02-05T09:48:10.396Z In(182) vmkernel: cpu50:2101614)nfnic: <2>: INFO: fnic_fcpio_itmf_cmpl_handler: 2327: fcpio hdr status: FCPIO_ITMF_REJECTED
2026-02-05T09:48:52.935Z In(182) vmkernel: cpu40:2098118)nfnic: <2>: INFO: fnic_fcpio_itmf_cmpl_handler: 2327: fcpio hdr status: FCPIO_ITMF_REJECTED
2026-02-05T09:48:59.701Z In(182) vmkernel: cpu65:2098060)nfnic: <2>: INFO: fnic_fcpio_itmf_cmpl_handler: 2327: fcpio hdr status: FCPIO_ITMF_REJECTED
2026-02-05T09:49:10.797Z In(182) vmkernel: cpu36:2101959)nfnic: <2>: INFO: fnic_fcpio_itmf_cmpl_handler: 2327: fcpio hdr status: FCPIO_ITMF_REJECTED
2026-02-05T09:49:10.797Z In(182) vmkernel: cpu36:2101959)nfnic: <2>: INFO: fnic_fcpio_itmf_cmpl_handler: 2327: fcpio hdr status: FCPIO_ITMF_REJECTED
2026-02-05T09:49:53.263Z In(182) vmkernel: cpu35:2101818)nfnic: <1>: INFO: fnic_fcpio_itmf_cmpl_handler: 2327: fcpio hdr status: FCPIO_ITMF_REJECTED
2026-02-05T09:50:00.368Z In(182) vmkernel: cpu14:2098289)nfnic: <1>: INFO: fnic_fcpio_itmf_cmpl_handler: 2327: fcpio hdr status: FCPIO_ITMF_REJECTED
2026-02-05T09:50:12.366Z In(182) vmkernel: cpu2:2101628)nfnic: <2>: INFO: fnic_fcpio_itmf_cmpl_handler: 2327: fcpio hdr status: FCPIO_ITMF_REJECTED
2026-02-05T09:50:12.366Z In(182) vmkernel: cpu2:2101628)nfnic: <2>: INFO: fnic_fcpio_itmf_cmpl_handler: 2327: fcpio hdr status: FCPIO_ITMF_REJECTED
2026-02-05T09:50:12.366Z In(182) vmkernel: cpu2:2101628)nfnic: <2>: INFO: fnic_fcpio_itmf_cmpl_handler: 2327: fcpio hdr status: FCPIO_ITMF_REJECTED
2026-02-05T09:50:12.366Z In(182) vmkernel: cpu2:2101628)nfnic: <2>: INFO: fnic_fcpio_itmf_cmpl_handler: 2327: fcpio hdr status: FCPIO_ITMF_REJECTED
2026-02-05T09:51:07.664Z In(182) vmkernel: cpu57:2101960)nfnic: <1>: INFO: fnic_fcpio_itmf_cmpl_handler: 2327: fcpio hdr status: FCPIO_ITMF_REJECTED
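
To gauge how widespread the rejections are, the FCPIO_ITMF_REJECTED lines can be tallied per NFNIC instance. The following is a minimal sketch, assuming the log line layout shown in the excerpts above; the function name count_abort_rejections is hypothetical and is not part of any VMware or Cisco tooling.

```python
import re
from collections import Counter

# Captures the nfnic instance number ("<1>", "<2>") on lines reporting a rejected abort.
REJECT_RE = re.compile(r"nfnic: <(\d+)>: .*fcpio hdr status: FCPIO_ITMF_REJECTED")


def count_abort_rejections(lines):
    """Return a Counter mapping nfnic instance number -> number of rejected aborts."""
    counts = Counter()
    for line in lines:
        match = REJECT_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


# Three lines copied from the excerpt above.
sample = [
    "2026-02-05T09:47:58.931Z In(182) vmkernel: cpu0:2098289)nfnic: <2>: INFO: fnic_fcpio_itmf_cmpl_handler: 2327: fcpio hdr status: FCPIO_ITMF_REJECTED",
    "2026-02-05T09:49:53.263Z In(182) vmkernel: cpu35:2101818)nfnic: <1>: INFO: fnic_fcpio_itmf_cmpl_handler: 2327: fcpio hdr status: FCPIO_ITMF_REJECTED",
    "2026-02-05T09:50:12.366Z In(182) vmkernel: cpu2:2101628)nfnic: <2>: INFO: fnic_fcpio_itmf_cmpl_handler: 2327: fcpio hdr status: FCPIO_ITMF_REJECTED",
]

for hba, count in sorted(count_abort_rejections(sample).items()):
    print(f"nfnic <{hba}>: {count} rejected aborts")
```

In practice the lines would come from the host's vmkernel log rather than an inline list.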

Closer inspection shows that these command failures occur on both HBAs (nfnic: <1> and nfnic: <2>) and against ALL of the Infinidat array targets:

2026-02-05T09:47:09.346Z In(182) vmkernel: cpu1:2098080)nfnic: <1>: INFO: fnic_abort_cmd: 3803: Abort cmd called for Tag: 0xfe issued time: 21824 ms CMD_STATE: FNIC_IOREQ_ABTS_PENDING CDB Opcode: 0x88 sc:0x45d9f9671b40 flags: 0x43 lun: 32 target: 0x512c0
2026-02-05T09:48:10.395Z In(182) vmkernel: cpu25:2097683)nfnic: <2>: INFO: fnic_abort_cmd: 3803: Abort cmd called for Tag: 0x61f issued time: 59497 ms CMD_STATE: FNIC_IOREQ_CMD_PENDING CDB Opcode: 0x88 sc:0x45b9e864c280 flags: 0x3 lun: 42 target: 0x511c0
2026-02-05T09:49:10.797Z In(182) vmkernel: cpu0:2097683)nfnic: <2>: INFO: fnic_abort_cmd: 3803: Abort cmd called for Tag: 0xee issued time: 59317 ms CMD_STATE: FNIC_IOREQ_ABTS_PENDING CDB Opcode: 0x88 sc:0x45d9f9670340 flags: 0x43 lun: 32 target: 0x512a0
2026-02-05T09:49:53.259Z In(182) vmkernel: cpu20:2103854)nfnic: <1>: INFO: fnic_abort_cmd: 3803: Abort cmd called for Tag: 0x6cb issued time: 40002 ms CMD_STATE: FNIC_IOREQ_CMD_PENDING CDB Opcode: 0x12 sc:0x45d9f7b4dc40 flags: 0x3 lun: 41 target: 0x511e0
2026-02-05T09:50:12.365Z In(182) vmkernel: cpu0:2098080)nfnic: <2>: INFO: fnic_abort_cmd: 3803: Abort cmd called for Tag: 0xac issued time: 10970 ms CMD_STATE: FNIC_IOREQ_CMD_PENDING CDB Opcode: 0x88 sc:0x45d9e47baf80 flags: 0x3 lun: 51 target: 0x51200
2026-02-05T09:51:07.660Z In(182) vmkernel: cpu4:2103854)nfnic: <1>: INFO: fnic_abort_cmd: 3803: Abort cmd called for Tag: 0x4fd issued time: 40002 ms CMD_STATE: FNIC_IOREQ_CMD_PENDING CDB Opcode: 0x8a sc:0x45b9c3cb8340 flags: 0x3 lun: 51 target: 0x51220
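
One way to check the "every HBA, every target" pattern is to group the target values from the fnic_abort_cmd lines by NFNIC instance. This is only a sketch under the assumption that the log format matches the excerpts above; targets_by_hba is a hypothetical helper name.

```python
import re
from collections import defaultdict

# Pulls the nfnic instance, command age, and target FCID out of an fnic_abort_cmd line.
ABORT_RE = re.compile(
    r"nfnic: <(?P<hba>\d+)>: INFO: fnic_abort_cmd: .*"
    r"issued time: (?P<age_ms>\d+) ms .*"
    r"lun: (?P<lun>\d+) target: (?P<target>0x[0-9a-f]+)"
)


def targets_by_hba(lines):
    """Return {nfnic instance: set of target FCIDs that had commands aborted}."""
    grouped = defaultdict(set)
    for line in lines:
        match = ABORT_RE.search(line)
        if match:
            grouped[int(match.group("hba"))].add(match.group("target"))
    return dict(grouped)


# Three lines copied from the excerpt above.
sample = [
    "2026-02-05T09:47:09.346Z In(182) vmkernel: cpu1:2098080)nfnic: <1>: INFO: fnic_abort_cmd: 3803: Abort cmd called for Tag: 0xfe issued time: 21824 ms CMD_STATE: FNIC_IOREQ_ABTS_PENDING CDB Opcode: 0x88 sc:0x45d9f9671b40 flags: 0x43 lun: 32 target: 0x512c0",
    "2026-02-05T09:48:10.395Z In(182) vmkernel: cpu25:2097683)nfnic: <2>: INFO: fnic_abort_cmd: 3803: Abort cmd called for Tag: 0x61f issued time: 59497 ms CMD_STATE: FNIC_IOREQ_CMD_PENDING CDB Opcode: 0x88 sc:0x45b9e864c280 flags: 0x3 lun: 42 target: 0x511c0",
    "2026-02-05T09:49:53.259Z In(182) vmkernel: cpu20:2103854)nfnic: <1>: INFO: fnic_abort_cmd: 3803: Abort cmd called for Tag: 0x6cb issued time: 40002 ms CMD_STATE: FNIC_IOREQ_CMD_PENDING CDB Opcode: 0x12 sc:0x45d9f7b4dc40 flags: 0x3 lun: 41 target: 0x511e0",
]

for hba, targets in sorted(targets_by_hba(sample).items()):
    print(f"nfnic <{hba}>: {sorted(targets)}")
```

If every instance shows aborts against every zoned array target, the condition is unlikely to be a single bad path.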

At the same time, the logs show an aggressive flood of "Power-on Reset" events, each of which effectively corresponds to a LUN reset:

2026-02-05T09:50:48.186Z In(182) vmkernel: cpu37:2098059)ScsiCore: 2000: Power-on Reset occurred on naa.6742b0f00000069900000000000#####
2026-02-05T09:50:48.235Z In(182) vmkernel: cpu36:2098059)ScsiCore: 2000: Power-on Reset occurred on naa.6742b0f0000006990000000000######
2026-02-05T09:50:48.366Z In(182) vmkernel: cpu55:2097289)ScsiCore: 2000: Power-on Reset occurred on naa.6742b0f0000006990000000000000###
2026-02-05T09:50:48.366Z In(182) vmkernel: cpu55:2097289)ScsiCore: 2000: Power-on Reset occurred on naa.6742b0f0000006990000000000######
2026-02-05T09:50:48.752Z In(182) vmkernel: cpu47:2098059)ScsiCore: 2000: Power-on Reset occurred on naa.6742b0f00000069900000000000#####
2026-02-05T09:50:48.990Z In(182) vmkernel: cpu47:2098059)ScsiCore: 2000: Power-on Reset occurred on naa.6742b0f00000069900000000000#####

Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.

Environment

ESXi (any version)
Cisco UCS with Cisco NFNIC driver 5.0.0.46 or newer
Infinidat Storage Array

Cause

Depending on timing, hardware disruptions (e.g. flaky SFP/cable) or storage upgrade scenarios may trigger host-side error recovery escalation in the affected environments.

With Cisco NFNIC driver 5.0.0.45 and older, the driver responds to an abort rejection with a port logout (LOGO) followed by a port login (PLOGI), which generally causes undesirable noise as described in Broadcom KB 426298 and Cisco bug CSCwn45550. Cisco's NFNIC 5.0.0.46 and newer drivers changed this error handling so that an abort rejection is answered with a device/LUN reset instead. That change causes a bigger issue with Infinidat arrays because of how they handle device/LUN resets: when a LUN reset is received, Infinidat arrays may not return a SCSI status for all in-flight commands from other hosts. The SCSI spec allows this behavior (the TAS and TST bits in control mode page 0Ah are set to zero on Infinidat arrays).

Unfortunately, ESXi hosts do not check those bits, so they are not aware that the array abandoned the IOs, because the array never returned a status for them. As a result, those IOs time out on the initiator side, which causes more aborts to be issued. When the Infinidat array rejects those aborts because it has already abandoned the IOs, the Cisco NFNIC driver responds with more device resets.
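
For readers who want to see where the TST and TAS fields sit, the sketch below decodes them from a raw control mode page. It assumes the input starts at byte 0 of the page itself (any mode parameter header and block descriptors already stripped), and it uses the field positions from the SPC control mode page layout (TST in byte 2, bits 7 to 5; TAS in byte 5, bit 6), which should be verified against the specification. The sample bytes are hypothetical and were not captured from an Infinidat array.

```python
def decode_control_mode_page(page: bytes) -> dict:
    """Extract TST and TAS from a control mode page (page code 0Ah)."""
    if len(page) < 6 or (page[0] & 0x3F) != 0x0A:
        raise ValueError("not a control mode page (0Ah)")
    tst = (page[2] >> 5) & 0x07  # task set type: 0 = one task set shared by all initiators
    tas = (page[5] >> 6) & 0x01  # task aborted status: 0 = no status returned to other initiators
    return {"TST": tst, "TAS": tas}


# Hypothetical 12-byte page with every field cleared, i.e. TST=0 and TAS=0.
all_zero_page = bytes([0x0A, 0x0A] + [0x00] * 10)
print(decode_control_mode_page(all_zero_page))

# Hypothetical page with TAS set, shown only for contrast.
tas_set_page = bytes([0x0A, 0x0A, 0x00, 0x00, 0x00, 0x40] + [0x00] * 6)
print(decode_control_mode_page(tas_set_page))
```

With TAS=0, commands aborted on behalf of another initiator are terminated without any status being returned to the initiator that issued them, which is why the affected hosts only notice through timeouts.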

In a large enough environment with sustained IO, this looping behavior can compound to the point that it severely affects performance or causes an outage. This is why a flood of "Power-on Reset" events is associated with this condition.

Resolution

Cisco and Infinidat are planning code enhancements in future NFNIC and InfuzeOS releases to reduce the likelihood of this situation. Customers should consult Cisco and Infinidat for definitive guidance on the relevant resolution for their environments.

Cisco has released NFNIC driver 5.0.0.51, which introduces an advanced module parameter that allows the NFNIC driver to revert to the LOGO/PLOGI behavior.

Here is the text from the Cisco release notes for NFNIC driver 5.0.0.51:

fnic_abort_reject_method is a module parameter that controls the recovery actions in case of an abort reject from target. Default behavior is to perform a LUN reset, to interop with Infinidat a LOGO followed by a LOGIN is required.

To send LOGO instead of LUN_RESET, please follow these steps.

1) esxcli system module parameters set -p "fnic_abort_reject_method=1" -m nfnic
2) Reboot.
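
The version-dependent behavior described in this article can be summarized as a small decision function. This is an illustrative sketch only, not driver code: the function names are hypothetical, and the assumption that the parameter defaults to 0 is not stated in the release notes (they only say the default behavior is a LUN reset and that a value of 1 selects LOGO).

```python
def parse_version(text):
    """Turn a dotted version such as '5.0.0.46' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def abort_reject_action(driver_version, fnic_abort_reject_method=0):
    """Return the recovery action taken after a target rejects an abort.

    Assumption: fnic_abort_reject_method defaults to 0 (LUN reset).
    """
    version = parse_version(driver_version)
    if version <= (5, 0, 0, 45):
        return "LOGO+PLOGI"  # older drivers always log out and back in
    if version >= (5, 0, 0, 51) and fnic_abort_reject_method == 1:
        return "LOGO+PLOGI"  # opt-in revert via the module parameter
    return "LUN_RESET"  # 5.0.0.46 and newer default behavior


for version, method in [("5.0.0.45", 0), ("5.0.0.46", 0), ("5.0.0.51", 0), ("5.0.0.51", 1)]:
    print(version, method, abort_reject_action(version, method))
```

The parameter only exists from 5.0.0.51 onward, so in this sketch setting it on an earlier 5.0.0.46+ driver has no effect.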

Separately, Infinidat expects to deliver, in a future InfuzeOS release, a new option that enables an alternate escalation path, as well as a separate enhancement that changes the abort reject reason explanation codes reported in these scenarios, to reduce the likelihood of entering this kind of host escalation.

Workaround

Customers currently experiencing this kind of abort storm can shut down all impacted hosts concurrently, and the storm should stop. If that is not practical, downgrading the Cisco NFNIC driver to 5.0.0.45 on all impacted hosts removes the problematic escalation path, although that comes with tradeoffs because later NFNIC releases contain important unrelated improvements and bug fixes. Note that the abort-to-device-reset escalation must be stopped on all impacted hosts. If even one host is still escalating, the problem will likely spread to all vulnerable hosts again. This means that downgrading a single host will not provide relief; the downgrade has to be performed across all participating hosts.

Additional Information

Each of the target values reported by Cisco's NFNIC driver translates to a WWPN for an array target:

2026-02-05T09:47:09.459Z In(182) vmkernel: cpu21:2098036)nfnic: <1>: INFO: fdls_create_tport: 2214: FDLS create tport: fcid: 0x512c0 wwpn: 0x5742b0f000####31
2026-02-05T09:48:10.396Z In(182) vmkernel: cpu33:2098043)nfnic: <2>: INFO: fdls_create_tport: 2214: FDLS create tport: fcid: 0x511c0 wwpn: 0x5742b0f000####15
2026-02-05T09:49:10.798Z In(182) vmkernel: cpu33:2098043)nfnic: <2>: INFO: fdls_create_tport: 2214: FDLS create tport: fcid: 0x512a0 wwpn: 0x5742b0f000####35
2026-02-05T09:50:00.505Z In(182) vmkernel: cpu21:2098036)nfnic: <1>: INFO: fdls_create_tport: 2214: FDLS create tport: fcid: 0x511e0 wwpn: 0x5742b0f000####11
2026-02-05T09:50:12.366Z In(182) vmkernel: cpu33:2098043)nfnic: <2>: INFO: fdls_create_tport: 2214: FDLS create tport: fcid: 0x51200 wwpn: 0x5742b0f000####25
2026-02-05T09:51:12.522Z In(182) vmkernel: cpu21:2098036)nfnic: <1>: INFO: fdls_create_tport: 2214: FDLS create tport: fcid: 0x51220 wwpn: 0x5742b0f000####21

naa.6742b0f000000699000000000000#### : NFINIDAT Fibre Channel Disk (naa.6742b0f000000699000000000000####)
vmhba2:C0:T21:L35 LUN:35 state:active fc Adapter: WWNN: 20:00:00:25:b5:##:##:## WWPN: 20:00:00:25:b5:##:##:## Target: WWNN: 57:42:b0:f0:00:##:##:00 WWPN: 57:42:b0:f0:00:##:##:25
vmhba1:C0:T74:L35 LUN:35 state:active fc Adapter: WWNN: 20:00:00:25:b5:##:##:## WWPN: 20:00:00:25:b5:##:##:## Target: WWNN: 57:42:b0:f0:00:##:##:00 WWPN: 57:42:b0:f0:00:##:##:11
vmhba2:C0:T20:L35 LUN:35 state:active fc Adapter: WWNN: 20:00:00:25:b5:##:##:## WWPN: 20:00:00:25:b5:##:##:## Target: WWNN: 57:42:b0:f0:00:##:##:00 WWPN: 57:42:b0:f0:00:##:##:35
vmhba2:C0:T19:L35 LUN:35 state:active fc Adapter: WWNN: 20:00:00:25:b5:##:##:## WWPN: 20:00:00:25:b5:##:##:## Target: WWNN: 57:42:b0:f0:00:##:##:00 WWPN: 57:42:b0:f0:00:##:##:15
vmhba1:C0:T73:L35 LUN:35 state:active fc Adapter: WWNN: 20:00:00:25:b5:##:##:## WWPN: 20:00:00:25:b5:##:##:## Target: WWNN: 57:42:b0:f0:00:##:##:00 WWPN: 57:42:b0:f0:00:##:##:31
vmhba1:C0:T72:L35 LUN:35 state:active fc Adapter: WWNN: 20:00:00:25:b5:##:##:## WWPN: 20:00:00:25:b5:##:##:## Target: WWNN: 57:42:b0:f0:00:##:##:00 WWPN: 57:42:b0:f0:00:##:##:21

This confirms that every HBA and every array target is affected.
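
The FCID-to-WWPN translation above can be automated by reading the fdls_create_tport lines. The sketch below assumes the log format shown in the excerpts, keeps the masked "#" digits exactly as they appear, and uses hypothetical helper names.

```python
import re

# Captures the FCID and WWPN from an fdls_create_tport line ("#" allowed for masked digits).
TPORT_RE = re.compile(r"fdls_create_tport: .*fcid: (0x[0-9a-f]+) wwpn: (0x[0-9a-f#]+)")


def fcid_to_wwpn(lines):
    """Return {fcid: raw wwpn string} for every tport creation line."""
    mapping = {}
    for line in lines:
        match = TPORT_RE.search(line)
        if match:
            mapping[match.group(1)] = match.group(2)
    return mapping


def format_wwpn(raw):
    """Convert '0x5742b0f0...' into colon-separated byte pairs as shown in path listings."""
    digits = raw[2:]
    return ":".join(digits[i:i + 2] for i in range(0, len(digits), 2))


# Two lines copied from the excerpt above.
sample = [
    "2026-02-05T09:47:09.459Z In(182) vmkernel: cpu21:2098036)nfnic: <1>: INFO: fdls_create_tport: 2214: FDLS create tport: fcid: 0x512c0 wwpn: 0x5742b0f000####31",
    "2026-02-05T09:48:10.396Z In(182) vmkernel: cpu33:2098043)nfnic: <2>: INFO: fdls_create_tport: 2214: FDLS create tport: fcid: 0x511c0 wwpn: 0x5742b0f000####15",
]

for fcid, wwpn in fcid_to_wwpn(sample).items():
    print(fcid, "->", format_wwpn(wwpn))
```

The formatted WWPN can then be matched against the Target WWPN column of the path listing.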

You can also expect to see "Power-on Reset" events start slowly and then rapidly ramp up in frequency as more SCSI command timeouts, aborts, and abort rejections occur because of the continued device reset behavior:

2026-02-05T09:50:48.186Z In(182) vmkernel: cpu37:2098059)ScsiCore: 2000: Power-on Reset occurred on naa.6742b0f00000069900000000000#####
2026-02-05T09:50:48.235Z In(182) vmkernel: cpu36:2098059)ScsiCore: 2000: Power-on Reset occurred on naa.6742b0f0000006990000000000######
2026-02-05T09:50:48.366Z In(182) vmkernel: cpu55:2097289)ScsiCore: 2000: Power-on Reset occurred on naa.6742b0f0000006990000000000000###
2026-02-05T09:50:48.366Z In(182) vmkernel: cpu55:2097289)ScsiCore: 2000: Power-on Reset occurred on naa.6742b0f0000006990000000000######
2026-02-05T09:50:48.752Z In(182) vmkernel: cpu47:2098059)ScsiCore: 2000: Power-on Reset occurred on naa.6742b0f00000069900000000000#####
2026-02-05T09:50:48.990Z In(182) vmkernel: cpu47:2098059)ScsiCore: 2000: Power-on Reset occurred on naa.6742b0f00000069900000000000#####

Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.
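
To observe the ramp-up, the "Power-on Reset" events can be counted per time bucket. This is a minimal sketch assuming the timestamp and message format shown above; resets_per_bucket is a hypothetical helper, and the width argument simply selects how many leading timestamp characters form the bucket key (19 for per-second, 16 for per-minute).

```python
import re
from collections import Counter

# Captures the timestamp (to the second) on lines reporting a Power-on Reset.
RESET_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\.\d+Z .*Power-on Reset occurred on"
)


def resets_per_bucket(lines, width=19):
    """Count Power-on Reset events keyed by the first `width` timestamp characters."""
    counts = Counter()
    for line in lines:
        match = RESET_RE.search(line)
        if match:
            counts[match.group(1)[:width]] += 1
    return counts


# Three lines copied from the excerpt above (device IDs remain masked).
sample = [
    "2026-02-05T09:50:48.186Z In(182) vmkernel: cpu37:2098059)ScsiCore: 2000: Power-on Reset occurred on naa.6742b0f00000069900000000000#####",
    "2026-02-05T09:50:48.235Z In(182) vmkernel: cpu36:2098059)ScsiCore: 2000: Power-on Reset occurred on naa.6742b0f0000006990000000000######",
    "2026-02-05T09:50:48.366Z In(182) vmkernel: cpu55:2097289)ScsiCore: 2000: Power-on Reset occurred on naa.6742b0f0000006990000000000000###",
]

for bucket, count in sorted(resets_per_bucket(sample).items()):
    print(bucket, count)
```

A steadily climbing count across consecutive buckets is consistent with the compounding reset loop described in the Cause section.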