"Path redundancy to storage device degraded" alerts intermittently received in Cisco UCS environments

Products

VMware vSphere ESXi

Issue/Introduction

Symptoms:

"Path redundancy to storage device degraded" alerts are intermittently received on ESXi hosts in Cisco UCS environments that use the Cisco NFNIC driver. Running VMs are not affected, because the affected paths change from H:0x1 (NO CONNECT) back to Active almost immediately.
Creating a new datastore on a UCS host fails with the error: "An error occurred during host configuration: Operation failed, diagnostics report: Unable to create Filesystem, please see VMkernel log for more details: Failed to create VMFS on device naa.##########################:#"
Multiple instances of fnic_abort_cmd appear in /var/run/log/vmkernel.log, following this pattern:
- The NFNIC driver attempts to abort pending commands:
  - YYYY-MM-DDTHH:MM:SS In(182) vmkernel: cpu12:2097611)nfnic: <1>: INFO: fnic_abort_cmd: 3810: Abort cmd called for Tag: 0x561 issued time: 0 ms CMD_STATE: FNIC_IOREQ_CMD_PENDING CDB Opcode: 0x8a sc:0x45d98f31a5c0 flags: 0x3 lun: 3 target: 0x790060
    YYYY-MM-DDTHH:MM:SS Wa(180) vmkwarning: cpu12:2097611)WARNING: nfnic: <1>: fnic_abort_cmd: 3825: Abort for cmd tag: 0x561 in pending state
- The abort for the command with tag 0x561 completes with FCPIO_ITMF_REJECTED, meaning the storage array did not abort this command:
  - YYYY-MM-DDTHH:MM:SS In(182) vmkernel: cpu42:2097998)nfnic: <1>: INFO: fnic_fcpio_itmf_cmpl_handler: 2332: fcpio hdr status: FCPIO_ITMF_REJECTED <---- abort rejected for tag 0x561
    YYYY-MM-DDTHH:MM:SS Wa(180) vmkwarning: cpu42:2097998)WARNING: nfnic: <1>: fnic_fcpio_itmf_cmpl_handler: 2363: abort reject received id: 0x561
- The driver reacts by initiating a logout (TGT_EV_LOGOUT). The logout forces the previously attempted command abort to complete:
  - YYYY-MM-DDTHH:MM:SS In(182) vmkernel: cpu42:2097998)nfnic: <1>: INFO: fnic_handle_itmf_reject: 2218: Abort Rejected ! sending TGT_EV_LOGOUT for 0x7900a0
- The driver then successfully logs back in to the fabric and target through the Fibre Channel GPN_FT, PLOGI, and PRLI exchanges, and the target is added again:
  - YYYY-MM-DDTHH:MM:SS In(182) vmkernel: cpu5:2097949)nfnic: <1>: INFO: fdls_process_gpn_ft_tgt_list: 2384: FDLS process GPN_FT tgt list: 0x7900a0 ctrl:0x0
    YYYY-MM-DDTHH:MM:SS In(182) vmkernel: cpu5:2097949)nfnic: <1>: INFO: fdls_create_tport: 2214: FDLS create tport: fcid: 0x7900a0 wwpn: 0x5742b0f0000c8032
    ...
    YYYY-MM-DDTHH:MM:SS In(182) vmkernel: cpu5:2097949)nfnic: <1>: INFO: fdls_tgt_send_plogi: 1352: send tgt PLOGI: tgt: 0x7900a0 OXID: 0x2001
    YYYY-MM-DDTHH:MM:SS In(182) vmkernel: cpu5:2097949)nfnic: <1>: INFO: fdls_tgt_send_plogi: 1365: tgt plogi timeout: 20000
    YYYY-MM-DDTHH:MM:SS In(182) vmkernel: cpu5:2097949)nfnic: <1>: INFO: fdls_process_gpn_ft_rsp: 2611: iport->state: 4
    ...
    YYYY-MM-DDTHH:MM:SS In(182) vmkernel: cpu5:2097949)nfnic: <1>: INFO: fdls_process_tgt_plogi_rsp: 1686: PLOGI accepted by target: 0x7900a0
    YYYY-MM-DDTHH:MM:SS In(182) vmkernel: cpu5:2097949)nfnic: <1>: INFO: fdls_process_tgt_plogi_rsp: 1765: MFS: Max frame size: 2048 iport mfs: 2048 tport mfs: 2048
    YYYY-MM-DDTHH:MM:SS In(182) vmkernel: cpu5:2097949)nfnic: <1>: INFO: fdls_tgt_send_prli: 1390: FDLS sending PRLI to tgt: 0x7900a0 OXID: 0x2201
    YYYY-MM-DDTHH:MM:SS In(182) vmkernel: cpu5:2097949)nfnic: <1>: INFO: fdls_tgt_send_prli: 1414: tgt prli timeout: 20000
    YYYY-MM-DDTHH:MM:SS In(182) vmkernel: cpu5:2097949)nfnic: <1>: INFO: fdls_process_tgt_prli_rsp: 1824: PRLI accepted from target: 0x7900a0
    YYYY-MM-DDTHH:MM:SS In(182) vmkernel: cpu5:2097949)nfnic: <1>: INFO: fdls_process_tgt_prli_rsp: 1887: PRLI: Target found: 0x7900a0. TGT now in ready state. Adding tport.
    YYYY-MM-DDTHH:MM:SS In(182) vmkernel: cpu5:2097949)nfnic: <1>: INFO: fnic_claim_target: 4874: Target added tport->fcid: 0x7900a0
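
To check whether a host shows the pattern above, vmkernel.log can be scanned for the same four messages: abort called, abort rejected, logout sent, and target added. The following is a minimal Python sketch, not a Broadcom or Cisco tool. It assumes the message wording matches the excerpts above (other driver versions may word them differently), and the function name summarize_abort_events and the regular expressions are illustrative assumptions. It counts each event by tag or FCID and, in log order, tracks targets that were logged out but not added back afterwards.

```python
import re
import sys
from collections import Counter

# Regexes built from the nfnic messages quoted above (assumed wording).
PATTERNS = {
    "aborts_called": re.compile(r"Abort cmd called for Tag: (0x[0-9a-fA-F]+)"),
    "aborts_rejected": re.compile(r"abort reject received id: (0x[0-9a-fA-F]+)"),
    "logouts": re.compile(r"sending TGT_EV_LOGOUT for (0x[0-9a-fA-F]+)"),
    "targets_added": re.compile(r"Target added tport->fcid: (0x[0-9a-fA-F]+)"),
}


def summarize_abort_events(lines):
    """Count nfnic abort/logout/re-add events and find targets not re-added.

    Returns (counts, awaiting_readd): counts maps each event name to a
    Counter keyed by command tag or target FCID; awaiting_readd is the set
    of FCIDs whose most recent logout was not followed by "Target added".
    """
    counts = {key: Counter() for key in PATTERNS}
    awaiting_readd = set()
    for line in lines:
        for key, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match is None:
                continue
            value = match.group(1).lower()
            counts[key][value] += 1
            if key == "logouts":
                awaiting_readd.add(value)
            elif key == "targets_added":
                awaiting_readd.discard(value)
            break  # each quoted message matches at most one pattern
    return counts, awaiting_readd


def main(path):
    # errors="replace" keeps the scan going if the log contains stray bytes.
    with open(path, encoding="utf-8", errors="replace") as log:
        counts, awaiting_readd = summarize_abort_events(log)
    for key, counter in counts.items():
        print(f"{key}: {sum(counter.values())} total, by id: {dict(counter)}")
    if awaiting_readd:
        print("Logged out but not re-added:", ", ".join(sorted(awaiting_readd)))


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

For example, run it against a copy of the log taken from the host: `python3 scan_nfnic.py vmkernel.log` (the file name scan_nfnic.py is hypothetical). Tracking events in log order, rather than only comparing totals, avoids hiding a missing re-add behind the "Target added" messages logged at boot.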

Environment

Cause

When I/O commands remain pending, the NFNIC driver receives task management abort requests to recover them. If the pending commands are aborted successfully, I/O continues normally. If an abort fails, the NFNIC driver logs out of and logs back in to the fabric and target array to recover. This causes a temporary path loss and the re-establishment of paths, which raises "Path redundancy to storage device degraded" alerts in vCenter that then clear.

Resolution

This issue will be resolved in Cisco NFNIC driver 5.0.0.48, which is expected to be released in December 2025 or January 2026.
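
To confirm whether a host already runs the fixed driver, the installed nfnic version can be read from the Version column of `esxcli software vib list` and compared against 5.0.0.48. The sketch below is illustrative only. It assumes the version string begins with a dotted numeric driver version, optionally followed by a hyphenated build suffix (for example 5.0.0.43-1OEM.800.1.0.20143090, a hypothetical value). The names parse_driver_version and is_fixed_nfnic_version are hypothetical helpers.

```python
import re

# First nfnic release cited in this article as containing the fix.
FIXED_VERSION = (5, 0, 0, 48)


def parse_driver_version(vib_version):
    """Return the leading dotted numeric part as a tuple of ints.

    Example: '5.0.0.43-1OEM.800.1.0.20143090' -> (5, 0, 0, 43).
    Assumes the string starts with digits; raises ValueError otherwise.
    """
    match = re.match(r"(\d+(?:\.\d+)*)", vib_version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {vib_version!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def is_fixed_nfnic_version(vib_version):
    """True if the driver version is 5.0.0.48 or later."""
    # Tuple comparison is numeric field by field, so 5.0.0.100 > 5.0.0.48,
    # whereas a plain string comparison would get this wrong.
    return parse_driver_version(vib_version) >= FIXED_VERSION
```

For example, `is_fixed_nfnic_version("5.0.0.43-1OEM.800.1.0.20143090")` returns False. Comparing integer tuples rather than raw strings is the main design point, because version fields can have different digit counts.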

For more information, contact Cisco.

Additional Information

Additional KB article related to fixes in Cisco NFNIC driver 5.0.0.48:

Temporary/transient storage path loss on Host could result in paths not coming back when using Cisco UCS and NFNIC

General information regarding FC aborts (FNIC aborts, where FNIC stands for Fibre Channel Network Interface Card):

Cisco Reference
Broadcom Reference