"Path redundancy to storage device degraded" alerts intermittently received in Cisco UCS environment.

Article ID: 404949

Updated On:

Products

VMware vSphere ESXi

Issue/Introduction

Symptoms:

  • "Path redundancy to storage device degraded" alerts are intermittently received in ESXi/UCS environments that use the NFNIC driver. Running VMs are not impacted, because the affected paths toggle from H:0x1 (NO CONNECT) back to Active almost immediately.
  • New datastore creation on a UCS host fails with the error: "An error occurred during host configuration: Operation failed, diagnostics report: Unable to create Filesystem, please see VMkernel log for more details: Failed to create VMFS on device naa.##########################:#"
  • Multiple instances of fnic_abort_cmd appear in /var/run/log/vmkernel.log in the following pattern:
    • The NFNIC driver attempts to abort pending commands.
      • YYYY-MM-DDTHH:MM:SS In(182) vmkernel: cpu12:2097611)nfnic: <1>: INFO: fnic_abort_cmd: 3810: Abort cmd called for Tag: 0x561  issued time: 0 ms CMD_STATE: FNIC_IOREQ_CMD_PENDING CDB Opcode: 0x8a  sc:0x45d98f31a5c0 flags: 0x3 lun: 3 target: 0x790060
        YYYY-MM-DDTHH:MM:SS Wa(180) vmkwarning: cpu12:2097611)WARNING: nfnic: <1>: fnic_abort_cmd: 3825: Abort for cmd tag: 0x561 in pending state
    • The abort for the command with tag 0x561 is answered with FCPIO_ITMF_REJECTED, which means the storage array did not abort the command.
      • YYYY-MM-DDTHH:MM:SS In(182) vmkernel: cpu42:2097998)nfnic: <1>: INFO: fnic_fcpio_itmf_cmpl_handler: 2332: fcpio hdr status: FCPIO_ITMF_REJECTED <---- abort rejected for tag 0x561
        YYYY-MM-DDTHH:MM:SS Wa(180) vmkwarning: cpu42:2097998)WARNING: nfnic: <1>: fnic_fcpio_itmf_cmpl_handler: 2363: abort reject received id: 0x561
    • The driver reacts by logging out of the target (TGT_EV_LOGOUT). The logout forces the previously attempted command abort to complete.
      • YYYY-MM-DDTHH:MM:SS In(182) vmkernel: cpu42:2097998)nfnic: <1>: INFO: fnic_handle_itmf_reject: 2218: Abort Rejected ! sending TGT_EV_LOGOUT for 0x7900a0
    • The driver then successfully logs back in through the Fibre Channel GPN_FT, PLOGI, and PRLI sequence, and the target is added again.
      • YYYY-MM-DDTHH:MM:SS In(182) vmkernel: cpu5:2097949)nfnic: <1>: INFO: fdls_process_gpn_ft_tgt_list: 2384: FDLS process GPN_FT tgt list: 0x7900a0 ctrl:0x0
        YYYY-MM-DDTHH:MM:SS In(182) vmkernel: cpu5:2097949)nfnic: <1>: INFO: fdls_create_tport: 2214: FDLS create tport: fcid: 0x7900a0 wwpn: 0x5742b0f0000c8032
        ...
        YYYY-MM-DDTHH:MM:SS In(182) vmkernel: cpu5:2097949)nfnic: <1>: INFO: fdls_tgt_send_plogi: 1352: send tgt PLOGI: tgt: 0x7900a0 OXID: 0x2001
        YYYY-MM-DDTHH:MM:SS In(182) vmkernel: cpu5:2097949)nfnic: <1>: INFO: fdls_tgt_send_plogi: 1365: tgt plogi timeout: 20000
        YYYY-MM-DDTHH:MM:SS In(182) vmkernel: cpu5:2097949)nfnic: <1>: INFO: fdls_process_gpn_ft_rsp: 2611: iport->state: 4
        ...
        YYYY-MM-DDTHH:MM:SS In(182) vmkernel: cpu5:2097949)nfnic: <1>: INFO: fdls_process_tgt_plogi_rsp: 1686: PLOGI accepted by target: 0x7900a0
        YYYY-MM-DDTHH:MM:SS In(182) vmkernel: cpu5:2097949)nfnic: <1>: INFO: fdls_process_tgt_plogi_rsp: 1765: MFS: Max frame size: 2048 iport mfs: 2048 tport mfs: 2048
        YYYY-MM-DDTHH:MM:SS In(182) vmkernel: cpu5:2097949)nfnic: <1>: INFO: fdls_tgt_send_prli: 1390: FDLS sending PRLI to tgt: 0x7900a0 OXID: 0x2201
        YYYY-MM-DDTHH:MM:SS In(182) vmkernel: cpu5:2097949)nfnic: <1>: INFO: fdls_tgt_send_prli: 1414: tgt prli timeout: 20000
        YYYY-MM-DDTHH:MM:SS In(182) vmkernel: cpu5:2097949)nfnic: <1>: INFO: fdls_process_tgt_prli_rsp: 1824: PRLI accepted from target: 0x7900a0
        YYYY-MM-DDTHH:MM:SS In(182) vmkernel: cpu5:2097949)nfnic: <1>: INFO: fdls_process_tgt_prli_rsp: 1887: PRLI: Target found: 0x7900a0. TGT now in ready state. Adding tport.
        YYYY-MM-DDTHH:MM:SS In(182) vmkernel: cpu5:2097949)nfnic: <1>: INFO: fnic_claim_target: 4874: Target added tport->fcid: 0x7900a0
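
To check whether a host shows the same pattern, the log can be scanned for the four key messages quoted above. The following is a minimal Python sketch, not a supported tool: the regular expressions are derived only from the sample lines in this article, and the function names (classify, summarize, summarize_file) are hypothetical. Message formats may differ between driver versions.

```python
import re
from collections import Counter

# Patterns derived from the sample vmkernel.log lines in this article.
# Assumption: other nfnic driver versions may word these messages differently.
PATTERNS = {
    "abort_called": re.compile(
        r"fnic_abort_cmd: \d+: Abort cmd called for Tag: (0x[0-9a-fA-F]+)"),
    "abort_rejected": re.compile(
        r"abort reject received id: (0x[0-9a-fA-F]+)"),
    "target_logout": re.compile(
        r"sending TGT_EV_LOGOUT for (0x[0-9a-fA-F]+)"),
    "target_added": re.compile(
        r"fnic_claim_target: \d+: Target added tport->fcid: (0x[0-9a-fA-F]+)"),
}


def classify(line):
    """Return (event_name, identifier) for a matching nfnic line, else None."""
    for name, pattern in PATTERNS.items():
        match = pattern.search(line)
        if match:
            return name, match.group(1)
    return None


def summarize(lines):
    """Count each event type across an iterable of log lines."""
    counts = Counter()
    for line in lines:
        event = classify(line)
        if event:
            counts[event[0]] += 1
    return counts


def summarize_file(path):
    """Summarize a log file; undecodable bytes are replaced, not fatal."""
    with open(path, errors="replace") as handle:
        return summarize(handle)
```

For example, summarize_file("/var/run/log/vmkernel.log") returns a Counter of the four event types. Non-zero abort_rejected and target_logout counts followed by target_added entries would be consistent with the pattern described in this article.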

Environment

 

 

Cause

The NFNIC driver receives task management abort requests to recover pending I/O. If the pending commands are aborted successfully, I/O continues normally. If an abort fails, the driver logs out of and back in to the fabric/target array to recover. This causes a temporary path loss followed by reconfiguration of the paths, which produces "Path redundancy to storage device degraded" alerts in vCenter that then clear.

Resolution

This issue will be resolved in Cisco NFNIC driver 5.0.0.48, which is expected to be released in December 2025 or January 2026.
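
When auditing several hosts, an installed driver version can be compared against the fixed version with a short script. The Python sketch below is illustrative only. It assumes the version string, such as the Version column of `esxcli software vib list`, starts with a dotted numeric version that may be followed by a "-" build suffix. The sample strings in the comments and the function names are hypothetical.

```python
# Fixed driver version stated in the Resolution section of this article.
FIXED_VERSION = (5, 0, 0, 48)


def parse_driver_version(version):
    """Convert a string such as '5.0.0.43-1OEM.800.1.0.12345678' to (5, 0, 0, 43).

    Assumption: everything after the first '-' is a build suffix and is ignored.
    """
    base = version.strip().split("-", 1)[0]
    return tuple(int(part) for part in base.split("."))


def has_fix(version):
    """Return True if the given nfnic version is at or above the fixed version."""
    return parse_driver_version(version) >= FIXED_VERSION
```

Tuples compare element by element, so has_fix treats 5.0.0.48 and any later dotted version as containing the fix, and earlier versions as not containing it.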

For more information, please engage Cisco. 

Additional Information