"Path redundancy to storage device degraded" alerts intermittently received in Cisco UCS environment.

Article ID: 404949

Updated On:

Products

VMware vSphere ESXi

Issue/Introduction

  • Intermittent vCenter alerts stating "Path redundancy to storage device degraded" are received in ESXi/UCS environments that use the nfnic driver.
  • Running VMs are not impacted, because the affected paths toggle from H:0x1 (NO CONNECT) back to Active almost immediately.
  • Alerts about datastore connectivity are also seen in Aria Operations.
  • Creating a new datastore on a UCS host fails with the error: 
An error occurred during host configuration: Operation failed, diagnostics report: Unable to create Filesystem, please see VMkernel log for more details: Failed to create VMFS on device naa.##########################:#
  • In the host's /var/run/log/vmkernel.log, the following sequence of events is seen:
    • NFNIC driver attempts to abort pending commands.
YYYY-MM-DDTHH:MM:SS In(182) vmkernel: cpu12:2097611)nfnic: <1>: INFO: fnic_abort_cmd: 3810: Abort cmd called for Tag: 0x561  issued time: 0 ms CMD_STATE: FNIC_IOREQ_CMD_PENDING CDB Opcode: 0x8a  sc:0x45d98f31a5c0 flags: 0x3 lun: 3 target: 0x790060
YYYY-MM-DDTHH:MM:SS Wa(180) vmkwarning: cpu12:2097611)WARNING: nfnic: <1>: fnic_abort_cmd: 3825: Abort for cmd tag: 0x561 in pending state
    • The abort for the command with tag 0x561 is answered with FCPIO_ITMF_REJECTED, which means the storage array did not abort this command.
YYYY-MM-DDTHH:MM:SS In(182) vmkernel: cpu42:2097998)nfnic: <1>: INFO: fnic_fcpio_itmf_cmpl_handler: 2332: fcpio hdr status: FCPIO_ITMF_REJECTED <---- abort rejected for tag 0x561 -->
YYYY-MM-DDTHH:MM:SS Wa(180) vmkwarning: cpu42:2097998)WARNING: nfnic: <1>: fnic_fcpio_itmf_cmpl_handler: 2363: abort reject received id: 0x561
    • The driver reacts by logging out of the target port (TGT_EV_LOGOUT). The logout forces the previously attempted command abort to complete.
YYYY-MM-DDTHH:MM:SS In(182) vmkernel: cpu42:2097998)nfnic: <1>: INFO: fnic_handle_itmf_reject: 2218: Abort Rejected ! sending TGT_EV_LOGOUT for 0x7900a0
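    • The driver then successfully logs back in to the target: it queries the fabric name server for the target list (GPN_FT), completes port login (PLOGI) and process login (PRLI), and adds the target back.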
YYYY-MM-DDTHH:MM:SS In(182) vmkernel: cpu5:2097949)nfnic: <1>: INFO: fdls_process_gpn_ft_tgt_list: 2384: FDLS process GPN_FT tgt list: 0x7900a0 ctrl:0x0
YYYY-MM-DDTHH:MM:SS In(182) vmkernel: cpu5:2097949)nfnic: <1>: INFO: fdls_create_tport: 2214: FDLS create tport: fcid: 0x7900a0 wwpn: 0x5742b0f0000c8032
...
YYYY-MM-DDTHH:MM:SS In(182) vmkernel: cpu5:2097949)nfnic: <1>: INFO: fdls_tgt_send_plogi: 1352: send tgt PLOGI: tgt: 0x7900a0 OXID: 0x2001
YYYY-MM-DDTHH:MM:SS In(182) vmkernel: cpu5:2097949)nfnic: <1>: INFO: fdls_tgt_send_plogi: 1365: tgt plogi timeout: 20000
YYYY-MM-DDTHH:MM:SS In(182) vmkernel: cpu5:2097949)nfnic: <1>: INFO: fdls_process_gpn_ft_rsp: 2611: iport->state: 4
...
YYYY-MM-DDTHH:MM:SS In(182) vmkernel: cpu5:2097949)nfnic: <1>: INFO: fdls_process_tgt_plogi_rsp: 1686: PLOGI accepted by target: 0x7900a0
YYYY-MM-DDTHH:MM:SS In(182) vmkernel: cpu5:2097949)nfnic: <1>: INFO: fdls_process_tgt_plogi_rsp: 1765: MFS: Max frame size: 2048 iport mfs: 2048 tport mfs: 2048
YYYY-MM-DDTHH:MM:SS In(182) vmkernel: cpu5:2097949)nfnic: <1>: INFO: fdls_tgt_send_prli: 1390: FDLS sending PRLI to tgt: 0x7900a0 OXID: 0x2201
YYYY-MM-DDTHH:MM:SS In(182) vmkernel: cpu5:2097949)nfnic: <1>: INFO: fdls_tgt_send_prli: 1414: tgt prli timeout: 20000
YYYY-MM-DDTHH:MM:SS In(182) vmkernel: cpu5:2097949)nfnic: <1>: INFO: fdls_process_tgt_prli_rsp: 1824: PRLI accepted from target: 0x7900a0
YYYY-MM-DDTHH:MM:SS In(182) vmkernel: cpu5:2097949)nfnic: <1>: INFO: fdls_process_tgt_prli_rsp: 1887: PRLI: Target found: 0x7900a0. TGT now in ready state. Adding tport.
YYYY-MM-DDTHH:MM:SS In(182) vmkernel: cpu5:2097949)nfnic: <1>: INFO: fnic_claim_target: 4874: Target added tport->fcid: 0x7900a0
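To check whether a host's log shows this pattern, the minimal Python sketch below scans vmkernel.log lines for the abort, reject, logout and re-add sequence. The regular expressions match the message wording in the excerpts above; that wording may differ between nfnic versions, so treat the patterns as assumptions. The helper names (`classify`, `summarize`, `scan_file`) are hypothetical and are not part of any VMware or Cisco tooling.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Message patterns copied from the nfnic log excerpts in this article.
# Assumption: other nfnic versions may word these messages differently.
PATTERNS = {
    "abort_called": re.compile(r"Abort cmd called for Tag: (0x[0-9a-fA-F]+)"),
    "abort_rejected": re.compile(r"abort reject received id: (0x[0-9a-fA-F]+)"),
    "target_logout": re.compile(r"sending TGT_EV_LOGOUT for (0x[0-9a-fA-F]+)"),
    "plogi_accepted": re.compile(r"PLOGI accepted by target: (0x[0-9a-fA-F]+)"),
    "prli_accepted": re.compile(r"PRLI accepted from target: (0x[0-9a-fA-F]+)"),
    "target_added": re.compile(r"Target added tport->fcid: (0x[0-9a-fA-F]+)"),
}


def classify(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (event, tag-or-fcid) pairs for matching lines, in log order."""
    events = []
    for line in lines:
        for name, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                events.append((name, match.group(1)))
                break  # at most one event per line
    return events


def summarize(events: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Report rejected aborts and targets logged out but not (yet) re-added."""
    rejected = [ident for name, ident in events if name == "abort_rejected"]
    logged_out, recovered = set(), set()
    for name, ident in events:
        if name == "target_logout":
            logged_out.add(ident)
            recovered.discard(ident)  # a newer logout cancels an earlier recovery
        elif name == "target_added" and ident in logged_out:
            recovered.add(ident)
    return {
        "rejected_aborts": rejected,
        "logged_out": sorted(logged_out),
        "not_recovered": sorted(logged_out - recovered),
    }


def scan_file(path: str) -> Dict[str, List[str]]:
    """Convenience wrapper: classify and summarize a whole log file."""
    with open(path, errors="replace") as log:
        return summarize(classify(log))
```

For example, running `scan_file` against a copy of /var/run/log/vmkernel.log returns the tags of rejected aborts and the FCIDs of targets that were logged out; a non-empty `not_recovered` list suggests that a re-login did not complete within the captured log window.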

Environment

  • Products: VMware vSphere ESXi 7.x, VMware vSphere ESXi 8.x
  • Hardware: Cisco UCS B-Series or C-Series Servers
  • Storage: Fibre Channel (FC) connectivity using Cisco nfnic driver versions prior to 5.0.0.48.

Cause

The issue is caused by the nfnic driver's error-handling logic. When the driver attempts to abort a pending I/O command and receives an FCPIO_ITMF_REJECTED status from the storage array (often because the array never received the command due to fabric congestion or dropped frames), the driver triggers an automatic logout from the target port followed by a re-login (PLOGI/PRLI) sequence. While this recovery runs, the paths to that target are briefly unavailable, which triggers the path redundancy alarms.

Note: Intermittent "Path redundancy to storage device degraded" alerts occur in ESXi and Cisco UCS environments using the nfnic driver. These alerts typically resolve automatically as paths toggle from a disconnected state to active. However, in some scenarios, this driver behavior can lead to VMFS datastore creation failures or host PSODs (Purple Screen of Death).

Resolution

1. Driver Upgrade

  • Target Version: Fixed in nfnic driver version 5.0.0.48 and higher.
  • Action: Upgrade the nfnic driver to version 5.0.0.48 or later. Version 5.0.0.46 also provides improved stability for virtual reset operations, but as a release prior to 5.0.0.48 it does not contain this fix.
  • Download: See Download Broadcom products and software for steps to download this release.
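Because the fix depends on the installed nfnic version, versions should be compared numerically rather than as text: as strings, a hypothetical "5.0.0.100" would sort before "5.0.0.48". The Python sketch below assumes the version string begins with the dotted driver version, for example as shown in the Version column of `esxcli software vib list`; the `-1OEM...` suffix used in the example is illustrative only, and the function names are hypothetical.

```python
import re
from typing import Tuple

# First nfnic release containing the fix, per this article.
FIXED_VERSION = (5, 0, 0, 48)


def parse_nfnic_version(version: str) -> Tuple[int, ...]:
    """Extract the leading dotted number, e.g. '5.0.0.46-1OEM...' -> (5, 0, 0, 46).

    Assumption: the version string starts with the dotted driver version;
    anything after the first non-numeric, non-dot character is ignored.
    """
    match = re.match(r"\s*(\d+(?:\.\d+)*)", version)
    if not match:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def needs_upgrade(version: str) -> bool:
    """True if the installed nfnic version is older than 5.0.0.48."""
    # Tuples compare element by element, so (5, 0, 0, 100) > (5, 0, 0, 48).
    return parse_nfnic_version(version) < FIXED_VERSION
```

Comparing integer tuples avoids the string-ordering pitfall without needing a third-party version library.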

2. SAN Fabric Health Check

  • Verify the physical integrity of the fabric. The FCPIO_ITMF_REJECTED status is often triggered by dropped frames or fabric congestion.
  • Inspect Fabric Interconnects and SAN switches for:
    • Cyclic Redundancy Check (CRC) errors.
    • Faulty SFP modules or low light levels.
    • Port flapping or high txwait counters.
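The counters listed above are cumulative, so a single reading does not show whether the problem is still occurring. One approach, sketched here in Python, is to take two snapshots some minutes apart and flag ports whose error counters grew. The data layout, port names, and counter names (`crc`, `link_failures`, `txwait`) are hypothetical placeholders; collecting the actual values from the Fabric Interconnects or SAN switches is left to those platforms' own CLI or monitoring tools.

```python
from typing import Dict, List

# Hypothetical layout: port name -> counter name -> cumulative value.
Counters = Dict[str, Dict[str, int]]

# Hypothetical counter names standing in for CRC errors, link flaps and txwait.
WATCHED = ("crc", "link_failures", "txwait")


def growing_counters(before: Counters, after: Counters) -> Dict[str, List[str]]:
    """Return, per port, the watched counters that increased between snapshots."""
    flagged = {}
    for port, now in after.items():
        prev = before.get(port, {})  # a port missing earlier is treated as all zeros
        grew = [name for name in WATCHED if now.get(name, 0) > prev.get(name, 0)]
        if grew:
            flagged[port] = grew
    return flagged
```

A counter that was cleared between snapshots reads lower and is not flagged, so clear counters before the first snapshot or not at all.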

3. Engagement with Cisco

  • Contact Cisco Support to confirm the recommended firmware and driver baseline for your specific UCS hardware model to avoid driver deadlocks.

Additional Information

General information regarding FC aborts (Fibre Channel Network Interface Card (FNIC) aborts):