Multiple ESX servers with FC HBAs (nfnic) briefly report path redundancy degraded at the same time to a single storage array
Article ID: 398911

Products

VMware vSphere ESXi
VMware vSphere ESX 7.x

Issue/Introduction

Multiple hosts with Fibre Channel (nfnic) HBAs trigger path redundancy warnings for LUNs residing on a single storage array.

In the vSphere Client, path redundancy warnings are triggered, for example:

Path redundancy to storage device naa.################################ degraded. Path vmhba#:C#:T##:L### is down. Affected datastores: <Datastore_Name>.

The paths recover shortly after the condition is triggered, without intervention.
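When triaging this event across many hosts, the warning text can be parsed to pull out the affected device and path. A minimal sketch, assuming the message format shown above (the function name is illustrative, not part of any VMware tooling):

```shell
#!/bin/sh
# Extract the device (naa ID) and path (vmhba:C:T:L) identifiers from a
# vSphere "path redundancy degraded" event message, as formatted above.
parse_redundancy_event() {
    msg="$1"
    # The device identifier follows "storage device" and starts with "naa."
    dev=$(echo "$msg" | grep -o 'naa\.[0-9a-fA-F]*')
    # The path identifier is the vmhba#:C#:T##:L### token
    path=$(echo "$msg" | grep -o 'vmhba[^ ]*')
    echo "device=$dev path=$path"
}
```

Feeding it a copied event message yields a compact `device=... path=...` line that is easy to collect and compare across hosts.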

 

Environment

VMware vSphere ESXi 7.0 

Cause

The Fibre Channel nfnic driver receives an "Abort Rejected" status from the storage array, which triggers a fabric logout (LOGO) and, in turn, the path redundancy warning in the vSphere Client.

  • In /var/log/vmkernel.log, the nfnic driver on each host attempts to abort IO at the same time.
  • The nfnic driver receives an "abort reject" response from the storage array.
  • Upon receipt of an "Abort Rejected !" status, the nfnic driver is expected to log out of the fabric (LOGO) for that target and then log back in.
  • This triggers path loss alerts; the nfnic HBA then logs back into the fabric and re-establishes the paths.

####-##-##T##:##:##.###Z cpu##:xxxxxxx)nfnic: <1>: INFO: fnic_fcpio_itmf_cmpl_handler: 2329: fcpio hdr status: FCPIO_ITMF_REJECTED
####-##-##T##:##:##.###Z cpu##:xxxxxxx)WARNING: nfnic: <1>: fnic_fcpio_itmf_cmpl_handler: 2361: abort reject received id: 0x44c
####-##-##T##:##:##.###Z cpu##:xxxxxxx)nfnic: <1>: INFO: fnic_handle_itmf_reject: 2216: Abort Rejected ! sending TGT_EV_LOGOUT for 0x54a940
####-##-##T##:##:##.###Z cpu##:xxxxxxx)ScsiDeviceIO: 4115: Cmd(0x45ba85af3348) 0x8a, CmdSN 0x3e9 from world 15086465 to dev "naa.################################" failed H:0x5 D:0x0 P:0x0
####-##-##T##:##:##.###Z cpu##:xxxxxxx)nfnic: <1>: INFO: fnic_fcpio_icmnd_cmpl_handler: 1810: io_req: 0x45ba54427640 sc: 0x45ba85b8b648 tag: 0x44d CMD_FLAGS: 0x53 CMD_STATE: FNIC_IOREQ_ABTS_PENDING ABTS pending hdr status: FCPIO_ABORTED scsi_status: 0x0
####-##-##T##:##:##.###Z cpu##:xxxxxxx)nfnic: <1>: INFO: fnic_fcpio_itmf_cmpl_handler: 2329: fcpio hdr status: FCPIO_SUCCESS
####-##-##T##:##:##.###Z cpu##:xxxxxxx)nfnic: <1>: INFO: fnic_fcpio_itmf_cmpl_handler: 2406: io_req: 0x45ba54427640 sc: 0x45ba85b8b648 id: 0x44d CMD_FLAGS: 0x73 CMD_STATE: FNIC_IOREQ_ABTS_PENDINGhdr status: FCPIO_SUCCESS ABTS cmpl received
####-##-##T##:##:##.###Z cpu##:xxxxxxx)nfnic: <1>: INFO: fnic_tport_event_handler: 2025: logging out from tport: 85 tport->fcid: 0x54a940
####-##-##T##:##:##.###Z cpu##:xxxxxxx)nfnic: <1>: INFO: fdls_tgt_logout: 1515: Sending logo to tid: 0x54a940
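The abort-reject and logout signature above can be confirmed quickly in a saved copy of /var/log/vmkernel.log. A minimal sketch, assuming the log lines follow the format shown above (the function name is illustrative):

```shell
#!/bin/sh
# Scan a saved copy of vmkernel.log for the nfnic abort-reject ->
# fabric-logout sequence described above.
scan_abort_rejects() {
    log="$1"
    echo "== Abort rejects from the array =="
    grep "abort reject received" "$log"
    echo "== Resulting target logouts =="
    grep -E "Abort Rejected|Sending logo to tid" "$log"
}
```

Matching "abort reject received" entries followed shortly by "Sending logo to tid" entries for the same target FCID is the pattern described in this article.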

  • The log then shows "tport is NULL" and "No connection" to the target, because the nfnic driver has triggered a logout (LOGO).

####-##-##T##:##:##.###Z cpu##:xxxxxxx)nfnic: <1>: INFO: fnic_queuecommand: 731: returning IO as lun is inactive or tport is NULL. driverIO:0
####-##-##T##:##:##.###Z cpu##:xxxxxxx)WARNING: VMW_SATP_ALUA: satp_alua_getTargetPortInfo:160: Could not get page 83 INQUIRY data for path "vmhba#:C0:T85:L###" - No connection (195887168)

  • The paths are then removed, triggering the alerts in the vSphere Client.

####-##-##T##:##:##.###Z cpu##:xxxxxxx)ScsiPath: 9079: DeletePath : adapter=vmhba1, channel=0, target=85, lun=###
####-##-##T##:##:##.###Z cpu##:xxxxxxx)WARNING: ScsiPath: 9158: Remove path: vmhba1:C0:T85:L###
.
.
####-##-##T##:##:##.###Z cpu##:xxxxxxx)nfnic: <1>: INFO: fnic_delete_destroyed_paths: 151: Releasing resource LUN:### target:85
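The teardown can be confirmed by listing the path-removal events in the same saved log; a transient removal followed by re-discovery matches the expected recovery behavior. A minimal sketch (the function name is illustrative):

```shell
#!/bin/sh
# List which paths were removed (and how often) in a saved vmkernel.log,
# based on the "Remove path:" warnings shown above.
list_removed_paths() {
    grep -o "Remove path: [^ ]*" "$1" | sort | uniq -c
}
```

A path that appears here once and is present again in the current path inventory indicates the brief, self-recovering loss this article describes.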

 

Resolution

When the nfnic driver receives an "Abort Rejected" status from the storage array, it starts a recovery process: the driver logs out of the fabric (LOGO) and then logs back in.

During this process, the paths to the target are removed and then re-established once the host logs back in. This behavior is expected.

To investigate further, contact the storage vendor to determine why the array returned the "Abort Rejected" status.
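When engaging the storage vendor, it helps to show that the rejects hit multiple hosts at the same moment. A minimal sketch that merges the reject timestamps from several saved vmkernel.log copies into one chronological list (the function name and file names are illustrative):

```shell
#!/bin/sh
# Merge "abort reject" timestamps from several hosts' saved vmkernel.log
# copies into one chronological list for the storage vendor.
# Arguments: one or more log files, e.g. host1.log host2.log
merge_abort_reject_times() {
    for log in "$@"; do
        # Print each reject's timestamp tagged with its source file
        grep "abort reject received" "$log" | awk -v f="$log" '{print $1, f}'
    done | sort
}
```

A tight cluster of timestamps across different hosts supports the single-array correlation described in this article.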