ESXi hosts appear offline in the SAN switch following a SAN upgrade

Article ID: 413021

Products

VMware vSphere ESX 8.x

Issue/Introduction

  • Following a SAN storage upgrade, the ESXi hosts show as offline on the SAN switch.

  • The HBAs responsible for SAN connectivity on the ESXi hosts appear online in vCenter, yet all datastores connected via these HBAs are no longer visible or accessible from the ESXi hosts.

Environment

VMware vSphere ESXi 8.x

Cause

  • This behavior typically results from the SAN switch fabric ports becoming stuck, unresponsive, or experiencing improper negotiation with the ESXi host's HBAs.

  • Host vmkernel logs show a pattern of FC command timeouts and aborts, indicating communication failures:
    YYYY-MM-DDTHH:MM:SS In(182) vmkernel: cpu104:5180284)lpfc: lpfc_handle_status:4260: vmhba4 3271: FCP cmd x12 failed <8/768> sid x130100, did x######, oxid x102 iotag x428 Abort Requested Host Abort Req
    YYYY-MM-DDTHH:MM:SS Wa(180) vmkwarning: cpu104:5180284)WARNING: lpfc : vmhba4 lpfc_abort_fcp_cmpl:7400: 3096 Abort  completion for abort cmd iotag x295 xri:0x102req_tag x295, status x0, hwstatus x0
    YYYY-MM-DDTHH:MM:SS In(182) vmkernel: cpu23:2098055)NMP: nmp_ThrottleLogForDevice:3842: last error status from device naa.############### repeated 1 times
    YYYY-MM-DDTHH:MM:SS In(182) vmkernel: cpu23:2098055)NMP: nmp_ThrottleLogForDevice:3893: Cmd 0x12 (0x45bac33a8e80, 0) to dev "naa.###############" on path "vmhba4:C#:T#:L#" Failed:
    YYYY-MM-DDTHH:MM:SS In(182) vmkernel: cpu23:2098055)NMP: nmp_ThrottleLogForDevice:3898: H:0x5 D:0x0 P:0x0 . Act:NONE. cmdId.initiator=0x453a80b1bb58 CmdSN 0x0 sllid: ffffffffffffffff
    YYYY-MM-DDTHH:MM:SS In(182) vmkernel: cpu36:3639466)VMW_SATP_ALUA: satp_alua_issueCommandOnPath:1005: Path "vmhba4:C0:T8:L768" (UP) command 0x12 failed with status Timeout. H:0x5 D:0x0 P:0x0 .

  • In addition, logs from the issue period frequently show PLOGI failure, LOGO failure, and devloss timeout messages:
    YYYY-MM-DDTHH:MM:SS Wa(180) vmkwarning: cpu45:2098630)WARNING: lpfc : vmhba4 lpfc_cmpl_els_plogi:1794: 2753 PLOGI failure DID:###### Status:x3/x2 State: x1 Ref: 32 Flags: x40008
    YYYY-MM-DDTHH:MM:SS Wa(180) vmkwarning: cpu2:2098630)WARNING: lpfc : vmhba4 lpfc_cmpl_els_plogi:1794: 2753 PLOGI failure DID:###### Status:x3/x2 State: x1 Ref: 32 Flags: x40008
    YYYY-MM-DDTHH:MM:SS Wa(180) vmkwarning: cpu0:2098630)WARNING: lpfc : vmhba4 lpfc_cmpl_els_logo:2681: 2756 LOGO failure DID:###### Status:x3/x31420002
    YYYY-MM-DDTHH:MM:SS Wa(180) vmkwarning: cpu0:2098630)WARNING: lpfc : vmhba4 lpfc_sp_handle:2369: 0321 Rsp Ring 1 error: Job Data: x021a0300 x00000000 x31420002 x10010000
    YYYY-MM-DDTHH:MM:SS Wa(180) vmkwarning: cpu0:2098630)WARNING: lpfc : vmhba4 lpfc_dev_loss_tmo_handler:505: 0203 Devloss timeout on WWPN ##:##:##:##:##:##:##:## NPort ####### Data: x108 x5 x9 xa
    YYYY-MM-DDTHH:MM:SS Wa(180) vmkwarning: cpu0:2098630)WARNING: lpfc : vmhba4 lpfc_dev_loss_tmo_handler:549: 3298 ScsiNotifyPathStateChangeAsyncSAdapter Num x6 TID x0, DID ######.

  • Requests for INQUIRY data on the storage paths fail with "No connection":
    YYYY-MM-DDTHH:MM:SS Wa(180) vmkwarning: cpu127:2097706)WARNING: VMW_SATP_ALUA: satp_alua_getTargetPortInfo:190: Could not get page 83 INQUIRY data for path "vmhba4:C#:T#:L#" - No connection (195887168)
    YYYY-MM-DDTHH:MM:SS Wa(180) vmkwarning: cpu127:2097705)WARNING: VMW_SATP_ALUA: satp_alua_getTargetPortInfo:190: Could not get page 83 INQUIRY data for path "vmhba4:C#:T#:L#" - No connection (195887168)

  • FC statistics indicate that the vmhbas experienced Link Failure and Loss of Signal events. The link failure count is the number of times the physical Fibre Channel link has gone down completely and subsequently come back up (complete disconnections from the Fibre Channel fabric). The loss of signal count is the number of times the HBA has detected a loss of optical signal from the Fibre Channel fabric (intermittent physical-layer issues):
       Adapter: vmhba3
       Tx Frames: 3502575927
       Rx Frames: 2665158663
       Lip Count: 0
       Error Frames: 0
       Dumped Frames: 0
       Link Failure Count: 3
       Loss of Signal Count: 9
       PrimSeq Protocol Err Count: 0
       Invalid Tx Word Count: 184
       Invalid CRC Count: 0
       Input Requests: 0
       Output Requests: 0
       Control Requests: 0

       Adapter: vmhba4
       Tx Frames: 280367925
       Rx Frames: 4129775430
       Lip Count: 0
       Error Frames: 0
       Dumped Frames: 0
       Link Failure Count: 3
       Loss of Signal Count: 13
       PrimSeq Protocol Err Count: 0
       Invalid Tx Word Count: 212
       Invalid CRC Count: 0
       Input Requests: 0
       Output Requests: 0
       Control Requests: 0
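
The log signatures and FC statistics above can be checked in bulk with a small script. The following is a minimal Python sketch, not a supported tool. It assumes the vmkernel log lines and the FC statistics output (in the "Adapter: / Key: value" layout shown above, for example as captured from `esxcli storage san fc stats get`) have been saved as text. The function names and the signature labels are hypothetical and chosen only for illustration; the search strings are copied from the log excerpts in this article.

```python
import re
from collections import Counter

# Substrings copied from the log excerpts in this article.
# The labels on the left are illustrative names, not driver terminology.
LOG_SIGNATURES = {
    "fcp_abort": "Abort Requested Host Abort Req",
    "plogi_failure": "PLOGI failure",
    "logo_failure": "LOGO failure",
    "devloss_timeout": "Devloss timeout",
    "path_timeout": "failed with status Timeout",
    "inquiry_no_connection": "Could not get page 83 INQUIRY data",
}

VMHBA_RE = re.compile(r"\bvmhba\d+\b")


def count_signatures(log_lines):
    """Return {adapter: Counter(label -> hits)} for lines matching a signature."""
    hits = {}
    for line in log_lines:
        match = VMHBA_RE.search(line)
        adapter = match.group(0) if match else "unknown"
        for label, needle in LOG_SIGNATURES.items():
            if needle in line:
                hits.setdefault(adapter, Counter())[label] += 1
    return hits


def parse_fc_stats(text):
    """Parse 'Key: value' blocks, each introduced by an 'Adapter:' line."""
    stats = {}
    current = None
    for raw in text.splitlines():
        key, sep, value = raw.strip().partition(":")
        if not sep:
            continue  # blank line or a line without a key/value separator
        key, value = key.strip(), value.strip()
        if key == "Adapter":
            current = stats.setdefault(value, {})
        elif current is not None and value.isdigit():
            current[key] = int(value)
    return stats


def adapters_with_link_errors(stats):
    """List adapters whose link failure or loss of signal counters are nonzero."""
    watched = ("Link Failure Count", "Loss of Signal Count")
    return sorted(
        adapter
        for adapter, counters in stats.items()
        if any(counters.get(name, 0) > 0 for name in watched)
    )
```

Passing the FC statistics text to `parse_fc_stats` and the result to `adapters_with_link_errors` lists the adapters worth tracing back to their switch ports. These counters are generally cumulative, so a single nonzero snapshot does not by itself show an ongoing fault; comparing two snapshots taken some time apart gives a clearer picture.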

Resolution

  • There are no issues with the ESXi host or the HBA. Investigate the SAN switch ports connected to the affected ESXi hosts.

  • Disable and then re-enable the server-facing ports on the SAN switches to force re-initialization and re-establish the FC connections.