ESXi hosts appear offline in the SAN switch following a SAN upgrade
Article ID: 413021
Products
VMware vSphere ESX 8.x
Issue/Introduction
Following a SAN storage upgrade, the ESXi hosts show as offline in the SAN switch.
The HBAs responsible for SAN connectivity on the ESXi hosts appear online in vCenter, yet all datastores connected via these HBAs are no longer visible or accessible from the ESXi hosts.
Environment
VMware vSphere ESXi 8.x
Cause
This behavior typically results from the SAN switch fabric ports becoming stuck, unresponsive, or experiencing improper negotiation with the ESXi host's HBAs.
Host vmkernel logs show a pattern of FC command timeouts and aborts, indicating communication failures:
YYYY-MM-DDTHH:MM:SS In(182) vmkernel: cpu104:5180284)lpfc: lpfc_handle_status:4260: vmhba4 3271: FCP cmd x12 failed <8/768> sid x130100, did x######, oxid x102 iotag x428 Abort Requested Host Abort Req
YYYY-MM-DDTHH:MM:SS Wa(180) vmkwarning: cpu104:5180284)WARNING: lpfc : vmhba4 lpfc_abort_fcp_cmpl:7400: 3096 Abort completion for abort cmd iotag x295 xri:0x102req_tag x295, status x0, hwstatus x0
YYYY-MM-DDTHH:MM:SS In(182) vmkernel: cpu23:2098055)NMP: nmp_ThrottleLogForDevice:3842: last error status from device naa.############### repeated 1 times
YYYY-MM-DDTHH:MM:SS In(182) vmkernel: cpu23:2098055)NMP: nmp_ThrottleLogForDevice:3893: Cmd 0x12 (0x45bac33a8e80, 0) to dev "naa.###############" on path "vmhba4:C#:T#:L#" Failed:
YYYY-MM-DDTHH:MM:SS In(182) vmkernel: cpu23:2098055)NMP: nmp_ThrottleLogForDevice:3898: H:0x5 D:0x0 P:0x0 . Act:NONE. cmdId.initiator=0x453a80b1bb58 CmdSN 0x0 sllid: ffffffffffffffff
YYYY-MM-DDTHH:MM:SS In(182) vmkernel: cpu36:3639466)VMW_SATP_ALUA: satp_alua_issueCommandOnPath:1005: Path "vmhba4:C0:T8:L768" (UP) command 0x12 failed with status Timeout. H:0x5 D:0x0 P:0x0 .
INQUIRY requests on the affected storage paths fail with "No connection":
YYYY-MM-DDTHH:MM:SS Wa(180) vmkwarning: cpu127:2097706)WARNING: VMW_SATP_ALUA: satp_alua_getTargetPortInfo:190: Could not get page 83 INQUIRY data for path "vmhba4:C#:T#:L#" - No connection (195887168)
YYYY-MM-DDTHH:MM:SS Wa(180) vmkwarning: cpu127:2097705)WARNING: VMW_SATP_ALUA: satp_alua_getTargetPortInfo:190: Could not get page 83 INQUIRY data for path "vmhba4:C#:T#:L#" - No connection (195887168)
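The log patterns above can be tallied per adapter to see which HBAs are affected. The following is a minimal Python sketch, assuming the vmkernel log has been exported to text. The helper name `summarize_path_errors` and the sample lines (modeled on the excerpts above) are illustrative assumptions, not part of any VMware tooling.

```python
import re
from collections import Counter

# Matches the quoted path in messages such as: path "vmhba4:C0:T8:L768"
PATH_RE = re.compile(r'[Pp]ath "(vmhba\d+):[^"]*"')


def summarize_path_errors(lines):
    """Count timeout and no-connection messages per adapter (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = PATH_RE.search(line)
        if not match:
            continue  # line does not name a storage path
        adapter = match.group(1)
        if "failed with status Timeout" in line:
            counts[(adapter, "timeout")] += 1
        elif "No connection" in line:
            counts[(adapter, "no_connection")] += 1
    return counts


# Sample lines modeled on the log excerpts in this article (timestamps omitted).
SAMPLE = [
    'VMW_SATP_ALUA: satp_alua_issueCommandOnPath:1005: Path "vmhba4:C0:T8:L768" '
    '(UP) command 0x12 failed with status Timeout. H:0x5 D:0x0 P:0x0 .',
    'WARNING: VMW_SATP_ALUA: satp_alua_getTargetPortInfo:190: Could not get page 83 '
    'INQUIRY data for path "vmhba4:C#:T#:L#" - No connection (195887168)',
    'NMP: nmp_ThrottleLogForDevice:3842: last error status from device repeated 1 times',
]

if __name__ == "__main__":
    for (adapter, kind), n in sorted(summarize_path_errors(SAMPLE).items()):
        print(f"{adapter}: {kind} x{n}")
```

If the counts are concentrated on the adapters cabled to the upgraded fabric, that is consistent with a switch-side problem rather than a host-side one.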
FC statistics show that the vmhba adapters experienced Link Failure and Loss of Signal events.
Link Failure Count is the number of times the physical Fibre Channel link has gone down completely and subsequently come back up (complete disconnections from the Fibre Channel fabric).
Loss of Signal Count is the number of times the HBA has detected a loss of optical signal from the Fibre Channel fabric (intermittent physical layer issues).
Adapter: vmhba3
Tx Frames: 3502575927
Rx Frames: 2665158663
Lip Count: 0
Error Frames: 0
Dumped Frames: 0
Link Failure Count: 3
Loss of Signal Count: 9
PrimSeq Protocol Err Count: 0
Invalid Tx Word Count: 184
Invalid CRC Count: 0
Input Requests: 0
Output Requests: 0
Control Requests: 0
Adapter: vmhba4
Tx Frames: 280367925
Rx Frames: 4129775430
Lip Count: 0
Error Frames: 0
Dumped Frames: 0
Link Failure Count: 3
Loss of Signal Count: 13
PrimSeq Protocol Err Count: 0
Invalid Tx Word Count: 212
Invalid CRC Count: 0
Input Requests: 0
Output Requests: 0
Control Requests: 0
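To check these counters across many hosts, the statistics can be parsed and filtered for non-zero link-level errors. The following is a minimal Python sketch, assuming the statistics are available as text in the "Key: value" layout shown above (for example, captured from a command such as `esxcli storage san fc stats get`, which is an assumption about how the data was collected). The function names and the choice of watched counters are illustrative.

```python
import re

# One "Key: value" pair; works whether fields are on one line or one per line.
FIELD_RE = re.compile(r"([A-Z][A-Za-z ]*?):[ \t]*(\S+)")

# Counters treated here as link-level error indicators (an illustrative choice).
WATCHED = (
    "Link Failure Count",
    "Loss of Signal Count",
    "Invalid Tx Word Count",
    "Invalid CRC Count",
)


def parse_fc_stats(text):
    """Return {adapter: {counter: int}} from FC statistics text."""
    adapters = {}
    current = None
    for key, value in FIELD_RE.findall(text):
        if key == "Adapter":
            current = adapters.setdefault(value, {})
        elif current is not None and value.isdigit():
            current[key] = int(value)
    return adapters


def nonzero_error_counters(adapters):
    """Keep only the watched counters that are greater than zero."""
    return {
        name: {k: v for k, v in stats.items() if k in WATCHED and v > 0}
        for name, stats in adapters.items()
    }


# Abbreviated sample using the vmhba3 values from this article.
SAMPLE = """Adapter: vmhba3
Tx Frames: 3502575927
Link Failure Count: 3
Loss of Signal Count: 9
Invalid Tx Word Count: 184
Invalid CRC Count: 0
"""

if __name__ == "__main__":
    print(nonzero_error_counters(parse_fc_stats(SAMPLE)))
```

These counters are typically cumulative, so comparing two samples taken some time apart is likely more telling than a single non-zero reading.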
Resolution
The ESXi hosts and their HBAs are not at fault. Investigate the SAN switch ports connected to the affected ESXi hosts.
Disable and then re-enable the server-facing ports on the SAN switches to force a re-initialization and re-establish proper FC connections.