2025-06-30T10:24:15.669Z In(182) vmkernel: cpu70:2098411)lpfc: lpfc_els_rcv_rscn:7907: vmhba4 0214 RSCN received Data: x800220 x0 x4 x1
2025-06-30T10:24:15.669Z In(182) vmkernel: cpu70:2098411)lpfc: lpfc_els_rcv_rscn:7914: vmhba4 5973 RSCN received event x0 : Address format x00 : DID ########
After a couple of minutes, "Power on reset" conditions are reported on multiple devices.
2025-06-30T10:26:51.529Z In(182) vmkernel: cpu42:2098463)NMP: nmp_ThrottleLogForDevice:3893: Cmd 0xa3 (0x##########, 0) to dev "naa.###############" on path "vmhba4:C0:T37:L7" Failed:
2025-06-30T10:26:51.529Z In(182) vmkernel: cpu42:2098463)NMP: nmp_ThrottleLogForDevice:3898: H:0x0 D:0x2 P:0x0 Valid sense data: 0x6 0x29 0x0. Act:NONE. cmdId.initiator=0x########### CmdSN 0x0
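The "Valid sense data: 0x6 0x29 0x0" triple in the entry above is what identifies the condition as a power-on reset: sense key 0x6 is UNIT ATTENTION, and ASC/ASCQ 0x29/0x00 is "power on, reset, or bus device reset occurred". A minimal Python sketch of pulling that triple out of a log line is shown here. The function name decode_sense and the deliberately small lookup tables are illustrative assumptions, not part of any VMware tool.

```python
import re

# Illustrative lookup tables (assumption: only the codes seen in this article are listed).
SENSE_KEYS = {0x6: "UNIT ATTENTION"}
ASC_ASCQ = {(0x29, 0x00): "POWER ON, RESET, OR BUS DEVICE RESET OCCURRED"}

# Matches the three hex values that follow "Valid sense data:" in an NMP log entry.
SENSE_RE = re.compile(
    r"Valid sense data: (0x[0-9a-fA-F]+) (0x[0-9a-fA-F]+) (0x[0-9a-fA-F]+)"
)


def decode_sense(line):
    """Return (sense key text, ASC/ASCQ text) for a log line, or None if no sense data."""
    m = SENSE_RE.search(line)
    if m is None:
        return None
    key, asc, ascq = (int(g, 16) for g in m.groups())
    return (
        SENSE_KEYS.get(key, f"unknown sense key {key:#x}"),
        ASC_ASCQ.get((asc, ascq), f"unknown ASC/ASCQ {asc:#x}/{ascq:#x}"),
    )
```

Codes missing from the tables are reported as unknown rather than guessed, so the tables can be extended from the SCSI standard as needed.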
This is followed by more aborts.
2025-06-30T10:27:47.781Z In(182) vmkernel: cpu87:2106775)lpfc: lpfc_handle_status:5631: vmhba4 3271: FCP cmd x2a failed <38/3> sid x01e700, did ##########, oxid x2338 iotag xe5e Abort Requested Host Abort Req
2025-06-30T10:27:47.781Z In(182) vmkernel: cpu87:2098464)NMP: nmp_ThrottleLogForDevice:3842: last error status from device naa.############# repeated 2 times
2025-06-30T10:27:47.781Z In(182) vmkernel: cpu87:2098464)NMP: nmp_ThrottleLogForDevice:3893: Cmd 0x2a (0x45da2a62d780, 2097272) to dev "naa.###################" on path "vmhba4:C0:T38:L3" Failed:
2025-06-30T10:27:47.781Z In(182) vmkernel: cpu87:2098464)NMP: nmp_ThrottleLogForDevice:3898: H:0x5 D:0x0 P:0x0 . Act:EVAL. cmdId.initiator=############ CmdSN ########
After that, lpfc_sli_abts_recover_port and devloss messages appear. lpfc_sli_abts_recover_port is called when there is no response to an ABTS-LS (abort). These are followed by the SCSI errors H:0x2 D:0x8 P:0x0 and H:0x1 D:0x0 P:0x0.
H:0x2 D:0x8 P:0x0 -> Device status 0x8 (BUSY) is returned when a LUN cannot accept SCSI commands at the moment.
H:0x1 D:0x0 P:0x0 -> Host status 0x1 is NO_CONNECT. This status is returned if the connection to the LUN is lost.
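As a rough illustration of reading the H:/D:/P: triples quoted in this article, the following Python sketch extracts the host, device, and plugin status from a log fragment and maps them to names. The helper name decode_status is hypothetical, and the tables cover only the codes that appear here, using the names commonly documented for ESXi SCSI status codes. Anything else is reported as unknown.

```python
import re

# Only the codes that appear in this article (assumption: a deliberately partial table).
HOST_STATUS = {0x0: "OK", 0x1: "NO_CONNECT", 0x2: "BUS_BUSY", 0x5: "ABORT"}
DEVICE_STATUS = {0x0: "GOOD", 0x2: "CHECK CONDITION", 0x8: "BUSY"}

# Matches the "H:0x.. D:0x.. P:0x.." triple printed by nmp_ThrottleLogForDevice.
STATUS_RE = re.compile(
    r"H:(0x[0-9a-fA-F]+) D:(0x[0-9a-fA-F]+) P:(0x[0-9a-fA-F]+)"
)


def decode_status(text):
    """Return a dict with decoded host/device status and the raw plugin status, or None."""
    m = STATUS_RE.search(text)
    if m is None:
        return None
    host, device, plugin = (int(g, 16) for g in m.groups())
    return {
        "host": HOST_STATUS.get(host, "unknown"),
        "device": DEVICE_STATUS.get(device, "unknown"),
        "plugin": plugin,
    }
```

For example, decode_status("H:0x1 D:0x0 P:0x0") reports the host status as NO_CONNECT with a GOOD device status, which matches the loss-of-connection case described above.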
2025-06-30T10:28:29.258Z Wa(180) vmkwarning: cpu59:2098411)WARNING: lpfc: lpfc_sli_abts_recover_port:11869: vmhba4 3094 Start rport recovery on sadapter id 0x3 fc_id ############ vpi 0x0 rpi 0x29 xri 0x2338 state 0x7 flags 0x80000000
2025-06-30T10:28:29.258Z Wa(180) vmkwarning: cpu59:2098411)WARNING: lpfc: lpfc_start_devloss:4565: vmhba4 3248 Start 10 sec devloss tmo WWPN 20:##:##:##:##:##:##:ca NPort ########
2025-06-30T10:28:29.258Z In(182) vmkernel: cpu42:5459514)lpfc: lpfc_handle_status:5631: vmhba4 3271: FCP cmd x9e failed <38/2> sid x01e700, did ########, oxid ######## iotag xbe9 Time Out Returning Host Busy
Further checking shows PLOGI failures on the ports where the toggle happened.
2025-06-30T10:30:35.555Z In(182) vmkernel: cpu71:2098411)lpfc: lpfc_els_retry:4864: vmhba4 0108 No retry ELS command x3 to remote NPORT ####### Retried:3 Error:x3/x2
2025-06-30T10:30:35.555Z Wa(180) vmkwarning: cpu71:2098411)WARNING: lpfc: lpfc_cmpl_els_plogi:2172: vmhba4 2753 PLOGI failure DID:####### Status:x3/x2 State: x1 Ref: 10 Flags: x40008
2025-06-30T10:36:11.345Z In(182) vmkernel: cpu64:2098411)lpfc: lpfc_els_retry:4864: vmhba4 0108 No retry ELS command x3 to remote NPORT ####### Retried:3 Error:x3/x2
2025-06-30T10:36:11.345Z Wa(180) vmkwarning: cpu64:2098411)WARNING: lpfc: lpfc_cmpl_els_plogi:2172: vmhba4 2753 PLOGI failure DID:####### Status:x3/x2 State: x1 Ref: 10 Flags: x40008
2025-06-30T10:39:26.804Z In(182) vmkernel: cpu74:2098411)lpfc: lpfc_els_retry:4864: vmhba4 0108 No retry ELS command x3 to remote NPORT ####### Retried:3 Error:x3/x2
2025-06-30T10:39:26.804Z Wa(180) vmkwarning: cpu74:2098411)WARNING: lpfc: lpfc_cmpl_els_plogi:2172: vmhba4 2753 PLOGI failure DID:####### Status:x3/x2 State: x1 Ref: 10 Flags: x40008
The messages above mean that the host failed to establish a connection with a target port in the Storage Area Network (SAN). The 'Status:x3/x2' in the log entries suggests that the Host Bus Adapter (HBA) firmware did not receive a response to a Port Login (PLOGI) request. The PLOGI request initiated by the host reached the SAN, indicating that the connection on the host side was established. However, the accept (ACC) response to the PLOGI never reached the affected hosts, which prevented the HBA from proceeding with the Process Login (PRLI) that is required for higher-level SCSI communication.
After waiting 20 seconds without a response, the HBA firmware failed the PLOGI request, indicating a communication problem between the host and the target-side HBA.
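To see how often and against which destination IDs the PLOGI failures recur, the log can be summarized with a short script. The Python sketch below is one possible approach, assuming one log entry per line and the lpfc_cmpl_els_plogi message layout shown above. The function names plogi_failures and summarize are hypothetical.

```python
import re
from collections import Counter

# Captures timestamp, HBA name, destination ID and status from a
# "PLOGI failure" entry logged by lpfc_cmpl_els_plogi.
PLOGI_RE = re.compile(
    r"^(\S+) .*lpfc_cmpl_els_plogi:\d+: (vmhba\d+) \d+ PLOGI failure "
    r"DID:(\S+) Status:(x[0-9a-fA-F]+/x[0-9a-fA-F]+)"
)


def plogi_failures(lines):
    """Yield (timestamp, hba, did, status) for each PLOGI failure entry."""
    for line in lines:
        m = PLOGI_RE.search(line.strip())
        if m:
            yield m.groups()


def summarize(lines):
    """Count PLOGI failures per (hba, did, status) combination."""
    return Counter(
        (hba, did, status) for _, hba, did, status in plogi_failures(lines)
    )
```

A typical use would be to open a copy of vmkernel.log, pass the file object to summarize, and print the resulting counter. Repeated Status:x3/x2 failures toward the same DID point to the fabric path to that target port rather than to a one-off event.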
VMware vSphere ESXi 8.x
In a Fibre Channel (FC) fabric, an issue on a single switch port can cause widespread problems that affect other ports and potentially the entire fabric. This is due to the interconnected nature of Fibre Channel and its flow control mechanisms. An offending port on the fabric switch that shows errors (for example, CRC errors or abnormal Rx/Tx power) may cause issues for the other devices connected to that switch (e.g., a server HBA or a storage array port).
There are no issues with the ESXi HBA. Check the SAN switch ports for possible issues and disable any offending port on the switch. The offending port may show errors in the port statistics retrieved with 'port show' or similar commands.