Lost access to storage events as well as Storage Paths going offline/online when connected to a Pure Storage array

Article ID: 409994

Products

VMware vSphere ESXi
VMware vSphere ESX 7.x
VMware vSphere ESX 8.x
VMware vSphere ESXi 8.0

Issue/Introduction

A VCF administrator sees Lost Access to Storage events in vCenter Server, along with temporary losses of storage paths. On the ESXi host, /var/log/vmkernel.log shows failed commands followed by FAILOVER actions:

2025-09-03T00:15:21.418Z cpu1:2097234)NMP: nmp_ThrottleLogForDevice:3867: Cmd 0x89 (0x45b990c9a148, 2097225) to dev "naa.624a937077c93e##################" on path "vmhba64:C0:T0:L212" Failed:
2025-09-03T00:15:21.418Z cpu1:2097234)NMP: nmp_ThrottleLogForDevice:3875: H:0x0 D:0x2 P:0x0 Valid sense data: 0x2 0x8 0x0. Act:FAILOVER. cmdId.initiator=0x4308ff6cfb00 CmdSN 0x1745a9
2025-09-03T00:15:21.418Z cpu1:2097234)WARNING: NMP: nmp_DeviceRetryCommand:133: Device "naa.624a937077c93e##################": awaiting fast path state update for failover with I/O blocked. No prior reservation exists on the device.
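On a busy host these lines are buried among many others. The Python sketch below pairs each "Cmd ... Failed:" line with the status line logged by the same cpuN:worldID context and keeps only the commands whose action is FAILOVER. It assumes the exact message layout shown in the excerpt above; the regular expressions, the FailoverEvent class and the find_failovers helper are hypothetical illustrations, not a VMware tool, and other ESXi builds may word these messages differently.

```python
import re
from dataclasses import dataclass

HEX = r"0x[0-9a-fA-F]+"

# Patterns written against the sample vmkernel.log lines in this article
# (assumed format). "ctx" is the cpuN:worldID prefix, used to pair a command
# line with its status line even if other CPUs log lines in between.
CMD_RE = re.compile(
    r'^(?P<ts>\S+) (?P<ctx>cpu\d+:\d+)\)NMP: nmp_ThrottleLogForDevice:\d+: '
    rf'Cmd (?P<opcode>{HEX}) .*?to dev "(?P<device>[^"]+)" '
    r'on path "(?P<path>[^"]+)" Failed:'
)
STATUS_RE = re.compile(
    r'^(?P<ts>\S+) (?P<ctx>cpu\d+:\d+)\)NMP: nmp_ThrottleLogForDevice:\d+: '
    rf'H:(?P<host>{HEX}) D:(?P<dev>{HEX}) P:(?P<plugin>{HEX}) '
    rf'Valid sense data: (?P<sense>{HEX} {HEX} {HEX})\. Act:(?P<action>\w+)'
)


@dataclass
class FailoverEvent:
    """Hypothetical record combining a failed command and its status line."""
    timestamp: str
    opcode: str
    device: str
    path: str
    device_status: str
    sense: str


def find_failovers(lines):
    """Return one FailoverEvent per failed command whose action is FAILOVER."""
    pending = {}  # ctx -> match of the most recent "Cmd ... Failed:" line
    events = []
    for line in lines:
        cmd = CMD_RE.search(line)
        if cmd:
            pending[cmd["ctx"]] = cmd
            continue
        status = STATUS_RE.search(line)
        if not status:
            continue
        cmd = pending.pop(status["ctx"], None)
        if cmd is not None and status["action"] == "FAILOVER":
            events.append(FailoverEvent(
                timestamp=cmd["ts"],
                opcode=cmd["opcode"],
                device=cmd["device"],
                path=cmd["path"],
                device_status=status["dev"],
                sense=status["sense"],
            ))
    return events
```

Because find_failovers accepts any iterable of lines, a file object opened on a copy of /var/log/vmkernel.log can be passed to it directly, giving one entry per failed-over command with its device, path and sense bytes.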

Environment

ESXi (All versions)
Pure Storage array

Cause

The Lost Access to Storage events occur because SCSI commands with opcode 0x89 (COMPARE AND WRITE, the command ESXi uses for ATS locking) failed: the Pure Storage array returned a CHECK CONDITION with sense key NOT READY and additional sense code LOGICAL UNIT COMMUNICATION FAILURE. When ESXi receives a NOT READY sense key, it performs a path failover so that it can continue issuing I/O:

2025-09-03T00:15:21.418Z cpu1:2097234)NMP: nmp_ThrottleLogForDevice:3867: Cmd 0x89 (0x45b990c9a148, 2097225) to dev "naa.624a937077c93e##################" on path "vmhba64:C0:T0:L212" Failed:
2025-09-03T00:15:21.418Z cpu1:2097234)NMP: nmp_ThrottleLogForDevice:3875: H:0x0 D:0x2 P:0x0 Valid sense data: 0x2 0x8 0x0. Act:FAILOVER. cmdId.initiator=0x4308ff6cfb00 CmdSN 0x1745a9

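H:0x0 = host status OK (the HBA reported no error)
D:0x2 = device status CHECK CONDITION (the array returned sense data)
P:0x0 = plugin status OK

Valid sense data: 0x2 0x8 0x0 translates to:

0x2 = sense key NOT READY
0x8/0x0 = ASC/ASCQ LOGICAL UNIT COMMUNICATION FAILURE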
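The translation above is a table lookup, which can be scripted when many sense codes need to be checked. This Python sketch covers only the opcode, device status, sense key and ASC/ASCQ values relevant to this article plus a few common neighbours (values from the SCSI standards); the decode function and its input format (the hex strings exactly as NMP prints them) are hypothetical choices for illustration, and any code not in the tables is reported as unknown.

```python
# Minimal lookup tables for the NMP fields discussed in this article.
# Values come from the SCSI standards; the tables are deliberately partial.
OPCODES = {
    0x00: "TEST UNIT READY",
    0x12: "INQUIRY",
    0x28: "READ(10)",
    0x2A: "WRITE(10)",
    0x88: "READ(16)",
    0x89: "COMPARE AND WRITE",
    0x8A: "WRITE(16)",
}
DEVICE_STATUS = {
    0x00: "GOOD",
    0x02: "CHECK CONDITION",
    0x08: "BUSY",
    0x18: "RESERVATION CONFLICT",
}
SENSE_KEYS = {
    0x0: "NO SENSE",
    0x2: "NOT READY",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
    0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION",
    0xB: "ABORTED COMMAND",
}
ASC_ASCQ = {
    (0x04, 0x00): "LOGICAL UNIT NOT READY, CAUSE NOT REPORTABLE",
    (0x08, 0x00): "LOGICAL UNIT COMMUNICATION FAILURE",
    (0x08, 0x01): "LOGICAL UNIT COMMUNICATION TIME-OUT",
}


def decode(opcode: str, device_status: str, sense: str) -> dict:
    """Decode NMP hex fields, e.g. decode("0x89", "0x2", "0x2 0x8 0x0")."""
    op = int(opcode, 16)
    status = int(device_status, 16)
    # Sense data is printed as three bytes: sense key, ASC, ASCQ.
    key, asc, ascq = (int(byte, 16) for byte in sense.split())
    return {
        "command": OPCODES.get(op, f"unknown opcode 0x{op:02x}"),
        "device_status": DEVICE_STATUS.get(status, f"unknown status 0x{status:02x}"),
        "sense_key": SENSE_KEYS.get(key, f"unknown sense key 0x{key:x}"),
        "asc_ascq": ASC_ASCQ.get((asc, ascq), f"unknown ASC/ASCQ 0x{asc:02x}/0x{ascq:02x}"),
    }
```

Called with the values from the log excerpt (opcode 0x89, D:0x2, sense 0x2 0x8 0x0), the function returns COMPARE AND WRITE, CHECK CONDITION, NOT READY and LOGICAL UNIT COMMUNICATION FAILURE, matching the manual translation.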

Shortly afterwards, the Pure array sends an explicit logout (LOGO) to the initiators on the ESXi host:

2025-09-03T00:15:46.382Z cpu44:2097603)qedf:vmhba64:qedfc_expl_logo:6728:Info: ST(RPORT): EXPL_LOGO C_ID[0x1]:P_ID[0x22ad80]:T_ID[1]
2025-09-03T00:15:53.384Z cpu32:2098261)qedf:vmhba65:qedfc_expl_logo:6728:Info: ST(RPORT): EXPL_LOGO C_ID[0x1]:P_ID[0x25ad80]:T_ID[1]

This is what creates the temporary dead paths.
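To confirm that the dead paths line up with the earlier NOT READY failures, the logout timestamps can be compared with the first FAILOVER timestamp. The Python sketch below assumes the qedf driver message format shown in the excerpt above (other HBA drivers log logouts differently); LOGO_RE, parse_ts, find_logouts and seconds_after are hypothetical helpers, and the P_ID value is extracted as-is without interpreting it.

```python
import re
from datetime import datetime

# Pattern written against the qedf EXPL_LOGO lines in this article (assumed
# format); it captures the timestamp, the vmhba name and the P_ID field.
LOGO_RE = re.compile(
    r'^(?P<ts>\S+) cpu\d+:\d+\)qedf:(?P<hba>vmhba\d+):qedfc_expl_logo:\d+:'
    r'.*EXPL_LOGO .*P_ID\[(?P<port_id>0x[0-9a-fA-F]+)\]'
)


def parse_ts(ts: str) -> datetime:
    """Parse a vmkernel.log timestamp such as 2025-09-03T00:15:21.418Z."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ")


def find_logouts(lines):
    """Return (timestamp, vmhba, P_ID) for every explicit LOGO line."""
    events = []
    for line in lines:
        m = LOGO_RE.search(line)
        if m:
            events.append((parse_ts(m["ts"]), m["hba"], m["port_id"]))
    return events


def seconds_after(reference_ts: str, logouts):
    """Seconds between a reference event (e.g. the first FAILOVER) and each LOGO."""
    ref = parse_ts(reference_ts)
    return [(hba, (ts - ref).total_seconds()) for ts, hba, _ in logouts]
```

With the timestamps in this article, using 2025-09-03T00:15:21.418Z as the reference, the logouts on vmhba64 and vmhba65 land roughly 25 and 32 seconds after the first FAILOVER.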

Resolution

This sequence is expected behavior for a Pure Storage array when it forcibly reboots one of its own controllers after an internal failure to communicate with its own LUNs. The array first returns NOT READY (LOGICAL UNIT COMMUNICATION FAILURE), which causes ESXi to fail over to another path, and then logs out the ESXi initiators (LOGO) so that it can reboot the controller gracefully. For the exact reason the array rebooted its controller, contact Pure Storage support.
