ESXi Host Not Responding and Virtual Machines are Greyed Out or Inaccessible

Article ID: 436903

Products

VMware

Issue/Introduction

VMs on multiple ESXi hosts enter a frozen or not-responding state, and vCenter shows sync issues with the affected hosts.

Multiple VMware vSphere ESXi hosts simultaneously experience a Permanent Device Loss (PDL) condition.

The ESXi storage stack marks Fibre Channel storage devices as permanently inaccessible after paths are dropped.

Environment

VMware vSphere ESXi 8.0.3

Cause

The following warnings, which show the array rejecting the LUN and ESXi marking the device as permanently inaccessible, are observed in the `vmkernel` logs:
`WARNING: VMW_SATP_ALUA: satp_alua_issueInquiry:101: Target reported LUN_NOT_CONNECTED / NO_DEVICE with PQ: 0x3, PDT: 0x1f, path: vmhba65:C0:T1:L2`
`WARNING: ScsiDevice: 1794: Device :xxxxxxxx has been removed or is permanently inaccessible.`
`ScsiDevice: 1808: Permanently inaccessible device :xxxxxxx has no more open connections. It is now safe to unmount datastores...`

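The external Fibre Channel storage array is actively rejecting requests from the ESXi hosts. The values `PQ: 0x3, PDT: 0x1f` are the Peripheral Qualifier and Peripheral Device Type fields of the SCSI INQUIRY response. Together they indicate that the target does not support a device at that Logical Unit Number (LUN), that is, the LUN is not connected or does not exist.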
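To confirm the same signature on other hosts, the log lines can be scanned programmatically. The following is a minimal Python sketch, not a VMware tool: the regular expression is derived only from the single sample line quoted above and may need adjusting for other builds, and the names `summarize`, `INQUIRY_RE`, and `SAMPLE` are hypothetical. The PQ and PDT descriptions follow the standard INQUIRY data definitions in the SCSI Primary Commands (SPC) specification.

```python
import re

# Pattern based on the VMW_SATP_ALUA warning quoted in this article.
# Assumption: other ESXi builds format the line the same way.
INQUIRY_RE = re.compile(
    r"PQ: (?P<pq>0x[0-9a-fA-F]+), PDT: (?P<pdt>0x[0-9a-fA-F]+), "
    r"path: (?P<path>vmhba\d+:C\d+:T\d+:L\d+)"
)

# Text of the ScsiDevice warning that announces the PDL state.
PDL_MARKER = "has been removed or is permanently inaccessible"

# Peripheral Qualifier meanings from the SPC standard INQUIRY data.
PQ_MEANING = {
    0x0: "device is connected to this logical unit",
    0x1: "device type is supported but not connected",
    0x3: "target cannot support a device on this logical unit",
}

# Log lines copied from this article (device identifiers are redacted there).
SAMPLE = [
    "WARNING: VMW_SATP_ALUA: satp_alua_issueInquiry:101: Target reported "
    "LUN_NOT_CONNECTED / NO_DEVICE with PQ: 0x3, PDT: 0x1f, path: vmhba65:C0:T1:L2",
    "WARNING: ScsiDevice: 1794: Device :xxxxxxxx has been removed or is "
    "permanently inaccessible.",
]


def summarize(lines):
    """Return (rejected_paths, pdl_count) for an iterable of log lines.

    rejected_paths is a list of (path, pq, pdt) tuples taken from the
    inquiry warnings; pdl_count is the number of PDL announcements.
    """
    rejected = []
    pdl_count = 0
    for line in lines:
        match = INQUIRY_RE.search(line)
        if match:
            rejected.append(
                (match.group("path"),
                 int(match.group("pq"), 16),
                 int(match.group("pdt"), 16))
            )
        if PDL_MARKER in line:
            pdl_count += 1
    return rejected, pdl_count


if __name__ == "__main__":
    paths, pdl_events = summarize(SAMPLE)
    for path, pq, pdt in paths:
        meaning = PQ_MEANING.get(pq, "other or reserved value")
        print(f"{path}: PQ={pq:#x} ({meaning}), PDT={pdt:#x}")
    print(f"PDL announcements: {pdl_events}")
```

In practice the function could be fed the lines of a copied `vmkernel` log file instead of `SAMPLE`; the output is only a summary to hand to the storage team, not a diagnosis.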

Resolution

This issue is outside the scope of the vSphere hypervisor configuration, as the hypervisor is functioning as designed by halting I/O queues in response to the array's rejection codes. The root cause resides within the Fibre Channel fabric or the storage array itself.

Engage your Storage and SAN Administrators to investigate the following array-side components:

1. Verify LUN Masking/Presentation: Confirm whether the LUNs were accidentally unpresented or unmapped from the ESXi host WWPNs (World Wide Port Names) on the storage array.
2. Verify SAN Zoning: Audit the Fibre Channel switch zoning. If the zones were modified, the ESXi host HBAs may have lost authorization to communicate with the array target ports.
3. Investigate Storage Array Health: Review the storage array controllers for unexpected reboots, failovers, or hardware faults that could have taken the target ports offline.
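When handing evidence to the storage team, it may help to group the rejected paths by HBA and target. The sketch below, in Python, splits runtime path names of the form shown in the log (`vmhba65:C0:T1:L2`) into adapter, channel, target, and LUN. The helper names are hypothetical, and every path in the usage example other than `vmhba65:C0:T1:L2` is invented for illustration. As a rough triage assumption that this article does not itself state: if every LUN behind an adapter/target pair is rejected, zoning or target-port health may be the better starting point, whereas a subset of LUNs may point toward LUN presentation.

```python
from collections import defaultdict


def parse_runtime_name(name):
    """Split 'vmhba65:C0:T1:L2' into (adapter, channel, target, lun)."""
    adapter, channel, target, lun = name.split(":")
    # Strip the leading C/T/L letter and convert the remainder to an integer.
    return adapter, int(channel[1:]), int(target[1:]), int(lun[1:])


def group_by_target(paths):
    """Map (adapter, target) to the sorted LUN numbers seen on rejected paths."""
    groups = defaultdict(set)
    for name in paths:
        adapter, _channel, target, lun = parse_runtime_name(name)
        groups[(adapter, target)].add(lun)
    return {key: sorted(luns) for key, luns in groups.items()}


if __name__ == "__main__":
    # Only the first path comes from this article; the others are hypothetical.
    rejected = ["vmhba65:C0:T1:L2", "vmhba65:C0:T1:L5", "vmhba64:C0:T0:L2"]
    for (adapter, target), luns in group_by_target(rejected).items():
        print(f"{adapter} target {target}: LUNs {luns}")
```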

Additional Information

Permanent Device Loss (PDL) and All-Paths-Down (APD) in vSphere
