VMware vSphere ESXi 6.x
VMware vSphere ESXi 7.x
VMware vSphere ESXi 8.x
An All-Paths-Down (APD) situation occurs when all paths to a device are down. Because there is no indication whether the device loss is permanent or temporary, the ESXi host keeps trying to re-establish connectivity. APD-style situations commonly occur when a LUN is incorrectly unpresented from the ESXi host. The ESXi host, still believing the device is available, retries all SCSI commands indefinitely. This affects the management agents, because their commands receive no response until the device is accessible again. As a result, the ESXi host can become inaccessible or appear as not responding in vCenter Server.
If PDL SCSI sense codes are not returned from a device (when unable to contact the storage array, or with a storage array that does not return the supported PDL SCSI codes), then the device is in an All-Paths-Down (APD) state, and the ESXi host continues to send I/O requests until the host receives a response.
As the ESXi host is not able to determine if the device loss is permanent (PDL) or transient (APD), it indefinitely retries SCSI I/O, including:
Notes:
A clear distinction has been made between a device that is permanently lost (PDL) and a transient issue where all paths are down (APD) for an unknown reason.
For example, if the vmkernel logs show that the storage device returned a SCSI sense code of H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x25 0x0 (Logical Unit Not Supported) to the ESXi host, the device is permanently inaccessible to the ESXi host, that is, it is in a Permanent Device Loss (PDL) state. The ESXi host no longer attempts to re-establish connectivity or issue commands to the device.
Devices that suffer a non-recoverable hardware error are also recognized as being in a Permanent Device Loss (PDL) state.
This table outlines possible SCSI sense codes that determine if a device is in a PDL state:
SCSI Sense Code | Description |
---|---|
H:0x0 D:0x2 P:0x0 Valid sense data: 0x__/0x25/0x0 | *LOGICAL UNIT NOT SUPPORTED |
H:0x0 D:0x2 P:0x0 Valid sense data: 0x__/0x68/0x0 | *LOGICAL UNIT NOT CONFIGURED |
H:0x0 D:0x2 P:0x0 Valid sense data: 0x4/0x4c/0x0 | HARDWARE ERROR/LOGICAL UNIT FAILED SELF-CONFIGURATION |
H:0x0 D:0x2 P:0x0 Valid sense data: 0x4/0x3e/0x3 | HARDWARE ERROR/LOGICAL UNIT FAILED SELF-TEST |
H:0x0 D:0x2 P:0x0 Valid sense data: 0x4/0x3e/0x1 | HARDWARE ERROR/LOGICAL UNIT FAILURE |
H:0x0 D:0x2 P:0x0 Valid sense data: 0x2/0x4c/0x0 | NOT READY/LOGICAL UNIT FAILED SELF-CONFIGURATION |
H:0x0 D:0x2 P:0x0 Valid sense data: 0x2/0x3e/0x3 | NOT READY/LOGICAL UNIT FAILED SELF-TEST |
H:0x0 D:0x2 P:0x0 Valid sense data: 0x2/0x3e/0x1 | NOT READY/LOGICAL UNIT FAILURE |
*For these entries, ESXi checks only the ASC/ASCQ values, regardless of the sense key. If they are 0x25/0x0 or 0x68/0x0, ESXi marks the device as PDL.
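The lookup described by the table above can be sketched in Python. This is only an illustration, not how ESXi implements the check: the function name `classify_sense`, the dictionary `PDL_SENSE_CODES`, and the regular expression are assumptions made for this example, and the pattern assumes the log line carries the sense code in the `H:... D:... P:... Valid sense data: ...` form shown earlier.

```python
import re

# (sense key, ASC, ASCQ) entries taken from the table above.
# A sense key of None means "any sense key", mirroring the footnote that
# only ASC/ASCQ are checked for 0x25/0x0 and 0x68/0x0.
PDL_SENSE_CODES = {
    (None, 0x25, 0x0): "LOGICAL UNIT NOT SUPPORTED",
    (None, 0x68, 0x0): "LOGICAL UNIT NOT CONFIGURED",
    (0x4, 0x4C, 0x0): "HARDWARE ERROR/LOGICAL UNIT FAILED SELF-CONFIGURATION",
    (0x4, 0x3E, 0x3): "HARDWARE ERROR/LOGICAL UNIT FAILED SELF-TEST",
    (0x4, 0x3E, 0x1): "HARDWARE ERROR/LOGICAL UNIT FAILURE",
    (0x2, 0x4C, 0x0): "NOT READY/LOGICAL UNIT FAILED SELF-CONFIGURATION",
    (0x2, 0x3E, 0x3): "NOT READY/LOGICAL UNIT FAILED SELF-TEST",
    (0x2, 0x3E, 0x1): "NOT READY/LOGICAL UNIT FAILURE",
}

HEX = r"(0x[0-9a-fA-F]+)"
# Assumed pattern: accepts the sense bytes separated by spaces or slashes.
SENSE_RE = re.compile(
    rf"H:{HEX}\s+D:{HEX}\s+P:{HEX}\s+"
    rf"Valid sense data:\s*{HEX}[\s/]+{HEX}[\s/]+{HEX}"
)


def classify_sense(line):
    """Return the PDL description for a log line, or None if it is not a listed PDL code."""
    match = SENSE_RE.search(line)
    if match is None:
        return None
    host, device, plugin, key, asc, ascq = (int(g, 16) for g in match.groups())
    # Every table entry has host status 0x0, device status 0x2, plugin status 0x0.
    if (host, device, plugin) != (0x0, 0x2, 0x0):
        return None
    # Try the "any sense key" entries first, then the key-specific ones.
    return PDL_SENSE_CODES.get((None, asc, ascq)) or PDL_SENSE_CODES.get((key, asc, ascq))


if __name__ == "__main__":
    sample = "H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x25 0x0"
    print(classify_sense(sample))
```

A line whose sense code is not in the table returns `None`. That means only that the line does not report one of the listed PDL codes, not that the device is healthy.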
Note: Some iSCSI arrays map LUN-to-Target as a one-to-one relationship. That is, there is only ever a single LUN per Target. In this case, the iSCSI arrays do not return the appropriate SCSI sense code, so a PDL on these array types cannot be detected.
A planned PDL occurs when there is an intent to remove a device presented to the ESXi host. The datastore must first be unmounted, then the device detached before the storage device can be unpresented at the storage array. For more information, see How to detach a LUN device from ESXi hosts.
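The host-side order for a planned removal (unmount, then detach) can be sketched as a small Python helper that only builds the `esxcli` command lines and does not run them. The helper name `planned_removal_commands` and the sample datastore label and device identifier are hypothetical. The linked article remains the authoritative procedure, and unpresenting the LUN at the array is a separate step performed afterwards.

```python
def planned_removal_commands(datastore_label, device_id):
    """Return the host-side esxcli commands, in order, for a planned device removal.

    The commands are returned as argument lists and are not executed here.
    """
    if not datastore_label or not device_id:
        raise ValueError("both a datastore label and a device identifier are required")
    return [
        # Step 1: unmount the datastore that lives on the device.
        ["esxcli", "storage", "filesystem", "unmount", "-l", datastore_label],
        # Step 2: detach the device (set its state to off) on the host.
        ["esxcli", "storage", "core", "device", "set", "--state=off", "-d", device_id],
    ]


if __name__ == "__main__":
    # Hypothetical placeholder values for illustration only.
    for command in planned_removal_commands("Datastore01", "naa.example"):
        print(" ".join(command))
```

Keeping the commands as ordered data makes the sequence easy to review before anything is run on a host.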
An unplanned PDL occurs when the storage device is unexpectedly unpresented from the storage array without the unmount and detach being executed on the ESXi host.
VMware provides a feature called Auto-remove that automatically removes devices during an unplanned PDL. For more information, see Disabling PDL AutoRemove feature vSphere ESXI.
To clean up an unplanned PDL: