Symptoms:
An ESXi host with one or more LUNs in an All-Paths-Down (APD) condition may become unmanageable in vCenter Server and may experience these symptoms:
The host appears as Disconnected or Not Responding in the vCenter Server inventory.
Log entries similar to this are reported:
verbose 'FSVolumeProvider'] RefreshVMFSVolumes called
This article applies only to ESXi hosts that have the advanced setting Misc.APDHandlingEnable set to 0.
Note: The default value is 1.
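To confirm whether a host falls under this article, the current value of the setting can be read from the host shell. The following is a minimal sketch, not an official procedure: the helper name apd_handling_value is hypothetical, and it assumes that esxcli prints the current value on a line labelled "Int Value:", which should be verified against the output on your build.

```shell
# Hypothetical helper: reads the output of
#   esxcli system settings advanced list -o /Misc/APDHandlingEnable
# on stdin and prints the current integer value.
# Assumption: the current value is on a line labelled "Int Value:".
apd_handling_value() {
    # Match only the "Int Value:" line, not "Default Int Value:".
    awk -F': *' '/^[[:space:]]*Int Value:/ { print $2; exit }'
}

# Example use on a host (not executed here):
#   esxcli system settings advanced list -o /Misc/APDHandlingEnable | apd_handling_value
```

A printed value of 0 means this article applies to the host; a value of 1 means the default APD handling is in effect.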
Default APD handling is different in ESXi 6.x. For more information, see the Handling Transient APD Conditions section in the vSphere Storage Guide.
Determine whether there are any LUNs in an All-Paths-Down (APD) state on an ESXi host:
Run the esxcfg-mpath command to obtain a list of all device paths, and filter by their State:
# esxcfg-mpath --list-paths --device <device mpx/naa name> | grep -i state
To list only dead paths, with one line of surrounding context:
# esxcfg-mpath -b | grep -C 1 dead
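On hosts with many devices it can help to summarize path states per device rather than reading the raw output. The sketch below is one way to do that under stated assumptions: the function name classify_paths is hypothetical, and it assumes that the esxcfg-mpath --list-paths output contains a "Device:" line followed by a "State:" line for each path. A device whose paths are all dead is reported as APD; a device with only some dead paths is reported as partial.

```shell
# Hypothetical helper: reads esxcfg-mpath --list-paths output on stdin
# and prints one summary line per device.
# Assumption: each path block has a "Device: <name>" line followed by
# a "State: <state>" line.
classify_paths() {
    awk '
        $1 == "Device:" {
            dev = $2
            if (!(dev in total)) { order[++n] = dev }   # remember first-seen order
            total[dev]++
        }
        $1 == "State:" && $2 == "dead" { dead[dev]++ }
        END {
            for (i = 1; i <= n; i++) {
                d = order[i]
                if (dead[d] + 0 == 0)          status = "ok"
                else if (dead[d] == total[d])  status = "APD"
                else                           status = "partial"
                printf "%s %d/%d dead %s\n", d, dead[d], total[d], status
            }
        }'
}

# Example use on a host (not executed here):
#   esxcfg-mpath --list-paths | classify_paths
```

Lines such as "Device Display Name:" are ignored because only a first field of exactly "Device:" is matched.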
If a path reports the State as dead, but other paths to the same device report the State as Up, perform a rescan to remove the stale device entries. For more information, see Performing a rescan of the storage on an ESXi/ESX host (1003988).
If the APD condition is noticed before any process opens a file on the affected VMFS datastores, the impending blocking I/O can be fast-failed by setting the advanced host configuration option VMFS3.FailVolumeOpenIfAPD = 1. For more information, see Configuring advanced options for ESXi/ESX.
If any dead path or APD condition is noticed, individual HBAs can be rescanned using this command:
# esxcfg-rescan -d vmhbaX
Note: Replace vmhbaX with the appropriate HBA, for example vmhba33.
To rescan all HBAs, run this command:
# esxcfg-rescan -A
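When only some adapters carry dead paths, the per-HBA rescan can be limited to those adapters. The following sketch only prints the rescan commands so that they can be reviewed before being run. The function name dead_path_rescan_cmds is hypothetical, and it assumes that each path block in the esxcfg-mpath --list-paths output has a "Runtime Name: vmhbaN:C..:T..:L.." line before its "State:" line.

```shell
# Hypothetical helper: reads esxcfg-mpath --list-paths output on stdin
# and prints one "esxcfg-rescan -d <hba>" command per adapter that has
# at least one dead path. Nothing is executed.
# Assumption: "Runtime Name: vmhbaN:Cx:Ty:Lz" precedes "State:" in each block.
dead_path_rescan_cmds() {
    awk '
        $1 == "Runtime" && $2 == "Name:" {
            split($3, parts, ":")      # vmhba1:C0:T0:L0 -> parts[1] = vmhba1
            hba = parts[1]
        }
        $1 == "State:" && $2 == "dead" && hba != "" && !(hba in seen) {
            seen[hba] = 1              # emit each adapter only once
            print "esxcfg-rescan -d " hba
        }'
}

# Example use on a host (not executed here):
#   esxcfg-mpath --list-paths | dead_path_rescan_cmds
```

Printing the commands instead of running them keeps the sketch side-effect free; the output can be reviewed and then run manually.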
Note: If a device is already in an APD condition with active I/O waiting for the device to return, setting VMFS3.FailVolumeOpenIfAPD does not cause the already-issued I/O to fail. It is necessary either to bring the LUN paths back up or to wait for the I/O to eventually fail.
To avoid the APD state on an ESXi host, use the correct method to unpresent the LUNs. For more information on the correct procedure for unpresenting LUNs, see How to detach a LUN device from ESXi hosts, depending on the ESXi version.
Note: When changing fabric switches, confirm that the settings are correct. This issue has been seen when migrating between switch brands. Contact the switch vendor for the appropriate configuration when performing a switch migration.