VMware ESX/ESXi hosts in All-Paths-Down (APD) condition may appear as Not Responding in VMware vCenter Server

Article ID: 318938

Products

VMware vCenter Server
VMware vSphere ESXi

Issue/Introduction

Symptoms:
An ESXi/ESX host with one or more LUNs in an All-Paths-Down (APD) condition may become unmanageable in vCenter Server, and you may experience these symptoms:
  • The ESXi/ESX host appears as Disconnected or Not Responding in the vCenter Server inventory.
  • Virtual machines using the LUNs in APD may become unresponsive.
  • Connecting to the ESXi/ESX host using the vSphere Client, vCLI, or PowerCLI fails.
  • Adding a host to vCenter Server fails with this error:

    Failed to read resource pool tree from host
     
  • Connecting to the ESXi/ESX host using SSH is successful.
  • The vmware-hostd management service is not running.
  • Connecting to the vmware-hostd management service using the vim-cmd or vmware-cmd command fails.
  • The last line in the /var/log/vmware/hostd.log file contains an entry similar to:

    verbose 'FSVolumeProvider'] RefreshVMFSVolumes called
     
  • This issue occurs when you are running ESX/ESXi.
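
Because SSH access still works in this state, the hostd.log symptom can be checked directly on the host. The following is a minimal sketch, not a supported tool: the function name hostd_stuck_on_refresh is hypothetical, and the log path in the example is the one given in the symptom list, which may differ between ESX/ESXi versions.

```shell
# Return success (exit status 0) when the last line of the given log file
# contains the RefreshVMFSVolumes entry described in the symptom list.
# hostd_stuck_on_refresh is a hypothetical helper name.
hostd_stuck_on_refresh() {
    tail -n 1 "$1" | grep -q 'RefreshVMFSVolumes called'
}

# Example (run on the host over SSH; the path is taken from this article):
#   hostd_stuck_on_refresh /var/log/vmware/hostd.log \
#       && echo "hostd may be blocked on a VMFS volume refresh"
```

A match is only a hint. Confirm the APD condition with the Validation steps before taking action.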


Environment

VMware vSphere ESXi 6.5
VMware vSphere ESXi 5.1
VMware vSphere ESXi 5.5
VMware vSphere ESXi 6.0
VMware vCenter Server Appliance 6.0.x

Resolution

Scope

This article applies only to ESX/ESXi hosts on which the advanced setting Misc.APDHandlingEnable is set to 0.

Note: The default value is 1.

Default APD handling is different in ESXi 5.1 and later, including ESXi 6.x.

For more information, see the Handling Transient APD Conditions section in the vSphere Storage Guide.
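
To confirm whether a host falls within this scope, the current value of Misc.APDHandlingEnable can be read from the ESXi shell. The sketch below assumes the host provides the esxcli system settings advanced namespace (ESXi 5.x and later) and that the output contains an "Int Value:" field. The helper name apd_handling_value is hypothetical.

```shell
# Extract the current integer value from the output of
#   esxcli system settings advanced list -o /Misc/APDHandlingEnable
# Assumption: the current value appears on a line starting with "Int Value:".
# The anchored pattern skips the separate "Default Int Value:" line.
apd_handling_value() {
    awk -F': *' '/^ *Int Value:/ { print $2 }'
}

# Example (run on the host):
#   value=$(esxcli system settings advanced list -o /Misc/APDHandlingEnable | apd_handling_value)
#   [ "$value" = "0" ] && echo "This article applies to this host"
```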

Validation

Determine whether there are any LUNs in an All-Paths-Down (APD) state on an ESXi/ESX host:
  1. Open a console to the ESXi/ESX host. For more information, see Unable to connect to an ESX host using Secure Shell (SSH) (1003807) or Using Tech Support Mode in ESXi 4.1 and ESXi 5.0 (1017910).
  2. Use the esxcfg-mpath command to obtain a list of all device paths, and filter by their State:

    # esxcfg-mpath --list-paths --device <device mpx/naa name> | grep -i state

    If you do not know the ID of the problem device, or if the host has many devices, it may be more efficient to use this command to identify the dead paths:

    # esxcfg-mpath -b | grep -C 1 dead
     
  3. If any path reports the State as dead, but other paths to the same device report the State as Up, perform a rescan to remove the stale device entries. For more information, see Performing a rescan of the storage on an ESXi/ESX host (1003988).
  4. If every path to a LUN reports the State as dead, then the LUN is in an All-Paths-Down state.
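
On hosts with many devices, the check in steps 2 through 4 can be approximated with a short filter that lists only the devices for which every path is dead. This is a sketch under assumptions: it expects the long listing of esxcfg-mpath to print a "Device:" line and a "State:" line for each path, it assumes awk is available in the host shell, and the function name find_apd_devices is hypothetical.

```shell
# Read esxcfg-mpath --list-paths output on stdin and print each device whose
# paths are ALL in the dead state (the APD case in step 4).
# Assumption: each path entry has "Device: <id>" and "State: <state>" lines.
find_apd_devices() {
    awk '
        /^ *Device: / { dev = $2 }               # remember the device of the current path
        /^ *State: /  { total[dev]++             # count every path for that device
                        if ($2 == "dead") dead[dev]++ }
        END {
            for (d in total)
                if (dead[d] == total[d]) print d  # all paths dead
        }
    '
}

# Example (run on the host):
#   esxcfg-mpath --list-paths | find_apd_devices
```

A device with a mix of dead and live paths is not printed. That case corresponds to step 3 and calls for a rescan rather than APD troubleshooting.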

Preemptive workaround

If the APD condition is noticed prior to any process opening a file on the affected VMFS datastores, the impending blocking I/O can be fast-failed by setting the advanced configuration option VMFS3.FailVolumeOpenIfAPD = 1 on ESXi/ESX 4.1. For more information, see Configuring advanced options for ESX/ESXi (1038578).
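
As an illustration, the option can be read and set from the console with esxcfg-advcfg. The sketch below assumes that esxcfg-advcfg -g prints a line of the form "Value of FailVolumeOpenIfAPD is 0", and the helper name advcfg_value is hypothetical. Verify the option path on your host before relying on it.

```shell
# Extract the numeric value from the line that esxcfg-advcfg -g is assumed
# to print, for example: "Value of FailVolumeOpenIfAPD is 0".
advcfg_value() {
    awk '/^Value of / { print $NF }'
}

# Example (run on the host):
#   current=$(esxcfg-advcfg -g /VMFS3/FailVolumeOpenIfAPD | advcfg_value)
#   [ "$current" = "1" ] || esxcfg-advcfg -s 1 /VMFS3/FailVolumeOpenIfAPD
```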

In situations where any dead path or APD is noticed, individual HBAs can be rescanned using this command:

# esxcfg-rescan -d vmhbaX

Note: Replace vmhbaX with the appropriate HBA, for example vmhba33.
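
When it is not obvious which HBAs to rescan, the adapter names can be derived from the dead paths themselves. This sketch assumes the brief listing (esxcfg-mpath -b) prints each path on a line that begins with its runtime name, such as vmhba2:C0:T0:L23, and contains state:dead for dead paths. The function name dead_path_hbas is hypothetical.

```shell
# Read esxcfg-mpath -b output on stdin and print the unique HBA names that
# own at least one dead path.
# Assumption: path lines look like "vmhba2:C0:T0:L23 LUN:23 state:dead ...".
dead_path_hbas() {
    awk '/state:dead/ { split($1, parts, ":"); print parts[1] }' | sort -u
}

# Example (run on the host): rescan only the affected HBAs.
#   for hba in $(esxcfg-mpath -b | dead_path_hbas); do
#       esxcfg-rescan -d "$hba"
#   done
```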

In ESX/ESXi 4.1 and later, all HBAs can be rescanned using this command:

# esxcfg-rescan -A

Note: If a device is already in an APD condition and has active I/O waiting for the device to return, setting VMFS3.FailVolumeOpenIfAPD does not cause the already-issued I/O to fail. You must either bring the LUN paths back up or wait for the I/O to eventually fail.

For more information, see Virtual machines stop responding when any LUN on the host is in an all-paths-down (APD) condition (1016626).

To avoid the APD state on an ESXi/ESX host, use the correct procedure to unpresent LUNs. For more information, see Removing a LUN containing a datastore from VMware ESXi/ESX 4.x (1029786) or Unmounting a LUN or Detaching a Datastore/Storage Device from multiple ESXi 5.x hosts (2004605), depending on your ESX/ESXi version.

Note: When changing fabric switches, confirm that the settings are correct. This issue has been seen after migrating between switch brands. Contact the switch vendor for the appropriate configuration when performing a switch migration.

Additional Information

Unable to connect to an ESX host using Secure Shell (SSH)
Performing a rescan of the storage on an ESX/ESXi host
Virtual machines stop responding when any LUN on the host is in an all-paths-down (APD) condition
Using Tech Support Mode in ESXi 4.1, ESXi 5.x, and ESXi 6.x
Removing a LUN containing a datastore from VMware ESXi/ESX 4.0 and 4.1
Configuring advanced options for ESXi/ESX
How to unmount a LUN or detach a datastore device from ESXi hosts
Permanent Device Loss (PDL) and All-Paths-Down (APD) in vSphere 5.x and 6.x
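VMware ESX/ESXi 4.x/5.0 hosts in an All-Paths-Down (APD) condition may appear as Not Responding in VMware vCenter Server (Japanese version)
VMware ESX/ESXi 4.x/5.x and 6.x hosts in an All-Paths-Down (APD) condition may appear as Not Responding in VMware vCenter Server (Simplified Chinese version)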