Best practice for vSphere High Availability (HA) specific to APD and PDL storage events

Article ID: 406100

Products

VMware vSAN

Issue/Introduction

HA settings specific to APD (All Paths Down) and PDL (Permanent Device Loss) events, collectively referred to as VMCP (VM Component Protection), were introduced in vSphere 7.0. They can be confused with several host-level settings that provide similar functionality. VMware does not recommend changing these host-level settings unless directed by support. Examples of such host-level settings are listed below.

Misc.APDHandlingEnable
Enabled by default. When a device enters an APD state, the host continues to retry I/O commands only for the period specified by Misc.APDTimeout. If the device has not recovered by then, the host stops retrying and treats the device as having reached the APD timeout.

Misc.APDTimeout
Set to 140 seconds by default

Disk.AutoremoveOnPDL
Enabled by default. When a storage device enters a PDL state, it will no longer return SCSI sense codes, and ESXi removes the device from the host to prevent further unnecessary I/O to it.

VMkernel.Boot.terminateVMOnPDL
Disabled by default. VM HA responses occur according to the HA VMCP settings even while this setting is disabled. Treat this setting as deprecated unless directed otherwise by VMware support.
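
As an illustration of how the defaults above might be audited, the following minimal Python sketch compares a set of collected host values against the defaults listed in this article. It is not a VMware tool. The function name `find_non_default_settings`, the `example_host` values, and the numeric/boolean encoding of the settings (1 for enabled, `False` for disabled) are assumptions made for this example; how the values are collected from a host is left out.

```python
# Defaults as listed in this article. The encodings (1 = enabled,
# False = disabled) are assumptions made for this sketch.
ARTICLE_DEFAULTS = {
    "Misc.APDHandlingEnable": 1,
    "Misc.APDTimeout": 140,  # seconds
    "Disk.AutoremoveOnPDL": 1,
    "VMkernel.Boot.terminateVMOnPDL": False,
}


def find_non_default_settings(current):
    """Return {name: (current_value, default_value)} for settings that differ.

    `current` is a mapping of setting name to the value collected from a
    host. Settings missing from `current` are skipped, not reported.
    """
    deviations = {}
    for name, default in ARTICLE_DEFAULTS.items():
        if name not in current:
            continue  # value was not collected, so there is nothing to compare
        value = current[name]
        if value != default:
            deviations[name] = (value, default)
    return deviations


if __name__ == "__main__":
    # Hypothetical values for a single host, for illustration only.
    example_host = {
        "Misc.APDHandlingEnable": 1,
        "Misc.APDTimeout": 60,
        "Disk.AutoremoveOnPDL": 1,
        "VMkernel.Boot.terminateVMOnPDL": False,
    }
    for name, (value, default) in find_non_default_settings(example_host).items():
        print(f"{name}: current={value}, article default={default}")
```

A deviation reported by such a check is not necessarily a misconfiguration, because support or the storage vendor may have directed the change. It only flags hosts worth reviewing.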

Environment

7.0
8.0

Resolution

Follow the official HA and VMCP documentation in preference to any advanced host settings, unless directed otherwise by VMware support or the storage vendor.

Configure VMCP Responses

For general HA best practice, refer to the following blog post. VMware support does not provide environment-specific best practice.

HA Deepdive

Additional Information

Documentation reference for Misc.APDHandlingEnable & Misc.APDTimeout settings

Handling Transient APD Conditions

KB reference for Disk.AutoremoveOnPDL

Disabling PDL AutoRemove feature vSphere ESXi