Article ID: 317972
Updated On:
Issue/Introduction
Symptoms:
- When an APD event occurs, LUNs connected to ESXi may remain inaccessible after paths to the LUNs recover.
- The 140-second APD timeout expires even after paths to storage have recovered.
- In the /var/log/vmkernel.log file, you see these events in sequence:
  - Device enters APD.
  - Device exits APD.
  - Heartbeat recovery and filesystem operations on the device fail with timeout, not found, or busy errors.
  - The APD timeout expires even though the device previously exited APD.
- This condition is associated with one or more of these behaviors:
  - Virtual machines become inaccessible.
  - Hosts become unresponsive.
  - Storage is not online, even though paths are up and available.
  - The datastore disappears from the vSphere Client, even though virtual machines on that datastore remain.
- An APD event can be triggered by one or more of these events. This list is not exhaustive:
  - Failures of upstream Fibre Channel or Ethernet switches that affect all paths to the storage array
  - Storage array failure or reboot
  - Storage array firmware updates (some vendors)
Important: Not all APD events exhibit this behavior. In most cases, LUNs and datastores exit the APD condition normally and as expected.
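The log sequence described in the symptoms above (device enters APD, device exits APD, APD timeout still expires) can be checked for mechanically. The following is a minimal sketch, not a supported tool. The message fragments, the `naa.` device-ID pattern, and the sample lines are assumptions for illustration; compare them against the actual text in your /var/log/vmkernel.log and adjust the patterns before relying on the result.

```python
import re
from typing import Iterable, List

# Assumed message fragments; verify against your own vmkernel.log.
ENTER = re.compile(r"has entered the All Paths Down state")
EXIT = re.compile(r"has exited the All Paths Down state")
TIMEOUT = re.compile(r"APD timeout", re.IGNORECASE)
DEVICE = re.compile(r"\[(naa\.[0-9a-f]+)\]")  # assumed device-ID format


def find_stuck_apd(lines: Iterable[str]) -> List[str]:
    """Return device IDs whose APD timeout expired after the device exited APD."""
    state = {}  # device ID -> "apd" or "recovered"
    stuck = []
    for line in lines:
        match = DEVICE.search(line)
        if not match:
            continue
        dev = match.group(1)
        if ENTER.search(line):
            state[dev] = "apd"
        elif EXIT.search(line):
            state[dev] = "recovered"
        elif TIMEOUT.search(line):
            # A timeout following an exit is the sequence this article describes.
            if state.get(dev) == "recovered" and dev not in stuck:
                stuck.append(dev)
    return stuck


# Hypothetical sample lines, shortened for illustration.
sample = [
    "Device or filesystem with identifier [naa.600a0b80001234] has entered the All Paths Down state.",
    "Device or filesystem with identifier [naa.600a0b80001234] has exited the All Paths Down state.",
    "APD timeout event for [naa.600a0b80001234]",
    "Device or filesystem with identifier [naa.600a0b80005678] has entered the All Paths Down state.",
    "APD timeout event for [naa.600a0b80005678]",
]

print(find_stuck_apd(sample))
```

In this sample, only the first device is reported, because its timeout follows an exit from APD. The second device never exited APD, so its timeout is an ordinary APD timeout rather than the condition described here.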
Cause
This issue occurs due to a fault in APD handling. When this issue occurs, LUN paths are available and online during an APD event, but the APD timer continues counting up until the LUN enters the APD Timeout state. After the initial APD event, the datastore remains inaccessible as long as active workloads are associated with it.
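The difference between expected and faulty behavior can be illustrated with a toy model. This is a sketch for reasoning about the timeline only, not ESXi's actual implementation. The function name, the state names, and the `timer_stops_on_exit` flag are hypothetical; the 140-second value is the timeout cited in this article.

```python
from typing import Optional

APD_TIMEOUT_SECONDS = 140  # timeout value cited in this article


def apd_state(entered_at: float,
              exited_at: Optional[float],
              now: float,
              timer_stops_on_exit: bool = True) -> str:
    """Toy model of a device's APD state at time `now` (all times in seconds)."""
    deadline = entered_at + APD_TIMEOUT_SECONDS
    exited_in_time = (
        exited_at is not None and exited_at <= now and exited_at < deadline
    )
    if exited_in_time and timer_stops_on_exit:
        # Expected behavior: leaving APD stops the timer.
        return "recovered"
    if now >= deadline:
        # Either the paths never came back, or the timer kept running (the fault).
        return "apd_timeout"
    return "recovered_timer_running" if exited_in_time else "apd"


# Paths drop at t=0 and recover at t=30; inspect the state at t=200.
print(apd_state(0, 30, 200, timer_stops_on_exit=True))   # expected behavior
print(apd_state(0, 30, 200, timer_stops_on_exit=False))  # faulty behavior
```

With the timer stopping on exit, the device is simply recovered. With the timer left running, the same timeline ends in the APD Timeout state even though the paths recovered 110 seconds before the deadline.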
Resolution
If you are unable to upgrade, no workaround can guarantee that this issue will not occur during an APD event. However, there are two workarounds to restore production should this issue occur.
To work around the issue, use one of these options:
Additional Information
For more information regarding APD events, see:
- Storage device has entered the All Paths Down state
- All Paths Down timeout for a storage device has expired
- Storage device has recovered from the APD state
- LUNs connected to VMware vSphere 6.0 hosts may remain in the APD Timeout state after paths recover (Simplified Chinese)
- LUNs connected to VMware vSphere 6.0 hosts may remain in the APD Timeout state after paths recover (Japanese)
Impact/Risks:
- When this issue is encountered, virtual machines must be terminated to recover the datastore.
- HA, if enabled, should recover these virtual machines on other hosts.
- If management agents must be restarted, the host temporarily loses manageability through vCenter Server.