Difference between APD (All Paths Down) and 'APD Notify' in vSAN

Article ID: 418773

Products

VMware vSAN, VMware vSAN 6.x, VMware vSAN 7.x, VMware vSAN 8.x

Issue/Introduction

A cache device in a vSAN disk group might report 'APD Notify', a condition with a different trigger than a regular 'All Paths Down' (APD) scenario.

'APD Notify' can be confirmed by the presence of 'APD Notify PERM LOSS' in /var/run/log/vmkernel.log on the affected ESXi host, for example:

In(182) vmkernel: cpu36:2097645)StorageDevice: 10570: Device t10.NVMe____vendor_type_type_NVMe_model_xxxxxxxxxxxxxxxx APD Notify PERM LOSS; token num:1
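The check above can be sketched in Python for use against a copy of the log. This is a minimal sketch, not an official tool: the function name find_apd_notify_devices is hypothetical, and the pattern is modeled only on the sample line shown in this article, so other ESXi builds may need a different expression.

```python
import re

# Pattern modeled on the sample vmkernel.log line in this article.
# Assumption: the message format is the same on the build being checked.
APD_NOTIFY_RE = re.compile(r"Device (\S+) APD Notify PERM LOSS")


def find_apd_notify_devices(lines):
    """Return device identifiers logged with 'APD Notify PERM LOSS', in first-seen order."""
    devices = []
    for line in lines:
        match = APD_NOTIFY_RE.search(line)
        # Record each device once, even if the message repeats in the log.
        if match and match.group(1) not in devices:
            devices.append(match.group(1))
    return devices
```

The function accepts any iterable of lines, so an open file object for a copy of /var/run/log/vmkernel.log can be passed in directly.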

This article helps determine which physical device requires attention.

Environment

vSAN 8.x OSA, vSAN 9.x OSA

Resolution

When VMware vSAN is used with deduplication enabled, any disk failure results in the failure of the entire disk group the disk belongs to. See: Identifying and replacing a failed cache or capacity disk in vSAN OSA disk group when vSAN deduplication is enabled
When a capacity storage device experiences either APD or PDL (Permanent Device Loss), the 'APD Notify' status is declared for the fronting cache device in order to fence (offline) the entire disk group. This is expected behavior.

To determine which capacity device requires attention (the storage device from which 'APD Notify' is propagated), review /var/run/log/vmkernel.log further. The problematic device is logged with 'error for' rather than 'error propagated for' messages, for example:

In(182) vmkernel: cpu5:52401090)LSOM: LSOMLogDiskEvent:8418: Disk Event permanent error propagated for SSD xxxxxxxx-6b02-0502-xxxx-xxxxxxxxxxxx (t10.NVMe____vendor_type_type_NVMe_model_xxxxxxxxxxxxxxxx:2)
In(182) vmkernel: cpu19:52401090)LSOM: LSOMLogDiskEvent:8418: Disk Event permanent error propagated for MD xxxxxxxx-ae2a-c86e-xxxx-xxxxxxxxxxxx (naa.xxxxxxxxxxxxxx31:2)
In(182) vmkernel: cpu19:52401090)LSOM: LSOMLogDiskEvent:8418: Disk Event permanent error for MD xxxxxxxx-b4c0-bf46-xxxx-xxxxxxxxxxxx (naa.xxxxxxxxxxxxxx33:2)
In(182) vmkernel: cpu19:52401090)LSOM: LSOMLogDiskEvent:8418: Disk Event permanent error propagated for MD xxxxxxxx-8778-5122-xxxx-xxxxxxxxxxxx (naa.xxxxxxxxxxxxxx35:2)
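The distinction between 'error for' and 'error propagated for' can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: the name split_disk_events is hypothetical, and the pattern is derived only from the LSOMLogDiskEvent sample lines in this article, so the wording may differ on other builds.

```python
import re

# Pattern modeled on the LSOMLogDiskEvent sample lines in this article.
# Assumption: the role tag is either SSD or MD, as in those samples.
DISK_EVENT_RE = re.compile(
    r"Disk Event (?P<kind>\w+) error (?P<propagated>propagated )?for "
    r"(?P<role>SSD|MD) (?P<uuid>\S+) \((?P<device>[^)]+)\)"
)


def split_disk_events(lines):
    """Split disk events into devices that raised an error and devices
    that only received a propagated error."""
    origin, propagated = [], []
    for line in lines:
        match = DISK_EVENT_RE.search(line)
        if not match:
            continue
        # Keep the role tag, vSAN disk UUID, and device name exactly as logged.
        entry = (match.group("role"), match.group("uuid"), match.group("device"))
        target = propagated if match.group("propagated") else origin
        if entry not in target:
            target.append(entry)
    return origin, propagated
```

Entries in the first returned list are the candidates for physical inspection; entries in the second list only received the propagated error.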

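In the LSOM log sample, the capacity device naa.xxxxxxxxxxxxxx33 is the only one logged with 'error for', so it is the device that requires attention.

The 'APD Notify' messages seen in the same log do not indicate a problem with the fronting cache device:

In(182) vmkernel: cpu36:2097645)StorageDevice: 10570: Device t10.NVMe____vendor_type_type_NVMe_model_xxxxxxxxxxxxxxxx APD Notify PERM LOSS; token num:1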

Additional Information

All Paths Down for a storage device: https://knowledge.broadcom.com/external/article?articleNumber=318850

Identifying and replacing a failed cache or capacity disk in vSAN OSA disk group when vSAN deduplication is enabled: https://knowledge.broadcom.com/external/article/327008/vsan-deduplication-enabled-identifying.html