vSAN Physical Disk Alarm: Disk Group Unhealthy and Permanent Disk Loss (PDL) Detected

Article ID: 420830

Updated On:

Products

VMware vSAN

Issue/Introduction

A vSAN host reports "Disk Group Unhealthy" and "Permanent Device Loss (PDL)" alarms for one of its disk groups.
Skyline Health shows the disk as absent with a health state of "Permanent disk loss", and the operational health status is red.

Environment

VMware vSAN 7.x
VMware vSAN 8.x

Cause

If a physical disk in a vSAN disk group experiences repeated I/O failures, vSAN’s Log Structured Object Manager (LSOM) will mark the device as unhealthy and take it out of service. When the disk group has deduplication enabled, the entire disk group will also be marked as failed.

The following events are logged in /var/run/log/vsandevicemonitord.log:

2025-12-02T02:50:29Z In(14) vsandevicemonitord[2101431]: [1083659850432]: Device naa.5002XXXXXXX state is DISKGROUP_UNDER_PDL
2025-12-02T02:50:29Z In(14) vsandevicemonitord[2101431]: [1083659850432]: Device naa.5002XXXXXXX state is DISKGROUP_UNDER_PDL
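
When the log is long, it can help to pull out just the devices flagged as DISKGROUP_UNDER_PDL. The following is a minimal Python sketch, assuming the log has been copied off the host and that the line layout matches the excerpt above. The function name and the regular expression are illustrative and not part of any VMware tool.

```python
import re

# Matches lines shaped like the excerpt above:
#   ... vsandevicemonitord[PID]: [...]: Device <id> state is DISKGROUP_UNDER_PDL
PDL_LINE = re.compile(r"Device (\S+) state is DISKGROUP_UNDER_PDL")


def devices_under_pdl(lines):
    """Return the unique device IDs reported as DISKGROUP_UNDER_PDL, in first-seen order."""
    seen = []
    for line in lines:
        match = PDL_LINE.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


# The two log lines from the excerpt above (the device ID is masked in the source).
sample = [
    "2025-12-02T02:50:29Z In(14) vsandevicemonitord[2101431]: [1083659850432]: Device naa.5002XXXXXXX state is DISKGROUP_UNDER_PDL",
    "2025-12-02T02:50:29Z In(14) vsandevicemonitord[2101431]: [1083659850432]: Device naa.5002XXXXXXX state is DISKGROUP_UNDER_PDL",
]
print(devices_under_pdl(sample))
```

Repeated entries for the same device collapse into a single ID, which makes it easier to see how many distinct devices are affected.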

The following events are logged in /var/run/log/vobd.log:

2025-11-29T05:21:14.055Z In(14) vobd[2097812]:  [vSANCorrelator] 7820499723103us: [vob.vsan.lsom.devicerepair] vSAN device 527e0014-98f0-6e50-6e0d-4d4xxxxxxx is being repaired due to I/O failures, and will be out of service until the repair is complete. If the device is part of a dedup disk group, the entire disk group will be out of service until the repair is complete.
2025-11-29T05:21:14.055Z In(14) vobd[2097812]:  [vSANCorrelator] 7820439832915us: [esx.problem.vob.vsan.lsom.devicerepair] Device 527e0014-98f0-6e50-6e0d-4d4xxxxxxx is in offline state and is getting repaired.
2025-11-29T05:21:14.063Z In(14) vobd[2097812]:  [scsiCorrelator] 7820499731306us: [vob.scsi.scsipath.pathstate.deadver2] scsiPath vmhba0:C0:T6:L0 changed state from on (device ID: naa.500253xxxxxxx)
2025-11-29T05:21:14.064Z In(14) vobd[2097812]:  [scsiCorrelator] 7820439841838us: [esx.problem.storage.connectivity.lost] Lost connectivity to storage device naa.500xxxxxxxxx. Path vmhba0:C0:T6:L0 is down. Affected datastores: Unknown.
2025-11-29T05:21:14.064Z In(14) vobd[2097812]:  [scsiCorrelator] 7820499731324us: [vob.scsi.device.state.permanentloss] Device :naa.500253xxxxxxx has been removed or is permanently inaccessible.
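
To pick these events out of a larger vobd.log, the event IDs in square brackets can be used as a filter. The following Python sketch assumes the line layout shown in the excerpt above. The set of event IDs is copied from that excerpt, while the function name and the regular expression are illustrative assumptions.

```python
import re

# Event IDs copied from the vobd.log excerpt above.
EVENTS_OF_INTEREST = {
    "vob.vsan.lsom.devicerepair",
    "esx.problem.vob.vsan.lsom.devicerepair",
    "vob.scsi.scsipath.pathstate.deadver2",
    "esx.problem.storage.connectivity.lost",
    "vob.scsi.device.state.permanentloss",
}

# Assumed layout: "[<correlator>] <uptime>us: [<event id>] <message>"
VOBD_LINE = re.compile(r"\[(\w+)\] \d+us: \[([\w.]+)\] (.*)")


def matching_events(lines):
    """Return (event_id, message) pairs for lines whose event ID is of interest."""
    results = []
    for line in lines:
        match = VOBD_LINE.search(line)
        if match and match.group(2) in EVENTS_OF_INTEREST:
            results.append((match.group(2), match.group(3)))
    return results


# One line taken from the excerpt above (the device ID is masked in the source).
sample_line = (
    "2025-11-29T05:21:14.064Z In(14) vobd[2097812]:  [scsiCorrelator] 7820499731324us: "
    "[vob.scsi.device.state.permanentloss] Device :naa.500253xxxxxxx has been removed "
    "or is permanently inaccessible."
)
for event_id, message in matching_events([sample_line]):
    print(event_id, "->", message)
```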

Resolution

Step 1 – Place host into Maintenance Mode

Place the affected ESXi host in Maintenance Mode using Ensure Accessibility to minimize impact while performing remediation.
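
If the vSphere Client is not available, the same operation can be requested from the ESXi shell with esxcli. The Python sketch below only builds the command line and does not run it. The helper name is hypothetical, and the command should be checked against the esxcli reference for the ESXi release in use.

```python
def maintenance_mode_command(enable, vsan_mode="ensureObjectAccessibility"):
    """Build (but do not run) the esxcli argv for entering or leaving Maintenance Mode."""
    argv = [
        "esxcli", "system", "maintenanceMode", "set",
        "--enable", "true" if enable else "false",
    ]
    if enable:
        # The vSAN data evacuation mode only applies when entering Maintenance Mode.
        argv += ["--vsanmode", vsan_mode]
    return argv


# Command to enter Maintenance Mode with the Ensure Accessibility option.
print(" ".join(maintenance_mode_command(True)))
```

Building the argument list separately from executing it makes it easy to review the exact command before it is run on a production host.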

Step 2 – Remove the failed disk from vSAN

Remove the affected device from the disk group following this KB:
How to remove a disk from a vSAN disk group/host

If removal through vCenter fails, the disk can be removed from the ESXi command line instead.
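
As a rough illustration of the command-line route, the Python sketch below builds an esxcli vsan storage remove command for a single disk, identified either by device name or by vSAN UUID (both are listed by esxcli vsan storage list). The helper name and the placeholder device name are hypothetical, and nothing is executed. The linked KB remains the authoritative procedure, particularly for disk groups with deduplication enabled.

```python
def remove_disk_command(device=None, vsan_uuid=None):
    """Build (but do not run) the esxcli argv that removes one disk from vSAN."""
    if (device is None) == (vsan_uuid is None):
        raise ValueError("pass exactly one of device or vsan_uuid")
    if device is not None:
        # Identify the disk by its device name, for example an naa.* identifier.
        return ["esxcli", "vsan", "storage", "remove", "--disk", device]
    # Identify the disk by its vSAN UUID, useful when the device is no longer visible.
    return ["esxcli", "vsan", "storage", "remove", "--uuid", vsan_uuid]


# "naa.EXAMPLE" is a placeholder, not a real device name.
print(" ".join(remove_disk_command(device="naa.EXAMPLE")))
```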

Step 3 – Replace the failed physical disk

Work with the hardware vendor to replace the failed disk.
It is also advisable to run diagnostics on the remaining drives in the same disk group to rule out additional hardware issues.

Step 4 – Re-add the replacement disk

After replacement, add the new disk back to the disk group as per:
Add Devices to the Disk Group in vSAN Cluster

Ensure that the replacement disk matches supported specifications for vSAN.

Step 5 – Validate and Exit Maintenance Mode

Run the Skyline Health / vSAN Health checks.
Verify that the disk group is healthy.
Confirm that no resync activity or object health issues remain.
Exit Maintenance Mode and return the host to production.
