vSAN Physical Disk Alarm: Disk Group Unhealthy and Permanent Disk Loss (PDL) Detected

Article ID: 420830

Updated On:

Products

VMware vSAN

Issue/Introduction

  • A vSAN host reports “Disk Group Unhealthy” and “Permanent Device Loss (PDL)” alarms for one of its disk groups.
  • Skyline Health shows the disk as absent, reports its health as Permanent disk loss, and shows its operational health in red.

 

Environment

  • VMware vSAN 7.x
  • VMware vSAN 8.x 

Cause

If a physical disk in a vSAN disk group experiences repeated I/O failures, vSAN’s Log Structured Object Manager (LSOM) will mark the device as unhealthy and take it out of service. When the disk group has deduplication enabled, the entire disk group will also be marked as failed.

The following events appear in /var/run/log/vsandevicemonitord.log:

2025-12-02T02:50:29Z In(14) vsandevicemonitord[2101431]: [1083659850432]: Device naa.5002XXXXXXX state is DISKGROUP_UNDER_PDL
2025-12-02T02:50:29Z In(14) vsandevicemonitord[2101431]: [1083659850432]: Device naa.5002XXXXXXX state is DISKGROUP_UNDER_PDL
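
To find every device that the monitor daemon has flagged, the state lines above can be filtered with a short script. This is a minimal sketch: the regular expression is inferred only from the two sample lines shown here, and the function name `devices_in_state` is hypothetical.

```python
import re

# Pattern inferred from the sample vsandevicemonitord lines above
# (assumption: other builds may format this message differently).
STATE_RE = re.compile(r"Device (?P<device>\S+) state is (?P<state>\w+)")


def devices_in_state(lines, state="DISKGROUP_UNDER_PDL"):
    """Return a sorted list of device IDs reported in the given state."""
    found = set()
    for line in lines:
        match = STATE_RE.search(line)
        if match and match.group("state") == state:
            found.add(match.group("device"))
    return sorted(found)
```

Repeated log entries for the same device collapse into a single result, so the list can be compared directly with the absent disk shown in Skyline Health.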

The following events appear in /var/run/log/vobd.log:

2025-11-29T05:21:14.055Z In(14) vobd[2097812]:  [vSANCorrelator] 7820499723103us: [vob.vsan.lsom.devicerepair] vSAN device 527e0014-98f0-6e50-6e0d-4d4xxxxxxx is being repaired due to I/O failures, and will be out of service until the repair is complete. If the device is part of a dedup disk group, the entire disk group will be out of service until the repair is complete.
2025-11-29T05:21:14.055Z In(14) vobd[2097812]:  [vSANCorrelator] 7820439832915us: [esx.problem.vob.vsan.lsom.devicerepair] Device 527e0014-98f0-6e50-6e0d-4d4xxxxxxx is in offline state and is getting repaired.
2025-11-29T05:21:14.063Z In(14) vobd[2097812]:  [scsiCorrelator] 7820499731306us: [vob.scsi.scsipath.pathstate.deadver2] scsiPath vmhba0:C0:T6:L0 changed state from on (device ID: naa.500253xxxxxxx)
2025-11-29T05:21:14.064Z In(14) vobd[2097812]:  [scsiCorrelator] 7820439841838us: [esx.problem.storage.connectivity.lost] Lost connectivity to storage device naa.500xxxxxxxxx. Path vmhba0:C0:T6:L0 is down. Affected datastores: Unknown.
2025-11-29T05:21:14.064Z In(14) vobd[2097812]:  [scsiCorrelator] 7820499731324us: [vob.scsi.device.state.permanentloss] Device :naa.500253xxxxxxx has been removed or is permanently inaccessible.
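
The vobd.log sequence above (device repair, dead path, lost connectivity, permanent loss) can be grouped by event ID to confirm which devices are involved. The sketch below is illustrative only: the event list is taken from the sample lines in this article, the device patterns are inferred from the masked IDs shown here, and `summarize_vobd` is a hypothetical helper name.

```python
import re
from collections import defaultdict

# Event IDs copied from the sample vobd.log lines in this article.
PDL_EVENTS = {
    "vob.vsan.lsom.devicerepair",
    "esx.problem.vob.vsan.lsom.devicerepair",
    "vob.scsi.scsipath.pathstate.deadver2",
    "esx.problem.storage.connectivity.lost",
    "vob.scsi.device.state.permanentloss",
}

# Matches "[vob.xxx]" or "[esx.xxx]" but not "[vSANCorrelator]".
EVENT_RE = re.compile(r"\[(?P<event>(?:vob|esx)\.[\w.]+)\]")

# Assumption: devices appear either as naa.* names or as vSAN UUIDs.
DEVICE_RE = re.compile(r"naa\.\w+|[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-\w+")


def summarize_vobd(lines):
    """Map each PDL-related event ID to the sorted devices it mentions."""
    summary = defaultdict(set)
    for line in lines:
        event = EVENT_RE.search(line)
        if not event or event.group("event") not in PDL_EVENTS:
            continue
        # Only inspect the message text that follows the event ID.
        message = line[event.end():]
        for device in DEVICE_RE.findall(message):
            summary[event.group("event")].add(device)
    return {name: sorted(devices) for name, devices in summary.items()}
```

A device that appears under both a `devicerepair` event and a `permanentloss` event is a strong candidate for the failed disk, but the result should still be confirmed against Skyline Health.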

Resolution

Step 1 – Place host into Maintenance Mode

  • Place the affected ESXi host in Maintenance Mode using Ensure Accessibility to minimize impact while performing remediation.
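
If the vSphere Client is unavailable, maintenance mode can also be entered from the ESXi Shell. The sketch below only builds the command line and assumes the `esxcli system maintenanceMode set` options `--enable` and `--vsanmode` (with the `ensureObjectAccessibility` value) behave as on the ESXi builds the editor is familiar with. Check `esxcli system maintenanceMode set --help` on the host before relying on it. The function name is hypothetical.

```python
# Data-evacuation modes assumed to be accepted by --vsanmode.
VSAN_MODES = {"ensureObjectAccessibility", "evacuateAllData", "noAction"}


def maintenance_mode_cmd(enable=True, vsan_mode="ensureObjectAccessibility"):
    """Build the esxcli argument list to enter or exit maintenance mode."""
    if vsan_mode not in VSAN_MODES:
        raise ValueError("unknown vSAN mode: %s" % vsan_mode)
    cmd = ["esxcli", "system", "maintenanceMode", "set",
           "--enable", "true" if enable else "false"]
    if enable:
        # The evacuation mode only matters when entering maintenance mode.
        cmd += ["--vsanmode", vsan_mode]
    return cmd


if __name__ == "__main__":
    # Print the command for review instead of executing it.
    print(" ".join(maintenance_mode_cmd()))
```

Returning a list rather than a string keeps the command safe to pass to `subprocess` without shell quoting, and printing it first gives a chance to review the action before it runs.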

Step 2 – Remove the failed disk from vSAN

Remove the failed disk from its vSAN disk group in vCenter. If removal through vCenter fails, use the CLI on the affected host instead.
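
As an illustration of the CLI route, the sketch below builds an `esxcli vsan storage remove` command. It assumes the disk-group (OSA) architecture described in this article and the `--disk`, `--ssd`, and `--uuid` options as the editor knows them. Confirm them with `esxcli vsan storage remove --help`, and identify the device first with `esxcli vsan storage list`. Requiring exactly one selector is a safety choice of this sketch, not a documented rule, and the function name is hypothetical.

```python
def storage_remove_cmd(disk=None, ssd=None, uuid=None):
    """Build the esxcli argument list to remove a disk or disk group.

    disk: device name of a single vSAN disk (for example an naa.* name)
    ssd:  device name of the cache device; removes the whole disk group
    uuid: vSAN UUID of the disk
    """
    candidates = (("--disk", disk), ("--ssd", ssd), ("--uuid", uuid))
    chosen = [(flag, value) for flag, value in candidates if value]
    if len(chosen) != 1:
        # Sketch-level guard: refuse ambiguous or empty requests.
        raise ValueError("specify exactly one of disk, ssd, or uuid")
    flag, value = chosen[0]
    return ["esxcli", "vsan", "storage", "remove", flag, value]


if __name__ == "__main__":
    # Placeholder device name taken from the masked logs above.
    print(" ".join(storage_remove_cmd(disk="naa.5002XXXXXXX")))
```

Because a deduplication-enabled disk group fails as a unit, the `ssd` form that targets the whole group may be the relevant one in that case. Treat that as something to verify for the specific environment.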

Step 3 – Replace the failed physical disk

  • Work with the hardware vendor to replace the failed disk.
  • Run diagnostics on the remaining drives in the same disk group to rule out additional hardware issues.

Step 4 – Re-add the replacement disk

Before adding the replacement disk to the disk group, ensure that it meets the supported specifications for vSAN.

Step 5 – Validate and Exit Maintenance Mode

  • Run Skyline Health / vSAN Health Checks
  • Verify disk group is healthy
  • Confirm no resync or object health issues remain
  • Exit Maintenance Mode and return the host to production
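
The resync check in the list above can be scripted against the output of `esxcli vsan debug resync summary get`. The sketch below is a generic "Key: Value" parser. The exact key name `Total Number Of Resyncing Objects` is an assumption about that command's output and should be confirmed on the host. Both function names are hypothetical.

```python
# Assumed key in the resync summary output; verify on your build.
RESYNC_KEY = "Total Number Of Resyncing Objects"


def parse_key_values(output):
    """Parse 'Key: Value' lines into a dictionary, ignoring other lines."""
    values = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            values[key.strip()] = value.strip()
    return values


def resync_finished(output, key=RESYNC_KEY):
    """Return True when the summary reports zero resyncing objects."""
    values = parse_key_values(output)
    if key not in values:
        raise KeyError("expected key not found in output: %s" % key)
    return int(values[key]) == 0
```

Raising an error when the key is missing avoids treating an unexpected output format as a clean result. Exit maintenance mode only after this check and the Skyline Health checks both pass.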