This is a general troubleshooting procedure to help identify whether there is a problem with a physical disk in a vSAN cluster.
Follow the steps below to troubleshoot a disk failure in a vSAN environment:
From the Web UI:
1. Check the vSAN physical disk status:
Inventory > Hosts and Clusters > vSAN Cluster > Configure > vSAN > Disk Management
2. Select the affected host and expand the View Disks section. Verify the disk status and check whether it is reported as one of the following:
Unhealthy
Unmounted
Permanent Disk Failure
Disk Down
Disk Absent
3. Check for any disk-related alarms triggered in the vSAN Skyline Health section:
Inventory > Hosts and Clusters > vSAN Cluster > Monitor > vSAN > Skyline Health > Physical disk
4. Check the disk status in the affected host's Storage Devices list:
Inventory > Hosts and Clusters > vSAN Cluster > Affected vSAN ESXi Host > Configure > Storage > Storage Devices
5. Verify whether a resync is in progress:
Inventory > Hosts and Clusters > vSAN Cluster > Monitor > vSAN > Resyncing Objects
NOTE: A resync may indicate that data is being evacuated from an affected disk or disk group. Further investigation is needed to determine whether the affected disk is ready to be removed or replaced.
6. Verify the status of vSAN Objects:
Inventory > Hosts and Clusters > vSAN Cluster > Monitor > vSAN > Skyline Health > Data > vSAN object health
From the CLI:
1. Connect over SSH to the affected host and run the following commands:
# vdq -qH
2. Check the "IsPDL" (permanent device loss) parameter. If it equals 1, the disk is lost.
DiskResults:
   DiskResult[0]:
      Name: naa.600508b1001c4b820b4d80f9f8acfa95
      VSANUUID: 5294bbd8-67c4-c545-3952-7711e365f7fa
      State: In-use for VSAN
      ChecksumSupport: 0
      Reason: Non-local disk
      IsSSD?: 0
      IsCapacityFlash?: 0
      IsPDL?: 0
<<truncated>>
   DiskResult[18]:
      Name:
      VSANUUID: 5227c17e-ec64-de76-c10e-c272102beba7
      State: In-use for VSAN
      ChecksumSupport: 0
      Reason: None
      IsSSD?: 0
      IsCapacityFlash?: 0
      IsPDL?: 1
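On hosts with many disks, the IsPDL check in step 2 can be filtered rather than read by eye. The following is a minimal sketch, assuming the output keeps the "Name:", "VSANUUID:" and "IsPDL?:" field labels shown in the sample above. The function name list_pdl_disks is hypothetical and not part of any VMware tooling.

```shell
#!/bin/sh
# list_pdl_disks: hypothetical helper, not a VMware tool.
# Reads `vdq -qH` output on stdin and prints one line for every disk
# whose "IsPDL?" field is 1. Assumes the field labels from the sample output.
list_pdl_disks() {
    awk '
        /DiskResult\[[0-9]+\]:/ { name = ""; uuid = "" }   # start of a new disk record
        $1 == "Name:"           { name = $2 }              # may be empty for a lost disk
        $1 == "VSANUUID:"       { uuid = $2 }
        $1 == "IsPDL?:" && $2 == 1 {
            printf "PDL: name=%s uuid=%s\n", (name == "" ? "(none)" : name), uuid
        }
    '
}
```

On the affected host this could be run as: vdq -qH | list_pdl_disks. A disk in PDL may report an empty Name, as in DiskResult[18] above, so the sketch prints the VSANUUID as well.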
3. Check whether a disk is missing from the disk group. Each mapping lists the cache device (SSD) followed by its capacity devices (MD):
# vdq -iH
Mappings:
   DiskMapping[0]:
      SSD: eui.6bfe4897c023247c000c2963f82a877c
      MD: mpx.vmhba2:C0:T1:L0
      MD: mpx.vmhba2:C0:T2:L0
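To spot a missing disk more quickly, the mapping can be reduced to one line per disk group with a count of its capacity devices, which can then be compared against the expected number for that host. This is a rough sketch, assuming the "SSD:" and "MD:" labels shown in the sample above. The function name summarize_disk_groups is hypothetical.

```shell
#!/bin/sh
# summarize_disk_groups: hypothetical helper, not a VMware tool.
# Reads `vdq -iH` output on stdin and prints each cache device (SSD)
# with the number of capacity devices (MD) mapped to it.
summarize_disk_groups() {
    awk '
        $1 == "SSD:" {
            if (ssd != "") printf "%s capacity_disks=%d\n", ssd, count
            ssd = $2; count = 0                 # begin counting for the next group
        }
        $1 == "MD:"  { count++ }
        END          { if (ssd != "") printf "%s capacity_disks=%d\n", ssd, count }
    '
}
```

On the affected host this could be run as: vdq -iH | summarize_disk_groups. The script does not know how many disks a group should contain, so the comparison against the expected layout remains a manual step.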
4. Check the "In CMMDS" parameter. If it is false, communication with the disk has been lost.
# esxcli vsan storage list | grep -i cmmds
In CMMDS: true
In CMMDS: true
In CMMDS: false
# esxcli vsan storage list | less
Device: Unknown
Display Name: Unknown
Is SSD: false
VSAN UUID: 52bf19bd-1f9d-771b-ff4d-515281fee853
VSAN Disk Group UUID:
VSAN Disk Group Name:
Used by this host: false
In CMMDS: false
On-disk format version: 20
Deduplication: false
Compression: false
Checksum:
Checksum OK: true
Is Capacity Tier: false
Encryption Metadata Checksum OK: true
Encryption: false
DiskKeyLoaded: false
Is Mounted: true
Creation Time: Wed Feb 12 22:53:23 2025
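The grep in step 4 shows that a disk is out of CMMDS but not which one, and paging through the full listing is slow on large hosts. The sketch below combines the two, assuming the "Device:", "VSAN UUID:" and "In CMMDS:" labels appear as in the sample above. The function name list_not_in_cmmds is hypothetical.

```shell
#!/bin/sh
# list_not_in_cmmds: hypothetical helper, not a VMware tool.
# Reads `esxcli vsan storage list` output on stdin and prints the device
# name and vSAN UUID of every disk reporting "In CMMDS: false".
list_not_in_cmmds() {
    awk '
        $1 == "Device:"               { dev = $2 }
        $1 == "VSAN" && $2 == "UUID:" { uuid = $3 }   # skips "VSAN Disk Group UUID:"
        $1 == "In" && $2 == "CMMDS:" && $3 == "false" {
            printf "not in CMMDS: device=%s uuid=%s\n", dev, uuid
        }
    '
}
```

On the affected host this could be run as: esxcli vsan storage list | list_not_in_cmmds. A device reported as Unknown, as in the sample, can then be tracked by its vSAN UUID.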
5. Check the physical location of the drive using the command below:
# esxcli storage core device physical get -d <disk name>
# esxcli storage core device physical get -d naa.xxxx
Physical Location: enclosure 25564 slot 0
# esxcli storage core device physical get -d naa.xxxx
Physical Location: enclosure 25565 slot 1
6. Check the following logs for storage-related issues:
/var/log/vmkernel.log
/var/log/vobd.log
/var/log/vsandevicemonitord.log
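Once a suspect device has been identified in the earlier steps, the three logs can be searched for its identifier in one pass. The following is a minimal sketch. The function name grep_disk_logs is hypothetical, and the device identifier passed to it (for example a naa.* name or a vSAN UUID) is whatever the previous steps returned.

```shell
#!/bin/sh
# grep_disk_logs: hypothetical helper, not a VMware tool.
# Usage: grep_disk_logs <device-id> [logfile ...]
# Prints every line mentioning the device, prefixed with the log file name.
grep_disk_logs() {
    dev=$1
    shift
    # Default to the three logs listed above when no files are given.
    [ $# -gt 0 ] || set -- /var/log/vmkernel.log /var/log/vobd.log /var/log/vsandevicemonitord.log
    for f in "$@"; do
        [ -r "$f" ] || continue            # skip logs that are absent or unreadable
        grep -F -H -- "$dev" "$f"          # -F: treat the identifier as a literal string
    done
    return 0
}
```

On the affected host this could be run as: grep_disk_logs naa.xxxx. Only the current log files are searched. Rotated or compressed logs would need to be passed explicitly after decompression.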