vSAN Skyline Health reports an "Operation health" alarm. Clicking "Troubleshoot" for this alert shows one or more vSAN disks reporting an error.
The vSAN disk appears with the Operational State "Unmounted" on a vSAN node. (Navigate to vSphere Client > vSAN Cluster > Configure > vSAN > Disk Management, select the affected host, click View Disks, and expand "Ineligible and unclaimed".)
In some instances the device shows the Operational State "Detached" and the Claimable State "Ineligible".
The physical disk used by vSAN was detected to be faulty.
The faulty state of the disk can be confirmed in several host logs:
In "/var/run/log/vobd.log", I/O errors detected on the disk are logged as follows:
2025-03-03T00:30:07.975Z: [scsiCorrelator] 17381645978626us: [vob.scsi.scsipath.por] Power-on Reset occurred on naa.################
2025-03-03T00:31:22.527Z: [scsiCorrelator] 17381720530080us: [vob.scsi.device.too.many.io.error] Too many errors observed for device naa.################ errPercentage 74
For NVMe devices, the following entry is logged instead:
2026-03-25T04:13:29.638Z In(14) vobd[2097955]: [psastorCorrelator] 2477826311967us: [vob.psastor.device.too.many.io.error] Too many errors observed for device t10.NVMe____Dell_Ent_NVMe_P5500_RI_U.2_7.68TB_______000############# errPercentage 100
In "/var/run/log/vmkwarning.log", log entries similar to:
2025-03-12T05:08:24.209Z cpu67:2101847 opID=3dbce423)WARNING: ScsiDeviceIO: 12155: READ CAPACITY on device "naa.################" from Plugin "HPP" failed. I/O error
2025-03-12T05:08:33.531Z cpu9:2104069)WARNING: ScsiDeviceIO: 12155: READ CAPACITY on device "naa.################" from Plugin "HPP" failed. I/O error
In "/var/run/log/vmkernel.log", log entries similar to:
2025-03-12T06:19:59.910Z cpu5:2101934 opID=de30a3c2)WARNING: ScsiDeviceIO: 12155: READ CAPACITY on device "naa.################" from Plugin "HPP" failed. I/O error
2025-03-12T06:19:59.910Z cpu2:34236323)ScsiDevice: 612: Could not flush cache of local device naa.################. Failure
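When a log is long, searching it by eye for the "too many errors" events is tedious. The following is a minimal sketch, not a VMware-provided tool, that scans lines from a copy of vobd.log and reports the highest errPercentage seen per device. The function name worst_error_percentage is hypothetical, and the pattern assumes the entries follow the format of the samples above, with one entry per line.

```python
import re
from collections import defaultdict

# Matches the vobd "too many I/O errors" event for SCSI (vob.scsi) and
# NVMe (vob.psastor) devices, as seen in the sample entries of this article.
TOO_MANY_ERRORS = re.compile(
    r"\[vob\.(?:scsi|psastor)\.device\.too\.many\.io\.error\]"
    r".*?for device (?P<device>\S+) errPercentage (?P<pct>\d+)"
)


def worst_error_percentage(lines):
    """Return {device: highest errPercentage seen} for the given log lines.

    Hypothetical helper: assumes one log entry per line.
    """
    worst = defaultdict(int)
    for line in lines:
        match = TOO_MANY_ERRORS.search(line)
        if match:
            device = match.group("device")
            worst[device] = max(worst[device], int(match.group("pct")))
    return dict(worst)
```

A possible use is to pass an open file handle, for example `worst_error_percentage(open("vobd.log"))` on a copy of the log, and then compare the reported device identifiers with the disk flagged in Disk Management. Lines that do not match the pattern, such as the Power-on Reset event, are ignored.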
To replace the faulty disk:
Place the host with the absent disk in maintenance mode with "Ensure accessibility".
Engage the hardware vendor and get the failed disk replaced physically in the server.
Then, depending on the type of the failed disk (cache or capacity) and whether deduplication is enabled, follow the steps below to bring the new drive into vSAN:
If deduplication is enabled on the cluster or if the absent disk was a cache device:
Delete the disk group containing the absent vSAN disk.
Re-create the disk group with the existing disks and the new disk.
If deduplication is not enabled and the absent disk was a capacity device:
Remove the absent vSAN disk from the disk group.
Add the new disk to the disk group.
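The choice between the two procedures above can be expressed as a small decision function. This is an illustrative sketch only: replacement_steps and its parameters are hypothetical names, and the function returns the step text from this article without performing any action on a host.

```python
from typing import List


def replacement_steps(dedup_enabled: bool, failed_disk_role: str) -> List[str]:
    """Return the disk group steps that apply to the failed disk.

    Hypothetical helper; failed_disk_role must be "cache" or "capacity".
    """
    role = failed_disk_role.lower()
    if role not in ("cache", "capacity"):
        raise ValueError("failed_disk_role must be 'cache' or 'capacity'")

    # With deduplication enabled, or when the cache device failed,
    # the whole disk group has to be rebuilt.
    if dedup_enabled or role == "cache":
        return [
            "Delete the disk group containing the absent vSAN disk.",
            "Re-create the disk group with the existing disks and the new disk.",
        ]

    # Otherwise only the failed capacity disk is swapped.
    return [
        "Remove the absent vSAN disk from the disk group.",
        "Add the new disk to the disk group.",
    ]
```

For example, `replacement_steps(False, "capacity")` returns the two-step capacity disk swap, while any call with `dedup_enabled=True` returns the disk group rebuild.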
For more details, refer to Dying Disk Handling (DDH) in vSAN.