vSAN Skyline Health Reports "Operational Health" alarm

Products

VMware vSAN

Issue/Introduction

Symptoms:

vSAN Skyline Health reports an "Operational Health" alarm under "vCenter UI > vSAN cluster > Monitor > Skyline Health".

Environment

VMware vSAN (All Versions)

Cause

The problematic drive is reported as unhealthy with unrecoverable read errors.

Because of a hardware fault, the storage device was unable to read data from the specified Logical Block Address (LBA). This indicates that the block is unreadable and that the device could not correct the error.

There is no way to prevent the logical failure of a disk's blocks, because SSDs degrade over time. When a read fails in the metadata or deduplication metadata region, vSAN fails the disk, or the entire disk group if deduplication is enabled.

The vSAN disk group may show as failed or report errors. Medium errors with sense data "0x3 0x11 0x0" (sense key 0x3 MEDIUM ERROR, ASC/ASCQ 0x11/0x00 "Unrecovered read error") are logged:

'/var/run/log/vmkernel.log'

2026-03-06T20:35:45.462Z In(182) vmkernel: cpu10:2097724)ScsiDeviceIO: 4686: Cmd(0x45d900b7c440) 0x28, CmdSN 0xbae from world 0 to dev "naa.###################" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x3 0x11 0x0 Medium Error, LBA: 32252224

2026-03-06T20:35:45.813Z In(182) vmkernel: cpu10:2097724)ScsiDeviceIO: 4686: Cmd(0x45dbb2f24ec0) 0x28, CmdSN 0xcb2 from world 0 to dev "naa.###################" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x3 0x11 0x0 Medium Error, LBA: 32252224
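When a vmkernel.log is large, it can help to pull out only the failed SCSI commands and decode them. The following is a minimal sketch, assuming the log lines follow the ScsiDeviceIO format shown above; the regular expression, the function names and the small lookup tables are illustrative, not a VMware tool. The decoding itself follows the SCSI standards: opcode 0x28 is READ(10), device status 0x2 is CHECK CONDITION, and sense key 0x3 with ASC/ASCQ 0x11/0x00 is an unrecovered read error.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Small lookup tables from the SCSI standards (SBC/SPC); only codes
# relevant to this issue are listed, anything else is shown as hex.
OPCODES = {0x28: "READ(10)", 0x2A: "WRITE(10)", 0x88: "READ(16)", 0x8A: "WRITE(16)"}
SENSE_KEYS = {0x2: "NOT READY", 0x3: "MEDIUM ERROR", 0x4: "HARDWARE ERROR"}
ASC_ASCQ = {(0x11, 0x00): "Unrecovered read error"}

HEX = r"0x[0-9a-fA-F]+"
# Assumed format: matches ScsiDeviceIO "failed ... Valid sense data" lines.
SCSI_FAIL_RE = re.compile(
    rf'^(?P<ts>\S+) .*ScsiDeviceIO: .*?Cmd\({HEX}\) (?P<opcode>{HEX}), '
    rf'.*?to dev "(?P<device>[^"]+)" failed '
    rf'H:(?P<host>{HEX}) D:(?P<status>{HEX}) P:(?P<plugin>{HEX}) '
    rf'Valid sense data: (?P<key>{HEX}) (?P<asc>{HEX}) (?P<ascq>{HEX})'
    r'(?:.*?LBA: (?P<lba>\d+))?'
)


@dataclass
class ScsiFailure:
    timestamp: str
    device: str
    opcode: int
    device_status: int  # 0x2 = CHECK CONDITION, so sense data is present
    sense_key: int
    asc: int
    ascq: int
    lba: Optional[int]

    def is_unrecovered_read(self) -> bool:
        return self.sense_key == 0x3 and (self.asc, self.ascq) == (0x11, 0x00)

    def describe(self) -> str:
        op = OPCODES.get(self.opcode, f"opcode {self.opcode:#x}")
        key = SENSE_KEYS.get(self.sense_key, f"sense key {self.sense_key:#x}")
        detail = ASC_ASCQ.get((self.asc, self.ascq), f"ASC/ASCQ {self.asc:#x}/{self.ascq:#x}")
        return f"{self.timestamp} {self.device}: {op} failed, {key} ({detail}), LBA {self.lba}"


def parse_scsi_failures(lines: Iterable[str]) -> List[ScsiFailure]:
    """Extract failed SCSI commands that carry valid sense data."""
    failures = []
    for line in lines:
        m = SCSI_FAIL_RE.match(line)
        if m is None:
            continue
        failures.append(ScsiFailure(
            timestamp=m["ts"],
            device=m["device"],
            opcode=int(m["opcode"], 16),
            device_status=int(m["status"], 16),
            sense_key=int(m["key"], 16),
            asc=int(m["asc"], 16),
            ascq=int(m["ascq"], 16),
            lba=int(m["lba"]) if m["lba"] else None,
        ))
    return failures


def repeated_bad_lbas(failures: Iterable[ScsiFailure]) -> Counter:
    """Count unrecovered-read failures per (device, LBA)."""
    return Counter((f.device, f.lba) for f in failures if f.is_unrecovered_read())
```

Fed the two entries above, parse_scsi_failures() decodes both as READ(10) commands failing with an unrecovered read error, and repeated_bad_lbas() counts two failures on the same device at LBA 32252224, which is consistent with a block the drive cannot read or correct.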

Read and I/O errors are also reported for the drive by the vSAN PLOG and LSOM layers:

'/var/run/log/vmkernel.log'

2026-03-06T20:35:32.586Z In(182) vmkernel: cpu28:2097556)PLOG: PLOGMapMetadataPartition:3224: MD naa.###################:2 is mapped with it's UUID ( ########-####-####-####-#############)

2026-03-06T20:35:46.164Z Wa(180) vmkwarning: cpu5:2098522)WARNING: PLOG: PLOGRead:4189: Throttled: xmap lookup failed #########-####-####-####-###########: Read error

2026-03-06T20:35:46.174Z Wa(180) vmkwarning: cpu8:2098585)WARNING: LSOM: LSOMMountVolumeDispatch:11860: Failed to mount volume on disk ########-####-####-####-#############: I/O error
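To separate these vSAN-layer failures from the many informational PLOG entries (such as the PLOGMapMetadataPartition line above), a small filter can keep only PLOG and LSOM warnings that end in an error status. This is a minimal sketch, assuming the message layout shown above; the regular expression, the ERROR_STATUSES tuple and the function name are hypothetical and may need extending for other messages.

```python
import re
from typing import Iterable, List, NamedTuple

# Assumed format: "<timestamp> ... WARNING: <PLOG|LSOM>: <Function>:<line>: <message>"
VSAN_WARNING_RE = re.compile(
    r"^(?P<ts>\S+) .*WARNING: (?P<module>PLOG|LSOM): "
    r"(?P<func>\w+):\d+: (?P<msg>.*)$"
)

# Trailing status strings treated as I/O failures (hypothetical list; extend as needed).
ERROR_STATUSES = ("Read error", "I/O error")


class VsanWarning(NamedTuple):
    timestamp: str
    module: str
    function: str
    detail: str
    status: str


def find_vsan_io_warnings(lines: Iterable[str]) -> List[VsanWarning]:
    """Return PLOG/LSOM warnings whose message ends in a known error status."""
    found = []
    for line in lines:
        m = VSAN_WARNING_RE.match(line.rstrip("\n"))
        if m is None:
            continue  # informational lines (In(...)) have no "WARNING:" prefix
        # The status is the text after the last ": " in the message.
        detail, _, status = m["msg"].rpartition(": ")
        if status in ERROR_STATUSES:
            found.append(VsanWarning(m["ts"], m["module"], m["func"], detail, status))
    return found
```

Against the three entries above, this keeps the PLOGRead "Read error" and the LSOMMountVolumeDispatch "I/O error" warnings and skips the informational metadata-mapping line.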

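The SMART data for the problematic drive shows non-zero raw Read Error, Reallocated Sector and Uncorrectable Sector counts, while the overall Health Status still reports OK: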

[root@#21:log] esxcli storage core device smart get -d naa.###################
Parameter                   Value  Threshold  Worst  Raw
--------------------------  -----  ---------  -----  ------------
Health Status               OK     N/A        N/A    N/A
Media Wearout Indicator     100    0          100    5755
Read Error Count            89     6          89     1200
Power-on Hours              67     0          67     30080
Power Cycle Count           100    20         100    26
Reallocated Sector Count    98     2          98     31
Drive Temperature           72     0          53     201864839196
Write Sectors TOT Count     100    0          100    7216
Uncorrectable Sector Count  100    0          100    37152
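Why Health Status can read OK despite read errors becomes clearer when the two kinds of columns are checked separately: the normalized Value against its Threshold, and the Raw counters. The sketch below parses the esxcli table and does both. It assumes the column layout shown above, and the READ_ERROR_ATTRS list and function names are hypothetical; raw values are vendor-specific (the Drive Temperature raw value, for example, is clearly not a plain temperature), so any interpretation should be confirmed with the drive vendor.

```python
from typing import Dict, List, NamedTuple, Optional


class SmartRow(NamedTuple):
    parameter: str
    value: str
    threshold: str
    worst: str
    raw: str

    def number(self, field: str) -> Optional[int]:
        """Return a column as an int, or None for text such as "N/A" or "OK"."""
        text = getattr(self, field)
        return int(text) if text.isdigit() else None


def parse_smart_table(text: str) -> Dict[str, SmartRow]:
    """Parse the table printed by `esxcli storage core device smart get`."""
    rows: Dict[str, SmartRow] = {}
    for line in text.splitlines():
        tokens = line.split()
        # Data rows end in four columns and the Threshold column is a number
        # or N/A; this skips the header, the dashed separator and the prompt.
        if len(tokens) < 5 or not (tokens[-3].isdigit() or tokens[-3] == "N/A"):
            continue
        name = " ".join(tokens[:-4])
        rows[name] = SmartRow(name, *tokens[-4:])
    return rows


# Hypothetical list of attributes whose raw counts point at media read problems.
READ_ERROR_ATTRS = ("Read Error Count", "Reallocated Sector Count", "Uncorrectable Sector Count")


def assess(rows: Dict[str, SmartRow]) -> List[str]:
    findings = []
    for row in rows.values():
        value, threshold = row.number("value"), row.number("threshold")
        # A threshold of 0 conventionally means the attribute never trips a failure.
        if value is not None and threshold and value <= threshold:
            findings.append(f"{row.parameter}: value {value} <= threshold {threshold}")
    for name in READ_ERROR_ATTRS:
        row = rows.get(name)
        raw = row.number("raw") if row else None
        if raw:
            findings.append(f"{name}: raw count {raw}")
    return findings
```

Run against the output above, assess() finds no normalized value at or below its threshold, which matches the OK Health Status, but it flags raw counts of 1200, 31 and 37152 for the three read-related attributes.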

Resolution

Engage the hardware vendor to further investigate the drive issues.

Follow the steps below to replace the drive:

1. Place the affected ESXi host in vSAN maintenance mode with the "Ensure accessibility" option.
2. Delete the disk group(s) on the affected ESXi host.
3. Replace the drive.
4. Re-create the disk group.
5. Add the drives to the new disk group.
6. Take the host out of maintenance mode.
7. Monitor the resync.
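The steps above are normally performed in vCenter. For hosts managed from the shell, a rough vSAN OSA (disk group) equivalent is sketched below as a Python function that only builds the command list and does not run anything. This assumes the esxcli `system maintenanceMode set` and `vsan storage remove/add` namespaces available on current ESXi builds; the device IDs are hypothetical placeholders, and the options should be confirmed with `--help` on the host before use. Physical replacement and resync monitoring are left as manual steps.

```python
from typing import List, Sequence


def replacement_plan(old_cache: str, new_cache: str, capacity: Sequence[str]) -> List[str]:
    """Build (but do not run) an esxcli sequence for rebuilding one vSAN OSA disk group.

    old_cache: cache-tier device ID of the disk group being deleted.
    new_cache, capacity: device IDs for the re-created disk group; use the
    replacement drive's new naa ID where it applies (hypothetical values).
    """
    if not capacity:
        raise ValueError("a vSAN OSA disk group needs at least one capacity device")
    add_cmd = " ".join(
        ["esxcli vsan storage add", f"--ssd {new_cache}"]
        + [f"--disks {dev}" for dev in capacity]
    )
    return [
        # Step 1: maintenance mode with the "Ensure accessibility" data migration option.
        "esxcli system maintenanceMode set --enable true --vsanmode ensureObjectAccessibility",
        # Step 2: removing the cache device deletes the whole disk group.
        f"esxcli vsan storage remove --ssd {old_cache}",
        # Step 3: physical replacement is done with the hardware vendor.
        "# MANUAL: replace the failed drive",
        # Steps 4-5: re-create the disk group with the cache and capacity drives.
        add_cmd,
        # Step 6: exit maintenance mode.
        "esxcli system maintenanceMode set --enable false",
        # Step 7: resync monitoring is left to the vSAN UI in this sketch.
        "# MANUAL: monitor the resync until it completes",
    ]
```

For example, replacement_plan("naa.OLDCACHE", "naa.NEWCACHE", ["naa.CAP1", "naa.CAP2"]) yields an add command of `esxcli vsan storage add --ssd naa.NEWCACHE --disks naa.CAP1 --disks naa.CAP2`. Keeping the plan as data rather than executing it leaves room to review each command before running it on the host.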

Additional Information

Troubleshooting vSAN OSA disk issues

vSAN Disk Or Diskgroup Fails With Medium Errors