vSAN Skyline Health Reports "Operational Health" alarm


Article ID: 433932



Products

VMware vSAN

Issue/Introduction

Symptoms:

  • vSAN Skyline Health reports an "Operational Health" alarm under
    vCenter UI > vSAN cluster > Monitor > Skyline Health.

Environment

VMware vSAN (All Versions)

Cause

The problematic drive is reported as unhealthy with unrecoverable read errors.
 
Due to a hardware issue, the storage device was unable to read data from the specified Logical Block Address (LBA). This indicates that the block is unreadable and that the device could not correct the error.
There is no way to prevent the logical failure of a disk's blocks, because SSDs degrade over time. When a read failure occurs in the metadata or dedupe metadata region, vSAN therefore fails out the disk, or the whole disk group if dedupe is enabled.
 
 
 
The vSAN disk group may show as failed or report errors. Medium errors with sense code "0x3 0x11 0x0" are observed in /var/run/log/vmkernel.log:
 
2026-03-06T20:35:45.462Z In(182) vmkernel: cpu10:2097724)ScsiDeviceIO: 4686: Cmd(0x45d900b7c440) 0x28, CmdSN 0xbae from world 0 to dev "naa.###################" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x3 0x11 0x0 Medium Error, LBA: 32252224
2026-03-06T20:35:45.813Z In(182) vmkernel: cpu10:2097724)ScsiDeviceIO: 4686: Cmd(0x45dbb2f24ec0) 0x28, CmdSN 0xcb2 from world 0 to dev "naa.###################" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x3 0x11 0x0 Medium Error, LBA: 32252224
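
For triage across a large vmkernel.log, entries like the ones above can be filtered programmatically. The following is a minimal Python sketch, not an official tool. It relies on the standard SCSI meanings of the fields: opcode 0x28 is READ(10), D:0x2 is CHECK CONDITION, sense key 0x3 is Medium Error, and ASC/ASCQ 0x11/0x0 is an unrecovered read error. The regular expression is an assumption derived only from the two sample lines, and the function names are hypothetical.

```python
import re
from collections import Counter

# Pattern for ScsiDeviceIO failure lines; the layout is assumed from the samples above.
SCSI_FAIL = re.compile(
    r'Cmd\(0x[0-9a-f]+\) (?P<opcode>0x[0-9a-f]+), .*?'
    r'to dev "(?P<dev>[^"]+)" failed '
    r'H:0x[0-9a-f]+ D:0x[0-9a-f]+ P:0x[0-9a-f]+ '
    r'Valid sense data: (?P<key>0x[0-9a-f]+) (?P<asc>0x[0-9a-f]+) (?P<ascq>0x[0-9a-f]+)'
    r'(?:.*?LBA: (?P<lba>\d+))?',
    re.IGNORECASE,
)

# Sense key 0x3 (Medium Error) with ASC/ASCQ 0x11/0x0 (unrecovered read error).
UNRECOVERED_READ = (0x3, 0x11, 0x0)


def parse_scsi_failure(line):
    """Return a dict describing a failed SCSI command, or None if the line does not match."""
    m = SCSI_FAIL.search(line)
    if m is None:
        return None
    sense = tuple(int(m[name], 16) for name in ("key", "asc", "ascq"))
    return {
        "device": m["dev"],
        "opcode": int(m["opcode"], 16),
        "sense": sense,
        "lba": int(m["lba"]) if m["lba"] else None,
        "unrecovered_read": sense == UNRECOVERED_READ,
    }


def count_unrecovered_reads(lines):
    """Count unrecovered read errors per (device, LBA) pair."""
    hits = Counter()
    for line in lines:
        rec = parse_scsi_failure(line)
        if rec and rec["unrecovered_read"]:
            hits[(rec["device"], rec["lba"])] += 1
    return hits
```

Passing the lines of a log file copied off the host to count_unrecovered_reads gives one counter entry per device and LBA. Repeated hits on the same LBA, as in the two sample lines, suggest a persistently unreadable block rather than a transient failure.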

Read and I/O errors are seen on the drive in /var/run/log/vmkernel.log:
 
2026-03-06T20:35:32.586Z In(182) vmkernel: cpu28:2097556)PLOG: PLOGMapMetadataPartition:3224: MD naa.###################:2 is mapped with it's UUID ( ########-####-####-####-#############)
2026-03-06T20:35:46.164Z Wa(180) vmkwarning: cpu5:2098522)WARNING: PLOG: PLOGRead:4189: Throttled: xmap lookup failed #########-####-####-####-###########: Read error
2026-03-06T20:35:46.174Z Wa(180) vmkwarning: cpu8:2098585)WARNING: LSOM: LSOMMountVolumeDispatch:11860: Failed to mount volume on disk ########-####-####-####-#############: I/O error
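
The PLOG and LSOM warnings above can be picked out of a log in a similar way. This is a rough Python sketch under the assumption that such warnings follow the "WARNING: <module>: <function>:<line>: <message>" layout seen in the two sample lines. The function name vsan_disk_errors is hypothetical, and only messages ending in "Read error" or "I/O error" are kept.

```python
import re

# Layout assumed from the sample warnings: "WARNING: PLOG: PLOGRead:4189: <message>".
VSAN_WARNING = re.compile(
    r'WARNING: (?P<module>PLOG|LSOM): (?P<func>\w+):(?P<line>\d+): (?P<msg>.*)$'
)

# Message endings that indicate the disk could not be read or mounted.
ERROR_SUFFIXES = ("Read error", "I/O error")


def vsan_disk_errors(lines):
    """Return (module, function, message) tuples for PLOG/LSOM read and I/O errors."""
    found = []
    for line in lines:
        m = VSAN_WARNING.search(line)
        if m is None:
            continue
        msg = m["msg"].strip()
        if msg.endswith(ERROR_SUFFIXES):
            found.append((m["module"], m["func"], msg))
    return found
```

Informational entries such as the PLOGMapMetadataPartition line carry no WARNING prefix, so they are skipped.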
 
SMART data for the problematic drive shows read errors:
 
[root@#21:log] esxcli storage core device smart get -d naa.###################
Parameter                   Value  Threshold  Worst  Raw
--------------------------  -----  ---------  -----  ---
Health Status               OK     N/A        N/A    N/A
Media Wearout Indicator     100    0          100    5755
Read Error Count            89     6          89     1200
Power-on Hours              67     0          67     30080
Power Cycle Count           100    20         100    26
Reallocated Sector Count    98     2          98     31
Drive Temperature           72     0          53     201864839196
Write Sectors TOT Count     100    0          100    7216
Uncorrectable Sector Count  100    0          100    37152
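
In the output above, Health Status still reads OK and no normalized value has dropped to its threshold, yet the raw counters for read errors, reallocated sectors and uncorrectable sectors are non-zero. The Python sketch below parses such a table and lists those rows. It is a minimal illustration only: the function names and the choice of counters in ERROR_COUNTERS are assumptions, the columns are assumed to be separated by two or more spaces as in the sample, and the meaning of raw values is vendor-specific.

```python
import re

# Parameters whose non-zero raw value is treated as noteworthy (an assumption of this sketch).
ERROR_COUNTERS = {
    "Read Error Count",
    "Reallocated Sector Count",
    "Uncorrectable Sector Count",
}


def parse_smart(output):
    """Parse 'esxcli storage core device smart get' output into {parameter: columns}."""
    rows = {}
    for line in output.splitlines():
        # Columns are separated by runs of two or more spaces.
        cols = re.split(r"\s{2,}", line.strip())
        if len(cols) != 5 or cols[0] == "Parameter" or cols[0].startswith("-"):
            continue  # skip the prompt, header, separator and blank lines
        name, value, threshold, worst, raw = cols
        rows[name] = {"value": value, "threshold": threshold, "worst": worst, "raw": raw}
    return rows


def suspicious(rows):
    """Return (parameter, reason) pairs for rows that warrant a closer look."""
    findings = []
    for name, r in rows.items():
        numeric = r["value"].isdigit() and r["threshold"].isdigit()
        if numeric and int(r["value"]) <= int(r["threshold"]):
            # Conventional SMART rule: a normalized value at or below the threshold has failed.
            findings.append((name, "normalized value at or below threshold"))
        elif name in ERROR_COUNTERS and r["raw"].isdigit() and int(r["raw"]) > 0:
            findings.append((name, "raw count " + r["raw"]))
    return findings
```

Calling suspicious(parse_smart(text)) on the captured command output returns the rows to raise with the hardware vendor.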

Resolution

Engage the hardware vendor to further investigate the drive issues.
 
Follow the steps below to replace the drive:
  • Place the affected ESXi host in maintenance mode with the "Ensure accessibility" vSAN data migration option.
  • Delete the disk group(s) on the affected ESXi host.
  • Replace the drive.
  • Re-create the disk group.
  • Add the drives back to the new disk group.
  • Take the host out of maintenance mode.
  • Monitor the resync.

Additional Information