A capacity disk in a vSAN OSA (Hybrid) disk group reports an impending failure. vSAN first marked the disk offline, then escalated it to Permanent Error (PERM).
Disk mapping output (vdq -iH):
DiskMapping[0]:
    SSD: naa.50000######b800
    MD: naa.50000######aba5
    MD: naa.50000######ac15
    MD: naa.50000######abc1
    MD: naa.50000######ab51
    MD: naa.50000######ac35
Log entries show the device transitioned to PERM error:
/var/log/vmkernel.log
2025-10-01T16:46:19.614Z In(182) vmkernel: cpu56:2100228)LSOM: LSOMLogDiskEvent:8418: Disk Event permanent error for MD 52###9fd-####-####-####-d7e####2fcf3 (naa.50000######abc5:2)
2025-10-01T16:46:19.614Z Wa(180) vmkwarning: cpu56:2100228)WARNING: LSOM: LSOMEventNotify:8891: vSAN device 52###9fd-####-####-####-d7e####2fcf3 is under permanent error.
SMART data confirms the disk is in impending failure state:
esxcli storage core device smart get -d naa.50000######abc5
SMART Data for Disk : naa.50000######abc5
Parameter          Value              Threshold  Worst  Raw
-----------------------------------------------------------
Health Status      IMPENDING FAILURE  N/A        N/A    N/A
Write Error Count  0                  N/A        N/A    N/A
Read Error Count   504                N/A        N/A    N/A
Power Cycle Count  50                 N/A        N/A    N/A
Drive Temperature  19                 N/A        N/A    N/A
-----------------------------------------------------------
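When many devices need checking, the SMART table can be screened with a script. The following is a minimal Python sketch, not a supported tool. It assumes the command output has been captured as text and that columns are separated by at least two spaces. The function names `parse_smart` and `needs_attention` are hypothetical, and the check for "FAIL" in the health status is an illustrative rule, not a documented vSAN threshold.

```python
import re

# Sample text modelled on the SMART output in this article.
SAMPLE = """\
Parameter          Value              Threshold  Worst  Raw
-----------------------------------------------------------
Health Status      IMPENDING FAILURE  N/A        N/A    N/A
Write Error Count  0                  N/A        N/A    N/A
Read Error Count   504                N/A        N/A    N/A
Power Cycle Count  50                 N/A        N/A    N/A
Drive Temperature  19                 N/A        N/A    N/A
"""


def parse_smart(text):
    """Return {parameter: value} from the tabular SMART output.

    Assumption: columns are separated by two or more spaces, so names
    such as "Health Status" survive as a single field.
    """
    values = {}
    for line in text.splitlines():
        cols = re.split(r"\s{2,}", line.strip())
        # Data rows have five columns; skip the header and ruler lines.
        if len(cols) == 5 and cols[0] != "Parameter":
            values[cols[0]] = cols[1]
    return values


def needs_attention(values):
    """Flag a device whose health status mentions a failure (illustrative rule)."""
    return "FAIL" in values.get("Health Status", "").upper()


if __name__ == "__main__":
    smart = parse_smart(SAMPLE)
    print(smart["Health Status"], smart["Read Error Count"], needs_attention(smart))
```

The read error count is returned as a string on purpose, because a field can also hold N/A.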
VMware vSAN 8.x
VMware vSAN OSA (Hybrid)
The capacity device naa.50000######abc5 encountered repeated read errors and medium errors (bad sectors).
vSAN retried the failed I/O and attempted repair (unmount and remount) operations, but the repair threshold (3) was reached.
The device was first marked offline, then escalated to Permanent Error (PERM).
SMART monitoring reports the drive in an impending failure state.
vSAN automatically initiated data evacuation to protect against data loss.
Medium Errors (Read Failures)
Command Failures & Timeouts
2025-09-30T16:26:12.316Z Wa(180) vmkwarning: cpu1:2098534)WARNING: HPP: HppScsiThrottleLogForDevice:585: Cmd 0x28 (0x45b######b80, 0) to dev "naa.50000######abc5" on path "vmhba0:C0:T5:L0" Failed:
2025-09-30T16:26:12.316Z Wa(180) vmkwarning: cpu1:2098534)WARNING: HPP: HppScsiThrottleLogForDevice:593: Error status H:0x0 D:0x2 P:0x0 Valid sense data: 0x3 0x11 0x1. hppAction = 1
2025-09-30T16:26:12.316Z In(182) vmkernel: cpu1:2098534)ScsiDeviceIO: 4686: Cmd(0x45b######b80) 0x28, CmdSN 0xb860211 from world 0 to dev "naa.50000######abc5" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x3 0x11 0x1 Medium Error, LBA: 102####576
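The status fields in these entries follow standard SCSI conventions: D:0x2 is CHECK CONDITION, sense key 0x3 is MEDIUM ERROR, and additional sense code 0x11 with qualifier 0x1 means read retries exhausted. Command 0x28 is READ(10). The Python sketch below decodes that triplet from a log line. The lookup tables are deliberately partial and cover only a few common codes, and the function name `decode_scsi_status` is hypothetical.

```python
import re

# Partial lookup tables (standard SCSI codes; not exhaustive).
DEVICE_STATUS = {0x0: "GOOD", 0x2: "CHECK CONDITION", 0x8: "BUSY"}
SENSE_KEY = {
    0x2: "NOT READY",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
    0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION",
}
ASC_ASCQ = {
    (0x11, 0x00): "Unrecovered read error",
    (0x11, 0x01): "Read retries exhausted",
}

STATUS_RE = re.compile(
    r"H:(0x[0-9a-fA-F]+) D:(0x[0-9a-fA-F]+) P:(0x[0-9a-fA-F]+) "
    r"Valid sense data: (0x[0-9a-fA-F]+) (0x[0-9a-fA-F]+) (0x[0-9a-fA-F]+)"
)


def decode_scsi_status(line):
    """Decode the H/D/P status and sense triplet from one log line.

    Returns None when the line carries no sense data.
    """
    match = STATUS_RE.search(line)
    if match is None:
        return None
    host, device, plugin, key, asc, ascq = (int(v, 16) for v in match.groups())
    return {
        "host": host,
        "plugin": plugin,
        "device": DEVICE_STATUS.get(device, hex(device)),
        "sense_key": SENSE_KEY.get(key, hex(key)),
        "detail": ASC_ASCQ.get((asc, ascq), f"ASC {asc:#x} ASCQ {ascq:#x}"),
    }


# Entry copied from the excerpt in this article.
SAMPLE_LINE = (
    '2025-09-30T16:26:12.316Z In(182) vmkernel: cpu1:2098534)ScsiDeviceIO: 4686: '
    'Cmd(0x45b######b80) 0x28, CmdSN 0xb860211 from world 0 to dev '
    '"naa.50000######abc5" failed H:0x0 D:0x2 P:0x0 Valid sense data: '
    '0x3 0x11 0x1 Medium Error, LBA: 102####576'
)

if __name__ == "__main__":
    print(decode_scsi_status(SAMPLE_LINE))
```

A host status of 0x0 together with a MEDIUM ERROR sense key points at the drive media rather than the path or the controller, which is consistent with the SMART result.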
I/O Errors on Partition Reads
2025-09-30T16:26:49.147Z In(182) vmkernel: cpu2:2116277 opID=da3a69f2)Partition: 477: Failed read for "naa.50000######abc5": I/O error
2025-09-30T16:26:49.147Z In(182) vmkernel: cpu2:2116277 opID=da3a69f2)Partition: 1205: Failed to read protective mbr on "naa.50000######abc5" : I/O error
2025-09-30T16:26:49.147Z Wa(180) vmkwarning: cpu2:2116277 opID=da3a69f2)WARNING: Partition: 1387: Partition table read from device naa.50000######abc5 failed: I/O error
2025-09-30T16:26:49.147Z In(182) vmkernel: cpu2:2116277 opID=da3a69f2)ScsiDeviceIO: 6478: Command 0x1a (CmdSN 0x36###49, World 0) to device naa.50000######abc5 timed out: expiry time occurs 3ms in the past
2025-09-30T16:26:49.147Z Wa(180) vmkwarning: cpu2:2116277 opID=da3a69f2)WARNING: ScsiDeviceIO: 6723: Failed to issue command (0x1a) on device naa.50000######abc5: Timeout
vSAN Disk Repair Process
2025-09-30T16:27:10.948Z In(182) vmkernel: cpu43:16728488)PLOG: PLOGHandleTransientErrorInt:5530: Throttled: Device: 52###9fd-####-####-####-d7e######cf3 will be out of service until unmount-mount operation is complete.
2025-09-30T16:27:10.948Z Wa(180) vmkwarning: cpu43:16728488)WARNING: PLOG: PLOGHandleTransientErrorInt:5612: vSAN device 52###9fd-####-####-####-d7e######cf3 is being repaired due to I/O failures, and will be out of service until the repair is complete. If the devi$
2025-09-30T16:27:10.948Z In(182) vmkernel: cpu43:16728488)LSOMCommon: IORETRYCompleteIO:469: Throttled: 0x45e#####7900 IO type 16648 (READ) isOrdered:NO isSplit:YES isEncr:YES since 60001 msec status Maximum kernel-level retries exceeded
Latency Spikes
2025-09-30T16:41:43.666Z In(182) vmkernel: cpu55:2100228)LSOM: LSOMNamespaceCheckLatency:398: Throttled: Latency 523bd9fd-####-####-####-d7e######cf3 1 18:26:##:##:#:#:0:0:1
2025-09-30T16:41:43.666Z In(182) vmkernel: cpu55:2100228)LSOM: LSOMNamespaceCheckLatency:428: Throttled: LatencyCum 523bd9fd-####-####-####-d7e######fcf3 1 31###81:242##93:22##989:491##96:32##23:14##6:28:1:3
2025-09-30T16:41:53.265Z Wa(180) vmkwarning: cpu35:2098536)WARNING: ScsiDeviceIO: 1780: Device naa.50000######abc5 performance has deteriorated. I/O latency increased from average value of 4994 microseconds to 956638 microseconds.
2025-09-30T16:41:53.271Z In(182) vmkernel: cpu5:2098530)ScsiDeviceIO: 1780: Device naa.50000######abc5 performance has improved. I/O latency reduced from 956638 microseconds to 13441 microseconds.
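To judge how severe a latency spike is, the "performance has deteriorated" warning can be reduced to a ratio between the new latency and the previous average. The following Python sketch illustrates this. The function name `latency_spike` is hypothetical, and the regular expression assumes the message wording shown in this article.

```python
import re

DETERIORATED_RE = re.compile(
    r"Device (\S+) performance has deteriorated\. "
    r"I/O latency increased from average value of (\d+) microseconds "
    r"to (\d+) microseconds"
)


def latency_spike(line):
    """Return (device, average_us, current_us, ratio) or None if no match."""
    match = DETERIORATED_RE.search(line)
    if match is None:
        return None
    device = match.group(1)
    average_us = int(match.group(2))
    current_us = int(match.group(3))
    return device, average_us, current_us, current_us / average_us


# Entry copied from the excerpt in this article.
SAMPLE_LINE = (
    "2025-09-30T16:41:53.265Z Wa(180) vmkwarning: cpu35:2098536)WARNING: "
    "ScsiDeviceIO: 1780: Device naa.50000######abc5 performance has "
    "deteriorated. I/O latency increased from average value of 4994 "
    "microseconds to 956638 microseconds."
)

if __name__ == "__main__":
    device, average_us, current_us, ratio = latency_spike(SAMPLE_LINE)
    print(f"{device}: {average_us} us -> {current_us} us ({ratio:.0f}x)")
```

For the entry above, latency rose from about 5 ms to nearly 1 second, an increase of roughly 190 times the average.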
Permanent Device Error
2025-10-01T16:46:19.614Z In(182) vmkernel: cpu29:17011586)PLOG: PLOGHandleTransientErrorInt:5549: Repair threshold (3) for device: 52####fd-####-####-####-d7e######cf3 has been reached and will be marked as PERM error
2025-10-01T16:46:19.614Z Wa(180) vmkwarning: cpu1:2099350)WARNING: PLOG: PLOGPropagateErrorInt:4915: vSAN device 523bd9fd-####-####-####-d7e######cf3 is under permanent error.
2025-10-01T16:46:19.614Z In(182) vmkernel: cpu56:2100228)LSOM: LSOMLogDiskEvent:8418: Disk Event permanent error for MD 52####fd-####-####-####-d7e######cf3 (naa.50000######abc5:2)
2025-10-01T16:46:19.614Z Wa(180) vmkwarning: cpu56:2100228)WARNING: LSOM: LSOMEventNotify:8891: vSAN device 523bd9fd-####-####-####-d7e######cf3 is under permanent error.
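The escalation from medium errors through repair attempts to PERM is easier to follow as a timeline. The Python sketch below tags log entries by the message fragments that appear in this article. It assumes one entry per line and an ISO timestamp at the start of each entry. The marker list and the function name `timeline` are illustrative, not an official list of vSAN events.

```python
import re

# (substring to look for, label) pairs; the first match wins.
MARKERS = [
    ("Medium Error", "medium error"),
    ("is being repaired due to I/O failures", "repair started"),
    ("Repair threshold", "repair threshold reached"),
    ("is under permanent error", "permanent error"),
    ("smart health status is impending failure", "SMART impending failure"),
]
TIMESTAMP_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}T[\d:.]+Z)")


def timeline(log_text):
    """Return [(timestamp, label)] for entries that match a known marker."""
    events = []
    for line in log_text.splitlines():
        stamp = TIMESTAMP_RE.match(line)
        if stamp is None:
            continue  # Not the start of a log entry.
        for needle, label in MARKERS:
            if needle in line:
                events.append((stamp.group(1), label))
                break
    return events


# Entries copied from the excerpts in this article.
SAMPLE = """\
2025-09-30T16:26:12.316Z In(182) vmkernel: cpu1:2098534)ScsiDeviceIO: 4686: Cmd(0x45b######b80) 0x28, CmdSN 0xb860211 from world 0 to dev "naa.50000######abc5" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x3 0x11 0x1 Medium Error, LBA: 102####576
2025-09-30T16:27:10.948Z Wa(180) vmkwarning: cpu43:16728488)WARNING: PLOG: PLOGHandleTransientErrorInt:5612: vSAN device 52###9fd-####-####-####-d7e######cf3 is being repaired due to I/O failures, and will be out of service until the repair is complete. If the devi$
2025-10-01T16:46:19.614Z In(182) vmkernel: cpu29:17011586)PLOG: PLOGHandleTransientErrorInt:5549: Repair threshold (3) for device: 52####fd-####-####-####-d7e######cf3 has been reached and will be marked as PERM error
2025-10-01T16:46:19.614Z Wa(180) vmkwarning: cpu1:2099350)WARNING: PLOG: PLOGPropagateErrorInt:4915: vSAN device 523bd9fd-####-####-####-d7e######cf3 is under permanent error.
"""

if __name__ == "__main__":
    for stamp, label in timeline(SAMPLE):
        print(stamp, label)
```

In this case roughly 24 hours pass between the first repair attempt and the PERM transition, which leaves little time to plan a replacement once medium errors start to appear.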
SMART Impending Failure
naa.50000######abc5 smart health status is impending failure. It will be evacuated and unmounted, consider replacing it.]
2025-10-02T05:08:54.611Z In(14) vobd[2097763]: [vSANCorrelator] 4657######206us: [esx.problem.vob.vsan.lsom.devicewithsmartfailure] vSAN device naa.50000######abc5 smart health status is impending failure. It will be evacuated and unmounted, consider replacing it.
Replace the failing disk after placing the host into maintenance mode (the safest approach).
Note: Ensure data evacuation has completed before physically removing the drive.
After replacement, add the new disk to the vSAN disk group to restore capacity and redundancy.
If deduplication and compression are enabled,