A capacity disk in a vSAN cluster has been identified as failing, leading to degraded performance and potential data availability risks. The disk remains "In-use" by vSAN but is flagged as "Failed" or "Unhealthy" due to consistent I/O errors and latency spikes.
vSphere Client Alerts: You may see vSAN Skyline Health reports for 'Disk health' or 'Physical disk' failures.
Performance Impact: Virtual Machines experience "stuns" or high wait times for storage I/O.
Log Entries: The following entries appear in the vmkernel.log of the affected host:
WARNING: Partition: 1387: Partition table read from device naa.################ failed: I/O error
WARNING: ScsiDeviceIO: 1779: Device naa.################ performance has deteriorated. I/O latency increased... to 3462569 microseconds.
2026-03-18T06:37:52.527Z In(182) vmkernel: cpu10:2098230)HPP: HppScsiLogError:329: last error status from device naa.################ repeated 2 times
2026-03-18T06:37:53.453Z Wa(180) vmkwarning: cpu21:2097956)WARNING: ScsiDeviceIO: 1779: Device naa.################ performance has deteriorated. I/O latency increased from average value of 13854 microseconds to 7788911 microseconds.
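To see which devices are logging these messages and how often, the log can be summarized with a short shell function. This is a minimal sketch rather than a supported tool: the function name failing_devices is hypothetical, and it assumes the device identifier appears as its own space-separated field beginning with "naa.".

```shell
# Hypothetical helper: count latency and partition-read warnings per device.
# Argument 1 is the log file; on an ESXi host this is usually /var/log/vmkernel.log.
failing_devices() {
    awk '
    /performance has deteriorated|Partition table read from device/ {
        # Scan the fields of a matching line for the device identifier.
        for (i = 1; i <= NF; i++)
            if ($i ~ /^naa\./) count[$i]++
    }
    END { for (dev in count) print count[dev], dev }' "$1" | sort -rn
}
```

Running failing_devices /var/log/vmkernel.log prints one line per device, most frequently reported first.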
vSAN OSA
The physical storage device is experiencing hardware degradation. This is evidenced by a high number of Failed Read Operations and latency spikes exceeding 3,000 ms, which cause the ESXi storage stack to time out while attempting to read the disk partition table.
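The 3,000 ms figure can be checked against the log directly, since the ScsiDeviceIO warnings report latency in microseconds. The sketch below is illustrative only: the function name latency_spikes and the default threshold argument are assumptions, and it relies on the message wording shown in the log entries above.

```shell
# Hypothetical helper: print device and latency (in ms) for each spike above a limit.
# Argument 1 is the log file; argument 2 is the limit in milliseconds (default 3000).
latency_spikes() {
    awk -v limit_ms="${2:-3000}" '
    /performance has deteriorated/ {
        dev = ""; us = ""
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^naa\./) dev = $i
            # Keep the last number that precedes the word "microseconds":
            # that is the new (spiked) latency, not the earlier average.
            if ($i ~ /^microseconds/ && $(i - 1) ~ /^[0-9]+$/) us = $(i - 1)
        }
        if (us != "" && us / 1000 > limit_ms)
            printf "%s %d ms\n", dev, us / 1000
    }' "$1"
}
```

For the last log entry above, 7788911 microseconds is reported as 7788 ms, well over the threshold.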
Step 1: Locate the Physical Disk
To ensure the correct drive is replaced in the physical server, use the ESXi command line to trigger the locator LED.
Log in to the host via SSH.
Run the following command, replacing <naa.ID> with the identifier of the failing device (the LED duration is given in seconds):
esxcli storage core device set -d <naa.ID> --led-state locator --led-duration 100
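If the locator LED needs to be switched on and off more than once, the command can be wrapped in two small functions. This is a sketch under stated assumptions: the names led_on and led_off and the ESXCLI override variable are hypothetical conveniences (not esxcli features), and LED control depends on the storage controller and driver supporting it.

```shell
# Setting ESXCLI=echo prints the command instead of running it (dry run).
# This override is an illustrative convention, not part of esxcli.

# Turn on the locator LED. Argument 1: device ID. Argument 2: seconds (default 100).
led_on() {
    "${ESXCLI:-esxcli}" storage core device set -d "$1" \
        --led-state locator --led-duration "${2:-100}"
}

# Turn the LED off again once the drive has been identified.
led_off() {
    "${ESXCLI:-esxcli}" storage core device set -d "$1" --led-state off
}
```

For example, led_on <naa.ID> 300 keeps the LED lit for five minutes, and led_off <naa.ID> clears it after the drive has been found.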
Step 2: Remove the Disk from vSAN Disk Management
Before physically removing the drive, you must logically remove it from the vSAN disk group.
Navigate to the vSphere Client.
Select the Cluster > Configure > vSAN > Disk Management.
Select the host containing the failed disk.
Under Disk Groups, select the affected group and locate the failed disk.
Click Remove Disk.
Data Migration Selection: Select No Data Migration if the disk is already failing or timing out.
Note: In cases of extreme latency, attempting "Full Data Migration" may hang the task or impact cluster performance further.
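If the vSphere Client is unavailable, the same removal can likely be done from the host's command line. The sketch below assumes that the noAction evacuation mode of esxcli vsan storage remove corresponds to No Data Migration in the UI; confirm the accepted values with esxcli vsan storage remove --help on your build before running it. The function name and the ESXCLI dry-run override are hypothetical.

```shell
# Hypothetical wrapper: remove a capacity disk from vSAN without evacuating data.
# Assumption: "-m noAction" matches "No Data Migration" in the vSphere Client.
# Setting ESXCLI=echo prints the command instead of running it (dry run).
remove_failed_disk() {
    "${ESXCLI:-esxcli}" vsan storage remove -d "$1" -m noAction
}
```

Running ESXCLI=echo remove_failed_disk <naa.ID> first shows the exact command without changing anything on the host.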
Step 3: Physical Replacement
Once the disk is removed from the UI and the locator LED is active, physically pull the drive.
Insert the replacement drive.
Return to Disk Management in the vSphere Client and use the Add Disks option to claim the new drive into the existing disk group.
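After the new drive is claimed, the result can be confirmed from the ESXi shell. The following is a minimal sketch: the function names and the ESXCLI override are hypothetical, it assumes esxcli vsan storage list prints a "Device:" line for each claimed disk, and the esxcli vsan debug commands are only present on vSAN releases that include that namespace.

```shell
# Setting ESXCLI to another command or function allows testing off-host
# (an illustrative convention, not an esxcli feature).

# Succeeds (exit status 0) if the given device is listed as claimed by vSAN.
vsan_claimed() {
    "${ESXCLI:-esxcli}" vsan storage list \
        | awk -v dev="$1" '$1 == "Device:" && $2 == dev { found = 1 }
                           END { exit !found }'
}

# Print object health and resync progress while vSAN rebuilds components.
rebuild_status() {
    "${ESXCLI:-esxcli}" vsan debug object health summary get
    "${ESXCLI:-esxcli}" vsan debug resync summary get
}
```

For example, vsan_claimed <naa.ID> && rebuild_status reports rebuild progress only once the replacement disk is part of the disk group.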