If I/O is stuck or lost on the storage controller or the storage disk, the ESXi storage stack tries to abort it using a task management request. If such a lost I/O is found on a host, vSAN takes the disk offline to ensure that it doesn't affect other hosts in the cluster. If the cache device in a non-dedup disk group encounters stuck I/O, or if any disk in a dedup disk group encounters stuck I/O, the entire disk group is set to the offline state. To resolve this, migrate the workload and power cycle the host. After the power cycle, collect the vm-support bundle along with the driver/firmware logs. These issues are caused by faulty hardware or firmware bugs. The customer needs to collect the hardware logs (storcli and/or sascli logs) and open a case with the hardware vendor. Please refer to How to handle lost or stuck I/O on a host in vSAN cluster for more information.
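The disk group rule described above can be summarized in a small illustrative model. This is only a sketch of the stated rule, not vSAN code; the `Disk` class, its fields, and the function name are hypothetical and exist only for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Disk:
    """Hypothetical stand-in for a disk in a vSAN disk group."""
    name: str
    is_cache: bool   # True for the cache device, False for a capacity device
    stuck_io: bool   # True if stuck or lost I/O was detected on this disk


def disk_group_goes_offline(disks: List[Disk], dedup_enabled: bool) -> bool:
    """Return True if, per the rule above, the whole disk group is set offline."""
    if dedup_enabled:
        # Dedup disk group: stuck I/O on any disk takes the group offline.
        return any(d.stuck_io for d in disks)
    # Non-dedup disk group: only stuck I/O on the cache device does.
    return any(d.stuck_io and d.is_cache for d in disks)
```

In this model, stuck I/O on a capacity disk in a non-dedup disk group returns `False`, which corresponds to only that disk being taken offline rather than the entire group.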
vSAN reports this issue when it detects a potential stuck I/O (i.e., an I/O that exceeds a timeout period), which might lead to a stuck I/O scenario. No immediate action is required for this disk if the issue appears only once and is resolved. If it leads to a stuck I/O, please refer to How to handle lost or stuck I/O on a host in vSAN cluster for more information.
vSAN reports this issue when the disk reports SMART impending failures. The disk/disk group will be evacuated and permanently unmounted, and the customer needs to replace the problematic disk.
vSAN reports this issue when it detects excessively high log congestion on this disk group. vSAN will evacuate the data on the disk group and remount it. If the same issue occurs again within a week, vSAN will evacuate and rebuild the disk group. No action is needed from the user.
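The escalation described above (remount first, rebuild on a repeat within a week) can be sketched as follows. This is an illustration of the stated behavior only, not vSAN's implementation; the function name, the return strings, and the choice to treat exactly seven days as still "within a week" are assumptions.

```python
from datetime import datetime, timedelta
from typing import Optional

# Assumption: "within a week" is modeled as 7 days, boundary included.
RECURRENCE_WINDOW = timedelta(days=7)


def remediation_for_log_congestion(
    now: datetime, previous_event: Optional[datetime] = None
) -> str:
    """Pick the remediation for a log congestion event on a disk group.

    previous_event is the time of the last such event on the same
    disk group, or None if there was none.
    """
    if previous_event is not None and now - previous_event <= RECURRENCE_WINDOW:
        # Same issue recurred within the window: evacuate and rebuild.
        return "evacuate_and_rebuild"
    # First occurrence, or the last one is older than the window.
    return "evacuate_and_remount"
```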
Impending permanent disk failure is the same as SMART disk failure.
If vSAN detects high latency from the disk, it will evacuate the data from the disk/disk group and permanently unmount the disk/disk group. The customer then needs to replace the problematic disk.
vSAN reports this issue when a vSAN metadata read encounters an unrecoverable read error from the disk. vSAN will evacuate the data from the disk/disk group and rebuild it.
vSAN reports this status when internal software (i.e., the LSOM meta flusher in the disk) is stuck. We recommend that you migrate the workload and power cycle the host.
vSAN reports this status when internal software (i.e., the PLOG elevator in the disk) is stuck. We recommend that you migrate the workload and power cycle the host.
vSAN reports this issue when the disk fails to rebuild during disk remediation. To diagnose and remediate the issue, check vmkernel.log in /var/run/log/ and search for the disk name. If the logs are unclear, collect support bundles and promptly contact VMware Support.
vSAN reports this issue when the disk fails to unmount during disk remediation. To diagnose and remediate the issue, check vmkernel.log in /var/run/log/ and search for the disk name. If the logs are unclear, collect support bundles and promptly contact VMware Support.
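For the rebuild and unmount failures above, the log search can be done with any text tool. As one possible approach, the following minimal sketch scans a log file for lines that mention a disk name. The function name is hypothetical, and the disk name you pass in must be the actual device name shown for the affected disk.

```python
from typing import List, Tuple


def find_disk_lines(log_path: str, disk_name: str) -> List[Tuple[int, str]]:
    """Return (line_number, line) pairs for log lines that mention disk_name."""
    matches = []
    # errors="replace" keeps the scan going if the log has undecodable bytes.
    with open(log_path, "r", errors="replace") as log:
        for line_no, line in enumerate(log, start=1):
            if disk_name in line:
                matches.append((line_no, line.rstrip("\n")))
    return matches
```

For example, `find_disk_lines("/var/run/log/vmkernel.log", "naa.example")` would return the matching lines, where `naa.example` is a placeholder for the real disk name. This is a plain substring match, so it does not interpret the log entries in any way.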
Also, see:
See KB vSAN Skyline Health Check Information for a complete list of vSAN Skyline Health checks.