Introduction:
A lost or wedged I/O is an I/O that is stuck outside of ESXi (in the device controller or firmware): it never completes, and the abort issued for it either gets no response or never completes.
Because the I/O is stuck outside of ESXi, the only option ESXi has is to send an abort. If the device or controller doesn't respond to the abort within 120 seconds (the default timeout), vSAN takes the disk or Disk Group offline to avoid affecting the entire vSAN cluster.
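
The timeout rule described above can be modelled in a few lines. This is only an illustrative sketch: the function name, its parameters, and the use of plain second-based timestamps are assumptions made for the example and do not reflect how vSAN implements the check internally.

```python
from typing import Optional

ABORT_TIMEOUT_SECONDS = 120  # default timeout quoted in this article


def should_offline(abort_sent_at: float,
                   abort_done_at: Optional[float],
                   now: float,
                   timeout: float = ABORT_TIMEOUT_SECONDS) -> bool:
    """Hypothetical model: True when the abort was not answered within the timeout."""
    if abort_done_at is not None and abort_done_at - abort_sent_at <= timeout:
        return False  # the device answered the abort in time
    # Either no answer yet, or the answer arrived too late.
    return now - abort_sent_at > timeout
```

With these assumptions, an abort that is still unanswered 130 seconds after it was sent would mark the disk for offlining, while one unanswered after 60 seconds would not yet.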
Examples of Symptoms:

 

 
Check for the Skyline Health alarm "Operation health" by connecting over SSH (for example with PuTTY) to any of the vSAN hosts:
 
Run the following command:
esxcli vsan health cluster get -t 'Operation health'
 
Example of output: 
Operation health red
Host      Disk       Overall health  Metadata health  Operational health  In CMMDS/VSI  OperationalState Description  Recommendation                       UUID
HOSTNAME  Disk(xxx)  red             red              red                 Yes/Yes       Stuck I/O is detected         Migrate workload & power cycle host
HOSTNAME  Disk(xxx)  red             red              red                 Yes/Yes       Stuck I/O is detected         Migrate workload & power cycle host
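
When the health output is long, the affected rows can be picked out with a small script. The following is a minimal sketch that assumes the layout shown in the example above, in particular that the host and disk fields contain no spaces; the function name is hypothetical and the parsing may need adjusting for real disk names.

```python
from typing import List, Tuple

STUCK_IO_TEXT = "Stuck I/O is detected"


def find_stuck_io_disks(health_output: str) -> List[Tuple[str, str]]:
    """Return (host, disk) pairs for rows that report stuck I/O.

    Assumption: the first two whitespace-separated fields of a row are
    the host and the disk, as in the example output.
    """
    hits = []
    for line in health_output.splitlines():
        if STUCK_IO_TEXT not in line:
            continue
        fields = line.split()
        if len(fields) >= 2:
            hits.append((fields[0], fields[1]))
    return hits
```

The function takes the text printed by the esxcli command above as a single string and returns one entry per affected row.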
 
 
Logs:
If an I/O is stuck or lost on the storage controller or the storage disk, the ESXi storage stack tries to abort it with a task management request and logs these console messages:
YYYY-MM-DDThh:mm:ss.000Z cpu30:1001397101)ScsiDeviceIO: PsaScsiDeviceTimeoutHandlerFn:12834: TaskMgmt op to cancel IO succeeded for device naa.######## and the IO did not complete. WorldId 0, Cmd 0x28, CmdSN = 0x428.Cancelling of IO will be
YYYY-MM-DDThh:mm:ss.000Z cpu30:1001397101)retried.
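
To see which devices are affected and how often, the message above can be matched with a regular expression. This is a sketch only: the pattern is derived from the single example line shown here, the function name is hypothetical, and the log file to feed in (commonly /var/run/log/vmkernel.log) is an assumption rather than something this article states.

```python
import re
from collections import Counter
from typing import Dict, Iterable

# Pattern built from the example message above; other ESXi builds may word it differently.
CANCEL_RE = re.compile(
    r"TaskMgmt op to cancel IO succeeded for device (?P<device>\S+) "
    r"and the IO did not complete\. WorldId (?P<world>\d+), "
    r"Cmd (?P<cmd>0x[0-9a-fA-F]+), CmdSN = (?P<cmdsn>0x[0-9a-fA-F]+)"
)


def count_uncompleted_cancels(lines: Iterable[str]) -> Dict[str, int]:
    """Count, per device, the cancel attempts whose I/O still did not complete."""
    counts: Counter = Counter()
    for line in lines:
        match = CANCEL_RE.search(line)
        if match:
            counts[match.group("device")] += 1
    return dict(counts)
```

A device that appears repeatedly in the result is one for which ESXi keeps retrying the cancellation without the I/O ever completing.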
 
If such a lost I/O is found on a host, vSAN takes the disk offline to ensure that it doesn't affect other hosts in the cluster, as seen in /var/run/log/vobd.log:
YYYY-MM-DDThh:mm:ss.000#### detected stuck I/O error. Marking the device as offline.
YYYY-MM-DDThh:mm:ss.000#### detected stuck I/O error. Marking the device as offline
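
The vobd.log entries above can be collected with a simple substring match. This is a minimal sketch, assuming the message wording shown in the example; the function name is hypothetical.

```python
from typing import Iterable, List

# Wording taken from the example entries above.
STUCK_IO_MARKER = "detected stuck I/O error. Marking the device as offline"


def stuck_io_events(lines: Iterable[str]) -> List[str]:
    """Return the log lines that report a device being offlined for stuck I/O."""
    return [line.rstrip("\n") for line in lines if STUCK_IO_MARKER in line]
```

Passing an open file handle for /var/run/log/vobd.log as `lines` returns the matching entries in the order they were logged.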
 
When Deduplication is not enabled: if the cache tier disk encounters stuck I/O, the entire Disk Group it serves is set to the offline state.
When Deduplication is enabled: if stuck I/O is detected on any disk, the entire Disk Group that disk belongs to is set to the offline state.
YYYY-MM-DDThh:mm:ss.000####
YYYY-MM-DDThh:mm:ss.000#### is under propagated stuck I/O error. Marking the device as offline.
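
The two Deduplication cases can be summarised as a small decision function. This is an illustrative model, not vSAN code: the `DiskGroup` type and the function name are hypothetical, and the branch for a capacity disk without Deduplication (only that disk goes offline) is inferred from the statements above rather than stated explicitly.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class DiskGroup:
    """Hypothetical description of a Disk Group for this example."""
    cache_disk: str
    capacity_disks: List[str]
    dedup_enabled: bool


def disks_taken_offline(group: DiskGroup, stuck_disk: str) -> Set[str]:
    """Return the disks expected to go offline when stuck_disk has stuck I/O."""
    all_disks = {group.cache_disk, *group.capacity_disks}
    if stuck_disk not in all_disks:
        raise ValueError(f"{stuck_disk} is not part of this Disk Group")
    # With Deduplication, or when the cache tier itself is stuck,
    # the error propagates to the whole Disk Group.
    if group.dedup_enabled or stuck_disk == group.cache_disk:
        return all_disks
    # Assumption: without Deduplication, a capacity disk goes offline alone.
    return {stuck_disk}
```

Under this model, the disks other than `stuck_disk` in the returned set correspond to the devices logged as being under a propagated stuck I/O error.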