esxcli vsan health cluster get -t 'Operation health'
Operation health red

Host      Disk       Overall health  Metadata health  Operational health  In CMMDS/VSI  OperationalState Description  Recommendation                       UUID
HOSTNAME  Disk(xxx)  red             red              red                 Yes /Yes      Stuck I/O is detected         Migrate workload & power cycle host
HOSTNAME  Disk(xxx)  red             red              red                 Yes /Yes      Stuck I/O is detected         Migrate workload & power cycle host
The virtual machine's vmware.log records the VM crash event:
"The guest has requested that the virtual machine be hard reset."
vSAN reports heartbeat timeout events for VM namespaces.
vobd.log :
[vmfsCorrelator] 2076401194729us: [esx.problem.vmfs.heartbeat.timedout] 665f47ee-########-a632-############ ee475f66-####-####-5d58-###########
[vmfsCorrelator] 2076411926527us: [esx.problem.vmfs.heartbeat.recovered] 665f47ee-########-a632-############ ee475f66-####-####-5d58-###########
hostd.log :
[Originator@6876 sub=Vimsvc.ha-eventmgr] Event 17714 : Issue detected on Test01 in ha-datacenter: smartpqi01: pqisrc_wait_on_condition:0246: Controller is Offline
vmkernel.log :
YYYY-MM-DDThh:mm:ss.000Z cpu30:1001397101)ScsiDeviceIO: PsaScsiDeviceTimeoutHandlerFn:12834: TaskMgmt op to cancel IO succeeded for device naa.######## and the IO did not complete. WorldId 0, Cmd 0x28, CmdSN = 0x428.Cancelling of IO will be
YYYY-MM-DDThh:mm:ss.000Z cpu30:1001397101)retried.
vobd.log :
YYYY-MM-DDThh:mm:ss.000Z: [vSANCorrelator] 19607827057us: [vob.vsan.lsom.stuckiooffline] vSAN device ########-########-####-####-####-############ detected stuck I/O error. Marking the device as offline.
YYYY-MM-DDThh:mm:ss.000Z: [vSANCorrelator] 19607829404us: [esx.problem.vob.vsan.lsom.stuckiooffline] vSAN device ########-########-####-####-####-############ detected stuck I/O error. Marking the device as offline
YYYY-MM-DDThh:mm:ss.000Z: [vSANCorrelator] 19607827040us: [vob.vsan.lsom.stuckiopropagated] vSAN device ########-########-####-####-####-############ is under propagated stuck I/O error. Marking the device as offline.
YYYY-MM-DDThh:mm:ss.000Z: [vSANCorrelator] 19607828405us: [esx.problem.vob.vsan.lsom.stuckiopropagated] vSAN device ########-########-####-####-####-############ is under propagated stuck I/O error. Marking the device as offline.
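To check whether a host shows the signatures listed above, the logs can be searched from the ESXi shell. The following is a minimal sketch, not an official tool: the helper name count_stuck_io_events is hypothetical, and the log paths under /var/log are assumed to be the default ESXi locations, so adjust them if logs are redirected.

```shell
# Hypothetical helper (not part of esxcli): count lines in a log file that
# match the stuck I/O, heartbeat timeout, or controller offline signatures.
count_stuck_io_events() {
    log_file="$1"
    # grep -c prints the number of matching lines. It exits non-zero when
    # there are no matches, so '|| true' keeps that from counting as a failure.
    grep -c -E 'vsan\.lsom\.stuckio(offline|propagated)|vmfs\.heartbeat\.timedout|Controller is Offline' "$log_file" || true
}

# Assumed default log locations on the host.
for f in /var/log/vobd.log /var/log/hostd.log; do
    if [ -r "$f" ]; then
        echo "$f: $(count_stuck_io_events "$f")"
    fi
done
```

Each stuck I/O event is logged twice in vobd.log, once as vob.vsan.lsom.* and once as esx.problem.vob.vsan.lsom.*, so the count is a count of matching lines rather than of distinct events.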