The creation of a new VM or snapshot fails.
Third-party VM backup operations fail.
The issue is noticed on a two-node stretched vSAN cluster.
The following error is observed on the vCenter task manager.
Error: Failed to place witnesses. There are currently 0 usable disks and 1 more usable disks are needed in witness node. An error occurred while taking a snapshot: 22 (Invalid argument).
VMware vSAN 7.0.x
The issue is caused by a disk failure or disk performance degradation on the physical host where the vSAN Witness VM is running. This condition results in stuck I/O on the witness node disk, causing the disk to be marked offline and unhealthy.
VMkernel logs on the witness node report stuck I/O events for the witness disk.
Log path on the witness node: /var/run/log/vmkernel.log
YYYY-MM-DDThh:mm:ss.msZ In(182) vmkernel: cpu0:262290)ScsiDeviceIO: 13428: Task mgmt request issued to device mpx.vmhba0:C0:T1:L0 is stuck (WorldID 0, Cmd 0x2a, CmdSN 1ba30b). Issuing red notification to the application
YYYY-MM-DDThh:mm:ss.msZ In(182) vmkernel: cpu0:262290)ScsiDeviceIO: 13469: FDS_DEV_EVENT_REPORT_STUCK_IO event for device mpx.vmhba0:C0:T1:L0
YYYY-MM-DDThh:mm:ss.msZ Wa(180) vmkwarning: cpu1:262262)WARNING: PLOG: PLOG_DeviceHandleStuckIO:8812: Stuck IOs detected on vSAN device: ########-####-####-####-###########. Marking the device as OFFLINE
YYYY-MM-DDThh:mm:ss.msZ Wa(180) vmkwarning: cpu1:262262)WARNING: LSOMCommon: SSDLOG_StopReclaim:2252: Got APD event 1 on 0x450082204a48
YYYY-MM-DDThh:mm:ss.msZ Wa(180) vmkwarning: cpu1:262704)WARNING: PLOG: PLOGPropagateError:5005: Propagating stuck IO event from original device ########-####-####-####-###########
YYYY-MM-DDThh:mm:ss.msZ Wa(180) vmkwarning: cpu1:262704)WARNING: PLOG: PLOGPropagateError:5062: Stuck IO propagating from SSD to MDs in device ########-####-####-####-###########
YYYY-MM-DDThh:mm:ss.msZ In(182) vmkernel: cpu1:262704)PLOG: PLOGPropagateErrorInt:4789: disk ########-####-####-####-########### errFlag:0x200000 errEvent:13 ioStatus:I/O error/0xbad000a
YYYY-MM-DDThh:mm:ss.msZ Wa(180) vmkwarning: cpu1:262704)WARNING: PLOG: PLOGPropagateErrorInt:4796: Stuck IO state added to device ########-####-####-####-########### state=0x20000d
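To check a copy of the witness node's vmkernel.log for the same signature, a small script along the following lines may help. This is a minimal sketch, not a VMware tool: the function name `find_stuck_io_devices` and both regular expressions are illustrative and are derived only from the wording of the excerpt above, so other builds may phrase these messages differently.

```python
import re
import sys

# Assumption: both patterns are based only on the sample messages in this
# article; they are not an official or exhaustive list of stuck I/O messages.
STUCK_IO_PATTERN = re.compile(r"stuck[ _]?io|is stuck", re.IGNORECASE)
DEVICE_PATTERN = re.compile(r"device:? (\S+)")


def find_stuck_io_devices(lines):
    """Return {device identifier: number of stuck I/O log lines naming it}."""
    counts = {}
    for line in lines:
        if not STUCK_IO_PATTERN.search(line):
            continue
        match = DEVICE_PATTERN.search(line)
        # Drop a trailing period left over from sentence punctuation.
        device = match.group(1).rstrip(".") if match else "unknown"
        counts[device] = counts.get(device, 0) + 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_stuck_io.py /path/to/copied/vmkernel.log
    with open(sys.argv[1], errors="replace") as log_file:
        for device, count in find_stuck_io_devices(log_file).items():
            print(f"{device}: {count} stuck I/O line(s)")
```

A non-empty result for the witness disk (for example mpx.vmhba0:C0:T1:L0 in the excerpt) is consistent with the cause described in this article, but it does not replace the vSAN health checks.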
Log path on the physical host where the witness node VM resides: /var/run/log/vmkernel.log
YYYY-MM-DDThh:mm:ss.msZ Wa(180) vmkwarning: cpu2:2097196)WARNING: ScsiDeviceIO: 1779: Device naa.########-####-####-####-########### performance has deteriorated. I/O latency increased from average value of 1385 microseconds to 50329 microseconds.
YYYY-MM-DDThh:mm:ss.msZ Wa(180) vmkwarning: cpu2:2097759)WARNING: ScsiDeviceIO: 1779: Device naa.########-####-####-####-########### performance has deteriorated. I/O latency increased from average value of 1385 microseconds to 48684 microseconds.
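In the sample above, latency rose from an average of 1385 microseconds to 50329 microseconds, roughly a 36-fold increase. To see how severe the deterioration is across a whole log from the physical host, the warnings can be extracted with a short script. This is a hedged sketch: `parse_latency_warnings` is a hypothetical helper, and its pattern assumes the message wording matches the sample exactly.

```python
import re

# Assumption: the message format is copied from the sample warning in this
# article; a different ESXi build may word it differently.
LATENCY_PATTERN = re.compile(
    r"Device (\S+) performance has deteriorated\. "
    r"I/O latency increased from average value of (\d+) microseconds "
    r"to (\d+) microseconds"
)


def parse_latency_warnings(lines):
    """Return (device, average_us, current_us, ratio) for each warning line."""
    results = []
    for line in lines:
        match = LATENCY_PATTERN.search(line)
        if not match:
            continue
        device = match.group(1)
        average_us = int(match.group(2))
        current_us = int(match.group(3))
        # Guard against a zero average to avoid dividing by zero.
        ratio = current_us / average_us if average_us else float("inf")
        results.append((device, average_us, current_us, ratio))
    return results
```

Sorting the returned tuples by ratio shows which device degraded the most, which can help when raising the issue with the storage administrator.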
Perform one of the following actions:
OR
Additionally, work with your storage administrator to identify and resolve the underlying disk or performance issue on the physical host hosting the witness VM.