Failed to place witnesses. There are currently 0 usable disks and 1 more usable disks are needed in witness node.

Article ID: 423036

Products

VMware vSphere ESXi

Issue/Introduction

Symptoms:

  • The creation of a new VM or snapshot fails.

  • Third-party VM backup operations fail.

  • The issue is noticed on a two-node stretched vSAN cluster.

  • The following error is observed in the vCenter task manager:

Error: Failed to place witnesses. There are currently 0 usable disks and 1 more usable disks are needed in witness node. An error occurred while taking a snapshot: 22 (Invalid argument).

  • The witness node disk is reported as Unhealthy under:
    • Path: vSAN Cluster > Configure > vSAN > Disk Management.

Environment

  • VMware vSAN 7.0.x

  • VMware vSAN 8.0.x

Cause

  • The issue is caused by a disk failure or disk performance degradation on the physical host where the vSAN Witness VM is running. This condition results in stuck I/O on the witness node disk, causing the disk to be marked offline and unhealthy.

Cause Validation:

  • VMkernel logs on the witness node report stuck I/O events for the witness disk.

Log path on the witness node: /var/run/log/vmkernel.log

YYYY-MM-DDThh:mm:ss.msZ In(182) vmkernel: cpu0:262290)ScsiDeviceIO: 13428: Task mgmt request issued to device mpx.vmhba0:C0:T1:L0 is stuck (WorldID 0, Cmd 0x2a, CmdSN 1ba30b). Issuing red notification to the application
YYYY-MM-DDThh:mm:ss.msZ In(182) vmkernel: cpu0:262290)ScsiDeviceIO: 13469: FDS_DEV_EVENT_REPORT_STUCK_IO event for device mpx.vmhba0:C0:T1:L0
YYYY-MM-DDThh:mm:ss.msZ Wa(180) vmkwarning: cpu1:262262)WARNING: PLOG: PLOG_DeviceHandleStuckIO:8812: Stuck IOs detected on vSAN device: ########-####-####-####-###########. Marking the device as OFFLINE
YYYY-MM-DDThh:mm:ss.msZ Wa(180) vmkwarning: cpu1:262262)WARNING: LSOMCommon: SSDLOG_StopReclaim:2252: Got APD event 1 on 0x450082204a48
YYYY-MM-DDThh:mm:ss.msZ Wa(180) vmkwarning: cpu1:262704)WARNING: PLOG: PLOGPropagateError:5005: Propagating stuck IO event from original device ########-####-####-####-###########
YYYY-MM-DDThh:mm:ss.msZ Wa(180) vmkwarning: cpu1:262704)WARNING: PLOG: PLOGPropagateError:5062: Stuck IO propagating from SSD to MDs in device ########-####-####-####-###########
YYYY-MM-DDThh:mm:ss.msZ In(182) vmkernel: cpu1:262704)PLOG: PLOGPropagateErrorInt:4789: disk ########-####-####-####-########### errFlag:0x200000 errEvent:13 ioStatus:I/O error/0xbad000a
YYYY-MM-DDThh:mm:ss.msZ Wa(180) vmkwarning: cpu1:262704)WARNING: PLOG: PLOGPropagateErrorInt:4796: Stuck IO state added to device ########-####-####-####-########### state=0x20000d
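
The check above can be approximated with a small script that scans a copy of the witness node's vmkernel.log for the stuck I/O signatures. This is a minimal sketch, not an official tool: the function names are hypothetical, and the match strings are taken only from the excerpt above, so other ESXi builds may word these messages differently.

```python
import re

# Signatures copied from the log excerpt above (assumption: the wording
# is stable enough to match on; other builds may differ).
STUCK_IO_PATTERNS = [
    re.compile(r"is stuck \(WorldID"),
    re.compile(r"FDS_DEV_EVENT_REPORT_STUCK_IO"),
    re.compile(r"Stuck IOs detected on vSAN device"),
    re.compile(r"Marking the device as OFFLINE"),
]


def find_stuck_io_lines(log_text):
    """Return the log lines that match any stuck I/O signature."""
    return [
        line
        for line in log_text.splitlines()
        if any(pattern.search(line) for pattern in STUCK_IO_PATTERNS)
    ]


def scan_file(path):
    """Read a log file (hypothetical helper) and return matching lines."""
    with open(path, "r", errors="replace") as handle:
        return find_stuck_io_lines(handle.read())
```

For example, calling scan_file on a copy of /var/run/log/vmkernel.log taken from the witness node returns the matching lines. An empty result suggests that this cause does not apply, or that the log has already rotated.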

  • On the physical host where the witness VM resides, vmkernel logs show disk performance degradation alerts, indicating latency spikes on the underlying storage device.

Log path on the physical host where the witness node VM resides: /var/run/log/vmkernel.log

YYYY-MM-DDThh:mm:ss.msZ Wa(180) vmkwarning: cpu2:2097196)WARNING: ScsiDeviceIO: 1779: Device naa.########-####-####-####-########### performance has deteriorated. I/O latency increased from average value of 1385 microseconds to 50329 microseconds.
YYYY-MM-DDThh:mm:ss.msZ Wa(180) vmkwarning: cpu2:2097759)WARNING: ScsiDeviceIO: 1779: Device naa.########-####-####-####-########### performance has deteriorated. I/O latency increased from average value of 1385 microseconds to 48684 microseconds.
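
To gauge how severe the degradation is, the warnings above can be parsed to extract the device name and the two latency values. The following is a rough sketch under the assumption that the message format matches the excerpt exactly; the function name and the returned dictionary keys are hypothetical.

```python
import re

# Pattern derived from the warning text in the excerpt above
# (assumption: the message format is the same on the host being checked).
LATENCY_RE = re.compile(
    r"Device (?P<device>\S+) performance has deteriorated\. "
    r"I/O latency increased from average value of (?P<avg>\d+) microseconds "
    r"to (?P<cur>\d+) microseconds"
)


def parse_latency_warnings(log_text):
    """Return one dict per latency warning, including the increase factor."""
    results = []
    for line in log_text.splitlines():
        match = LATENCY_RE.search(line)
        if match is None:
            continue
        avg_us = int(match.group("avg"))
        cur_us = int(match.group("cur"))
        results.append({
            "device": match.group("device"),
            "average_us": avg_us,
            "current_us": cur_us,
            # Guard against a zero average to avoid division by zero.
            "factor": cur_us / avg_us if avg_us else None,
        })
    return results
```

Applied to the first warning above, this reports an increase from 1385 to 50329 microseconds, roughly a 36-fold rise in latency on the underlying device.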

Resolution

Perform one of the following actions:

  • Reboot the vSAN Witness VM to clear the stuck I/O condition.

OR

  • Migrate the witness node VM to a different datastore that is not affected by disk performance issues.

Additionally, work with your storage administrator to identify and resolve the underlying disk or performance issue on the physical host hosting the witness VM.

Additional Information

How to handle lost or stuck I/O on a host in vSAN cluster