A vSAN Witness Appliance residing on another vSAN datastore reports disk failures. The witness is on the same build as the ESXi hosts in the stretched cluster.
VMkernel log reports witness disks are going offline:
2025-05-26T00:02:19.629Z Wa(180) vmkwarning: cpu1:438427)WARNING: LSOM: LSOMEventNotify:9026: vSAN device ########-####-####-5d94-############has gone offline.
2025-05-26T00:02:26.947Z Wa(180) vmkwarning: cpu0:438427)WARNING: LSOM: LSOMEventNotify:9026: vSAN device ########-####-####-9f23-############has gone offline.
The disks are then marked with a PERM error:
2025-05-27T00:02:35.431Z Wa(180) vmkwarning: cpu0:262164)WARNING: HPP: HppScsiThrottleLogForDevice:585: Cmd 0x2a (0x4549014e6a40, 0) to dev "mpx.vmhba0:C0:T1:L0" on path "vmhba0:C0:T1:L0" Failed:
2025-05-27T00:02:35.431Z Wa(180) vmkwarning: cpu0:262164)WARNING: HPP: HppScsiThrottleLogForDevice:593: Error status H:0x0 D:0x8 P:0x0 Invalid sense data: 0x0 0x0 0x0. hppAction = 3
2025-05-27T00:02:35.449Z Wa(180) vmkwarning: cpu0:262164)WARNING: HPP: HppScsiThrottleLogForDevice:585: Cmd 0x2a (0x4549014e6a40, 0) to dev "mpx.vmhba0:C0:T1:L0" on path "vmhba0:C0:T1:L0" Failed:
2025-05-27T00:02:35.449Z Wa(180) vmkwarning: cpu0:262164)WARNING: HPP: HppScsiThrottleLogForDevice:593: Error status H:0x7 D:0x0 P:0x0 Invalid sense data: 0x0 0x0 0x0. hppAction = 3
2025-05-27T00:02:35.449Z In(182) vmkernel: cpu0:262164)ScsiDeviceIO: 4672: Cmd(0x4549014e6a40) 0x2a, CmdSN 0xa810 from world 0 to dev "mpx.vmhba0:C0:T1:L0" failed H:0x7 D:0x0 P:0x0
2025-05-27T00:02:35.459Z In(182) vmkernel: cpu0:262164)PLOG: PLOGHandleTransientErrorInt:5530: Throttled: Device: ########-####-####-9f23-############ will be out of service until unmount-mount operation is complete.
2025-05-27T00:02:35.459Z In(182) vmkernel: cpu0:262164)PLOG: PLOGHandleTransientErrorInt:5549: Repair threshold (3) for device: ########-####-####-9f23-############ has been reached and will be marked as PERM error
2025-05-27T00:02:35.459Z In(182) vmkernel: cpu0:262164)LSOMCommon: IORETRYCompleteIO:469: Throttled: 0x454af73ec340 IO type 272 (WRITE) isOrdered:YES isSplit:NO isEncr:NO since 60 msec status Storage initiator error
2025-05-27T00:02:35.459Z Wa(180) vmkwarning: cpu0:262164)WARNING: LSOMCommon: SSDLOGWriteLogBlockCB:886: device: ########-####-####-9f23-############ write log block failed, blkNo 64388, type 2, segNo 35, blkSeqNo 244, writeCount 35, segRef 1643: Storage initiator
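The status triplets in the excerpt above (H: host, D: device, P: plugin) can be decoded with a small script when scanning a large vmkernel.log. The following is a minimal sketch, not a VMware tool: the function name `decode_status` is hypothetical, and the lookup tables cover only the codes that appear in this excerpt. Opcode 0x2a and device status 0x8 use the standard SCSI meanings, and host status 0x7 is mapped to the "Storage initiator error" text that the log itself prints for the failed write.

```python
import re

# Lookup tables limited to the codes seen in the excerpt above (assumption:
# any other value is reported as "unknown" rather than guessed).
OPCODES = {0x2A: "WRITE(10)"}
HOST_STATUS = {0x0: "OK", 0x7: "ERROR (storage initiator error)"}
DEVICE_STATUS = {0x0: "GOOD", 0x8: "BUSY"}
PLUGIN_STATUS = {0x0: "GOOD"}

STATUS_RE = re.compile(
    r"H:(0x[0-9a-fA-F]+) D:(0x[0-9a-fA-F]+) P:(0x[0-9a-fA-F]+)"
)


def _lookup(table, value):
    """Return the known name for a code, or a labeled unknown."""
    return table.get(value, f"unknown ({value:#x})")


def decode_status(line):
    """Decode the H:/D:/P: triplet in one vmkernel log line, if present."""
    match = STATUS_RE.search(line)
    if match is None:
        return None
    host, device, plugin = (int(value, 16) for value in match.groups())
    return {
        "host": _lookup(HOST_STATUS, host),
        "device": _lookup(DEVICE_STATUS, device),
        "plugin": _lookup(PLUGIN_STATUS, plugin),
    }


if __name__ == "__main__":
    # Line abbreviated from the ScsiDeviceIO entry in the excerpt above.
    sample = 'Cmd(0x4549014e6a40) 0x2a, CmdSN 0xa810 failed H:0x7 D:0x0 P:0x0'
    print(OPCODES[0x2A], decode_status(sample))
```

In this excerpt the first failure is a device BUSY status (D:0x8) and the retry fails with a host-side error (H:0x7), after which PLOG reaches its repair threshold and marks the device as PERM error.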
VMware vSAN 7.0.x
VMware vSAN 8.0.x
The witness VM was being backed up by a snapshot-based backup solution. Every time witness disk errors were reported in the logs, there was a snapshot request from the backup solution shortly beforehand.
Create snapshot requests for Witness appliance:
2025-05-26T00:00:40.711Z Db(167) Hostd[2100933]: [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/vsan:################-################/########-####-####-####-############/###########.vmx opID=1b167789-37-6ad7 sid=5257285e user=vpxuser:####\########] Create Snapshot: _cohesity_m_snapshot-##################-13112, memory=false, quiescent=false state=5
2025-05-27T00:01:17.091Z Db(167) Hostd[2100954]: [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/vsan:################-################/########-####-####-####-############/###########.vmx opID=2d487704-d5-325d sid=5257285e user=vpxuser:####\########] Create Snapshot: _cohesity_m_snapshot-##################-13377, memory=false, quiescent=false state=5
Remove snapshot requests for Witness appliance:
2025-05-26T00:02:18.779Z Db(167) Hostd[2100945]: [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/vsan:################-################/########-####-####-####-############/###########.vmx opID=7adb8f6c-37-6bc7 sid=5257285e user=vpxuser:####\########] Remove snapshot request received: _cohesity_m_snapshot-##################-13112, 0
2025-05-27T00:02:34.691Z Db(167) Hostd[2100936]: [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/vsan:################-################/########-####-####-####-############/###########.vmx opID=1445d515-40-3330 sid=5257285e user=vpxuser:####\########] Remove snapshot request received: _cohesity_m_snapshot-##################-13377, 0
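The timing relationship between the snapshot requests and the disk failures can be checked programmatically when the logs are long. The following is a minimal sketch under stated assumptions: the function names and the five-minute window are hypothetical choices, the regular expressions match only the message texts shown in the excerpts in this article, and both logs are assumed to carry UTC timestamps in the format shown.

```python
import re
from datetime import datetime, timedelta

TS = r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3})Z"
# Message texts taken from the hostd and vmkernel excerpts in this article.
SNAPSHOT_RE = re.compile(
    TS + r".*(Create Snapshot|Remove snapshot request received)"
)
OFFLINE_RE = re.compile(TS + r".*LSOMEventNotify.*has gone offline")


def parse_ts(text):
    """Parse a timestamp such as 2025-05-26T00:02:19.629 (Z already removed)."""
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%S.%f")


def snapshot_events(hostd_lines):
    """Return (timestamp, kind) for each snapshot create/remove request."""
    events = []
    for line in hostd_lines:
        match = SNAPSHOT_RE.search(line)
        if match:
            events.append((parse_ts(match.group(1)), match.group(2)))
    return events


def offline_events(vmkernel_lines):
    """Return the timestamp of each 'vSAN device ... has gone offline' line."""
    events = []
    for line in vmkernel_lines:
        match = OFFLINE_RE.search(line)
        if match:
            events.append(parse_ts(match.group(1)))
    return events


def correlate(snapshots, offlines, window=timedelta(minutes=5)):
    """Pair each offline event with the snapshot requests that preceded it
    by no more than `window`."""
    results = []
    for offline in offlines:
        related = [
            (ts, kind)
            for ts, kind in snapshots
            if timedelta(0) <= offline - ts <= window
        ]
        results.append((offline, related))
    return results
```

Feeding the hostd log of the host that runs the witness VM to `snapshot_events` and the witness vmkernel log to `offline_events`, then passing both results to `correlate`, lists the snapshot requests that preceded each offline event. In the excerpts above, the remove request at 00:02:18.779 on 2025-05-26 precedes the first offline event at 00:02:19.629 by under one second.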
Exclude the witness VM from backup by any snapshot-based backup solution.
Taking snapshots of the vSAN Witness Appliance is a high-risk action that can lead to latency issues, split-brain conditions, and vSAN metadata corruption. VMware strongly advises NOT using snapshots and instead recommends alternative backup methods.
If the vSAN Witness Appliance fails, VMware recommends deploying a fresh Witness Appliance and reconfiguring it.
For more information, see the article "Avoid taking a snapshot of the VMware vSAN Witness Appliance".