Following a vSAN network outage, several Linux VMs reported I/O errors on their root disk.
vSAN health: Green post-recovery.
Basic commands inside the guest (uptime, date, df -h) fail because the root filesystem is unavailable.
vmware.log: No heartbeat timeouts or storage device errors, only a minor log-discard message:
YYYY-MM-DDTHH:MM:SS.SSSZ No(00) svga - >>> Error writing log, 110 bytes discarded. Disk full?
vmkernel.log: Events align with a vSAN network outage, including:
High latency warnings
Node removals from cluster membership
Leader election and failover
YYYY-MM-DDTHH:MM:SS.SSSZ In(182) vmkernel: cpu13:2100033) CMMDS: LeaderUpdateMeanRTLatency: 12423: Throttled: #-#-#-#-#: High RT latency. Node #-#-#-#-#, RT latency 958 (ms). Mean RT latency 122 (ms)
YYYY-MM-DDTHH:MM:SS.SSSZ In(182) vmkernel: cpu3:2100033) CMMDS: CMMDSCompleteLocalUpdate: 3865: #-#-#-#-#: Number of slow updates in last interval is 1 maxLatency 313 millisecs slowest #-#-#-#-#
YYYY-MM-DDTHH:MM:SS.SSSZ In(182) vmkernel: cpu51:2100033) CMMDS: CMMDSCompleteLocalUpdate: 3865: 521609c9-####-####-####-0b840d1df835: Number of slow updates in last interval is 1 maxLatency 689 millisecs slowest UUID #-#-#-#-#
YYYY-MM-DDTHH:MM:SS.SSSZ In(182) vmkernel: cpu51:2100033) CMMDS: LeaderSendHeartbeat: 2635: #-#-#-#-#: Backup unresponsive
YYYY-MM-DDTHH:MM:SS.SSSZ In(182) vmkernel: cpu51:2100033) CMMDS: CMMDSStateDestroyNode: 708: #-#-#-#-#: Destroying node #-#-#-#-#: Backup is too far behind
YYYY-MM-DDTHH:MM:SS.SSSZ In(182) vmkernel: cpu51:2100033) CMMDS: LeaderLostBackup: 545: #-#-#-#-#: Leader Failover: MUUID #-#-#-#-# old #-#-#-#-#
YYYY-MM-DDTHH:MM:SS.SSSZ In(182) vmkernel: cpu51:2100033) CMMDS: LeaderRemoveNodeFromMembership: 8592: #-#-#-#-#: Removing node #-#-#-#-# (vsanNodeType: data) from the cluster membership
YYYY-MM-DDTHH:MM:SS.SSSZ In(182) vmkernel: cpu51:2100033) CMMDS: CMMDSClusterDestroyNodeImpl: 262: Destroying node #-#-#-#-# from the cluster db. Last HB received from node - #-#-#-#-#
YYYY-MM-DDTHH:MM:SS.SSSZ Wa(180) vmkwarning: cpu51:2100033) WARNING: RDT: RDTEndQueuedMessages: 1410: assoc 0x4322c6ac1680 message 9901371 failure
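The CMMDS latency and membership events above can be isolated from a vmkernel.log bundle with a simple filter. A minimal sketch; the sample file, its contents, and timestamps are illustrative placeholders, and on a live host the log is typically /var/run/log/vmkernel.log:

```shell
#!/bin/sh
# Pull CMMDS latency and membership-churn events out of a vmkernel.log
# excerpt. /tmp/vmkernel.sample and its lines are illustrative only.
cat <<'EOF' > /tmp/vmkernel.sample
2024-01-01T00:00:00.000Z In(182) vmkernel: cpu13:2100033)CMMDS: LeaderUpdateMeanRTLatency: 12423: High RT latency.
2024-01-01T00:00:01.000Z In(182) vmkernel: cpu51:2100033)CMMDS: LeaderRemoveNodeFromMembership: 8592: Removing node
2024-01-01T00:00:02.000Z In(182) vmkernel: cpu51:2100033)NetPort: unrelated event
EOF

# Keep only events that indicate RT-latency throttling or membership churn.
grep -E 'CMMDS: (LeaderUpdateMeanRTLatency|LeaderSendHeartbeat|LeaderLostBackup|LeaderRemoveNodeFromMembership|CMMDSStateDestroyNode|CMMDSClusterDestroyNodeImpl)' \
    /tmp/vmkernel.sample
```

Correlating the timestamps of these events with the guest-side I/O errors below confirms that the outage window and the filesystem shutdown line up.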
VMware ESXi 8.x
VMware ESXi 7.x
VMware vSAN 8.x
VMware vSAN 7.x
During the vSAN network outage, the backend storage became temporarily inaccessible. The Linux VMs continued issuing read/write operations to their root disk (/dev/sda), but the commands never completed; the guest kernel eventually reported SCSI commands timing out after waiting 1080s and 1880s.
XFS journal writes failed with log I/O error -5 (EIO).
XFS then forced a filesystem shutdown to protect data integrity.
Because /dev/sda contained the root filesystem, the entire guest became unresponsive.
[465620.448887] I/O error, dev sdb, sector 9473560 op 0x8: (READ) flags 0x0 phys_seg 1 prio class 0
[465621.023491] sd 0:0:0:0: [sda] tag#1016 timing out command, waited 1880s
[465621.023969] I/O error, dev sda, sector 229747980 op 0x1: (WRITE) flags 0x9800 phys_seg 1 prio class 0
[465621.024388] XFS (dm-7): log I/O error -5
[465621.024020] XFS (dm-7): Filesystem has been shut down due to log error (0x2).
[465621.025288] XFS (dm-7): Please unmount the filesystem and rectify the problem(s).
[466700.443897] sd 0:0:1:0: [sdb] tag#1021 timing out command, waited 1080s
[466700.443578] I/O error, dev sdb, sector 9469840 op 0x0: (READ) flags 0x phys_seg 1 prio class 0
[522497.539017] watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [kworker/#:##:####]
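The guest-side failure chain can be confirmed by filtering the kernel ring buffer for the block-layer errors and the XFS shutdown message. A minimal sketch; the sample file and its lines are illustrative, and on the affected VM the same grep would run against `dmesg` output instead:

```shell
#!/bin/sh
# Check a saved kernel-log excerpt for I/O errors followed by an XFS
# forced shutdown. /tmp/dmesg.sample is an illustrative placeholder.
cat <<'EOF' > /tmp/dmesg.sample
[465621.023969] I/O error, dev sda, sector 229747980 op 0x1: (WRITE) flags 0x9800 phys_seg 1 prio class 0
[465621.024388] XFS (dm-7): log I/O error -5
[465621.024020] XFS (dm-7): Filesystem has been shut down due to log error (0x2).
EOF

# Flag the forced shutdown explicitly, then show the related errors.
if grep -q 'Filesystem has been shut down' /tmp/dmesg.sample; then
    echo "XFS forced shutdown detected"
fi
grep -E 'I/O error, dev sd|XFS' /tmp/dmesg.sample
```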
Confirm vSAN network and storage connectivity are stable.
Reboot affected VMs.
On restart, the XFS journal is replayed and the filesystem remounts cleanly.
VM functionality is restored.
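After confirming vSAN health in vCenter (or with `esxcli vsan cluster get` on a host), the guest side can be sanity-checked after the reboot. A minimal sketch assuming a standard Linux guest with procfs; adjust the match if root is not mounted at "/":

```shell
#!/bin/sh
# Post-reboot sanity checks inside the guest.
# Root filesystem should be mounted read-write again:
awk '$2 == "/" { print $1, $4 }' /proc/mounts

# No XFS shutdown message should appear after the reboot
# (dmesg may need root; prints a match count, 0 when healthy):
dmesg 2>/dev/null | grep -c 'Filesystem has been shut down' || true
```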