Following a vSAN network outage, several Linux VMs reported I/O errors on their root disk.
vSAN health reports Green after the network recovers.
Inside the affected VMs, basic commands (uptime, date, df -h) fail because the root filesystem is unavailable.
vmware.log: No heartbeat timeouts or storage device errors are logged, only a minor log-discard message:
2025-09-16T10:49:41.604Z No(00) svga - >>> Error writing log, 110 bytes discarded. Disk full?
vmkernel.log: Events align with a vSAN network outage, including:
High latency warnings
Node removals from cluster membership
Leader election and failover
2025-09-15T15:45:07.263Z In(182) vmkernel: cpu13:2100033) CMMDS: LeaderUpdateMeanRTLatency: 12423: Throttled: 521609c9-####-####-####-0b840d1df835: High RT latency. Node 65189716-####-####-####-1423f2320bb0, RT latency 958 (ms). Mean RT latency 122 (ms)
2025-09-15T15:45:18.065Z In(182) vmkernel: cpu3:2100033) CMMDS: CMMDSCompleteLocalUpdate: 3865: 521609c9-####-####-####-0b840d1df835: Number of slow updates in last interval is 1 maxLatency 313 millisecs slowest UUID 529f964e-####-####-####-b57c11e7b3c5
2025-09-15T15:45:18.755Z In(182) vmkernel: cpu51:2100033) CMMDS: CMMDSCompleteLocalUpdate: 3865: 521609c9-####-####-####-0b840d1df835: Number of slow updates in last interval is 1 maxLatency 689 millisecs slowest UUID 529f964e-####-####-####-b57c11e7b3c5
2025-09-15T15:45:43.982Z In(182) vmkernel: cpu51:2100033) CMMDS: LeaderSendHeartbeat: 2635: 521609c9-####-####-####-0b840d1df835: Backup unresponsive
2025-09-15T15:45:43.982Z In(182) vmkernel: cpu51:2100033) CMMDS: CMMDSStateDestroyNode: 708: 521609c9-####-####-####-0b840d1df835: Destroying node 65189716-####-####-####-1423f2320bb0: Backup is too far behind
2025-09-15T15:45:43.982Z In(182) vmkernel: cpu51:2100033) CMMDS: LeaderLostBackup: 545: 521609c9-####-####-####-0b840d1df835: Leader Failover: MUUID a734c868-####-####-####-1423f231d820 old 326cb968-####-####-####-1423f231d820
2025-09-15T15:45:43.982Z In(182) vmkernel: cpu51:2100033) CMMDS: LeaderRemoveNodeFromMembership: 8592: 521609c9-####-####-####-0b840d1df835: Removing node 65189716-####-####-####-1423f2320bb0 (vsanNodeType: data) from the cluster membership
2025-09-15T15:45:43.982Z In(182) vmkernel: cpu51:2100033) CMMDS: CMMDSClusterDestroyNodeImpl: 262: Destroying node 65189716-####-####-####-1423f2320bb0 from the cluster db. Last HB received from node - 973246628589212
2025-09-15T15:45:43.982Z Wa(180) vmkwarning: cpu51:2100033) WARNING: RDT: RDTEndQueuedMessages: 1410: assoc 0x4322c6ac1680 message 9901371 failure
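When vmkernel.log is large, the latency warnings and membership removals shown above can be pulled out with a short script. The following is a minimal sketch only: the function name summarize_cmmds and the regular expressions are illustrative assumptions derived from the message text in this excerpt, not a VMware tool, and other ESXi builds may word these messages differently.

```python
import re

# Patterns are assumptions based only on the message text in the excerpt above.
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2}T[\d:.]+Z)")
HIGH_LATENCY = re.compile(r"High RT latency\. Node ([^\s,]+), RT latency (\d+) \(ms\)")
NODE_REMOVED = re.compile(r"LeaderRemoveNodeFromMembership.*?Removing node (\S+)")


def summarize_cmmds(lines):
    """Return (timestamp, kind, node, latency_ms) tuples for CMMDS events.

    kind is "high_latency" or "node_removed"; latency_ms is None for removals.
    """
    events = []
    for line in lines:
        ts_match = TIMESTAMP.match(line)
        timestamp = ts_match.group(1) if ts_match else None

        latency = HIGH_LATENCY.search(line)
        if latency:
            events.append(
                (timestamp, "high_latency", latency.group(1), int(latency.group(2)))
            )
            continue

        removed = NODE_REMOVED.search(line)
        if removed:
            events.append((timestamp, "node_removed", removed.group(1), None))
    return events
```

Passing the lines of a saved vmkernel.log to summarize_cmmds gives a compact timeline that can be compared against the time the guest I/O errors began.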
VMware ESXi 8.x
VMware ESXi 7.x
VMware vSAN 8.x
VMware vSAN 7.x
During the vSAN network outage, backend storage became temporarily inaccessible. The Linux VMs attempted read/write operations to their root disk (/dev/sda), but commands did not complete within the configured SCSI timeout (1080s / 1880s).
XFS journal writes failed with log I/O error -5 (EIO).
XFS forced a filesystem shutdown to protect data integrity.
Since /dev/sda contained the root filesystem, all VM operations became unresponsive.
[465620.448887] I/O error, dev sdb, sector 9473560 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[465621.023491] sd 0:0:0:0: [sda] tag#1016 timing out command, waited 1880s
[465621.023969] I/O error, dev sda, sector 229747980 op 0x1:(WRITE) flags 0x9800 phys_seg 1 prio class 0
[465621.024388] XFS (dm-7): log I/O error -5
[465621.024020] XFS (dm-7): Filesystem has been shut down due to log error (0x2).
[465621.025288] XFS (dm-7): Please unmount the filesystem and rectify the problem(s).
[466700.443897] sd 0:0:1:0: [sdb] tag#1021 timing out command, waited 1080s
[466700.443578] I/O error, dev sdb, sector 9469840 op 0x0:(READ) flags 0x phys_seg 1 prio class 0
[522497.539017] watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [kworker/#:##:####]
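To check whether other guests hit the same sequence (SCSI command timeout, block I/O error, XFS shutdown), the kernel messages above can be matched in captured console or kernel log text. This is a rough sketch under stated assumptions: scan_guest_log is a hypothetical helper, and its patterns are modeled only on the message shapes in this excerpt, so other kernel versions may need adjusted expressions.

```python
import re

# Patterns follow the message shapes in the excerpt above (an assumption);
# other kernel versions may word these messages differently.
SCSI_TIMEOUT = re.compile(r"\[(\w+)\] tag#\d+ timing out command, waited (\d+)s")
IO_ERROR = re.compile(r"I/O error, dev (\w+), sector (\d+)")
XFS_SHUTDOWN = re.compile(r"XFS \(([^)]+)\): Filesystem has been shut down")


def scan_guest_log(text):
    """Collect SCSI timeouts, block I/O errors, and XFS shutdowns from log text."""
    findings = {"timeouts": [], "io_errors": [], "xfs_shutdowns": []}
    for line in text.splitlines():
        for match in SCSI_TIMEOUT.finditer(line):
            # (device, seconds waited before the command was timed out)
            findings["timeouts"].append((match.group(1), int(match.group(2))))
        for match in IO_ERROR.finditer(line):
            # (device, failing sector)
            findings["io_errors"].append((match.group(1), int(match.group(2))))
        for match in XFS_SHUTDOWN.finditer(line):
            # device-mapper or block device name of the shut-down filesystem
            findings["xfs_shutdowns"].append(match.group(1))
    return findings
```

A non-empty xfs_shutdowns list together with timeouts on the disk backing the root filesystem matches the failure described here.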
Confirm vSAN network and storage connectivity are stable.
Reboot affected VMs.
On restart, the XFS filesystem remounts cleanly.
VM functionality is restored.
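After the reboot, recovery can be confirmed from inside the guest. The sketch below is illustrative rather than an official procedure: can_write and scsi_timeout_seconds are hypothetical helper names. The first performs a real write and fsync, because a shut-down filesystem returns I/O errors on writes. The second reads the per-device SCSI command timeout (in seconds) that Linux exposes under /sys/block/<device>/device/timeout.

```python
import os
import tempfile
from pathlib import Path


def can_write(directory):
    """Return True if a small file can be written and synced in directory."""
    try:
        with tempfile.NamedTemporaryFile(dir=directory) as handle:
            handle.write(b"probe")
            handle.flush()
            os.fsync(handle.fileno())  # force the write down to the disk
        return True
    except OSError:
        # Includes EIO from a shut-down filesystem and missing directories.
        return False


def scsi_timeout_seconds(device, sysfs_root="/sys/block"):
    """Read the SCSI command timeout for a device such as 'sda'."""
    path = Path(sysfs_root) / device / "device" / "timeout"
    return int(path.read_text().strip())
```

For example, can_write("/") should return True on a recovered VM when run as a user permitted to write there, and scsi_timeout_seconds("sda") reports the timeout currently applied to the root disk.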