Multiple ESXi hosts within a vSAN cluster intermittently log events indicating they have lost and subsequently recovered access to storage volumes.
The vSphere Client and vCenter events display repeated connectivity warnings:
Lost access to volume 5acfc497-633bf9de-b5c0-############ (97c4cf5a-dd53-07b1-d372-############) due to connectivity issues. Recovery attempt is in progress and outcome will be reported shortly. Information 05/23/2025, 5:19:38 AM Host1
Successfully restored access to volume 5acfc497-633bf9de-b5c0-############ (97c4cf5a-dd53-07b1-d372-############) following connectivity issues. Information 05/23/2025, 5:19:40 AM Host1
The vmkernel.log on the affected hosts records high latency and VMFS heartbeat timeouts:
2025-05-18T11:53:01.588Z CMMDS: LeaderUpdateMeanRTLatency:12333: Throttled: 529c7cd4-6a43-ab4c-85b8-############: High RT latency. Node 00000000-0000-0000-0000-############, RT latency 5382(ms). Mean RT latency 337(ms)
2025-05-18T11:53:16.641Z HBX: 5765: Reclaiming HB at 3645440 on vol '6283e45a-0cf0-c643-6ca0-############' replayHostHB: 0 replayHostHBgen: 0 replayHostUUID: (00000000-00000000-0000-000000000000).
2025-05-18T11:53:16.643Z HBX: 294: '6283e45a-0cf0-c643-6ca0-############': HB at offset 3645440 - Reclaimed heartbeat [Timeout]:
The vobd.log confirms the heartbeat timeout issues across multiple nodes:
2025-05-18T11:53:05.437Z: [vob.vmfs.heartbeat.timedout] 5ae48362-736aefde-ea80-############ 6283e45a-0cf0-c643-6ca0-############
2025-05-18T11:53:05.437Z: [esx.problem.vmfs.heartbeat.timedout] 5ae48362-736aefde-ea80-############ 6283e45a-0cf0-c643-6ca0-############
2025-05-18T11:53:16.643Z: [vob.vmfs.heartbeat.recovered] Reclaimed heartbeat for volume 5ae48362-736aefde-ea80-############ (6283e45a-0cf0-c643-6ca0-############)
2025-05-18T11:53:16.644Z: [esx.problem.vmfs.heartbeat.recovered] 5ae48362-736aefde-ea80-############ 6283e45a-0cf0-c643-6ca0-############
Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.
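To gauge how long each heartbeat interruption lasted, the timeout and recovery events in vobd.log can be paired by volume. The following is a minimal Python sketch, not a supported tool. It assumes the esx.problem.vmfs.heartbeat.* lines keep the "timestamp: [event] volume-id ..." layout shown in the excerpts above; the function name heartbeat_outages is hypothetical.

```python
from datetime import datetime

# Timestamp layout as it appears in the vobd.log excerpts above.
TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def heartbeat_outages(lines):
    """Pair each heartbeat timeout with the next recovery for the same volume.

    Returns a list of (volume_id, seconds_between_timeout_and_recovery).
    Only esx.problem.vmfs.heartbeat.* events are used, because both the
    timedout and recovered variants put the volume ID in the third field.
    """
    pending = {}   # volume ID -> time of the timeout still awaiting recovery
    outages = []
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            continue
        event = parts[1]
        if not event.startswith("[esx.problem.vmfs.heartbeat."):
            continue
        when = datetime.strptime(parts[0].rstrip(":"), TS_FORMAT)
        volume = parts[2]
        if event.endswith("timedout]"):
            # Keep the earliest unmatched timeout for this volume.
            pending.setdefault(volume, when)
        elif event.endswith("recovered]") and volume in pending:
            started = pending.pop(volume)
            outages.append((volume, (when - started).total_seconds()))
    return outages


# Two lines copied from the vobd.log excerpt above.
sample = [
    "2025-05-18T11:53:05.437Z: [esx.problem.vmfs.heartbeat.timedout] "
    "5ae48362-736aefde-ea80-############ 6283e45a-0cf0-c643-6ca0-############",
    "2025-05-18T11:53:16.644Z: [esx.problem.vmfs.heartbeat.recovered] "
    "5ae48362-736aefde-ea80-############ 6283e45a-0cf0-c643-6ca0-############",
]

for volume, seconds in heartbeat_outages(sample):
    print(f"{volume}: heartbeat lost for {seconds:.3f} s")
```

For the sample lines, the gap between the timeout and the recovery is roughly 11 seconds. Outages that recur at similar times on several hosts are consistent with a shared network problem rather than a fault on a single host.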
VMware vSAN (All Versions)
Instability in the physical network infrastructure within the datacenter is causing packet loss, frame length errors, and CRC errors. These errors drop vSAN and storage heartbeat traffic between the hosts, which directly results in the datastore access timeouts and high latency logged by ESXi.
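To check a physical uplink for these errors, run the following command on each affected host, replacing vmnic# with the NIC name (for example, vmnic0):
esxcli network nic stats get -n vmnic#
For more information regarding vSAN network requirements and troubleshooting physical network statistics, refer to the Broadcom TechDocs: Troubleshooting the vSAN Network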
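When many hosts and uplinks need to be checked, the output of the esxcli command above can be screened with a short script. The following is a minimal Python sketch, assuming the command prints one "Counter name: value" pair per line. The sample text is illustrative only: its values are made up, the counter list is abbreviated, and the counter labels should be verified against the output of a real host. The function names parse_nic_stats and suspect_counters are hypothetical.

```python
def parse_nic_stats(text):
    """Turn 'Counter name: 123' lines into a {name: int} dictionary."""
    counters = {}
    for line in text.splitlines():
        name, sep, value = line.strip().rpartition(":")
        value = value.strip()
        # Skip the header line and anything without a numeric value.
        if sep and value.isdigit():
            counters[name.strip()] = int(value)
    return counters


def suspect_counters(counters):
    """Return the non-zero counters whose name mentions errors or drops."""
    keywords = ("error", "dropped")
    return {
        name: value
        for name, value in counters.items()
        if value > 0 and any(word in name.lower() for word in keywords)
    }


# Illustrative output with made-up values (assumed layout, abbreviated).
sample_output = """\
NIC statistics for vmnic0
   Packets received: 1000000
   Receive packets dropped: 12
   Total receive errors: 40
   Receive length errors: 3
   Receive CRC errors: 37
   Total transmit errors: 0
"""

for name, value in suspect_counters(parse_nic_stats(sample_output)).items():
    print(f"{name}: {value}")
```

These counters are generally cumulative, so a single non-zero value is less informative than a value that keeps increasing between two runs taken a few minutes apart. Growing CRC or length error counts usually point to cabling, transceivers, or switch ports rather than to ESXi itself.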