Virtual machines intermittently crash with a potential hardware fault error reported in the Windows Event Viewer logs

Article ID: 388089

Products

VMware vSphere ESXi

Issue/Introduction

VMs intermittently crash at random times, with the following errors captured in the Windows Event Viewer inside the VM guest OS:

Error: services (860,R,98) A request to read from the file 'C:\WINDOWS\Security\Database\edb.log' at offset 4096 (0x0000000000001000) for 524288 (0x00080000) bytes succeeded but took an abnormally long time (23 seconds) to be serviced by the OS. This problem is likely due to faulty hardware. Please contact your hardware vendor for further assistance diagnosing the problem.

In the /var/log/hostd.log file, we see entries similar to:

2025-02-04T01:56:34.990Z In(166) Hostd[2098830]: [Originator@6876 sub=Vimsvc.ha-eventmgr] Event 12544 : Lost access to volume 668fa89c-xxxxxxxx-xxxx-xxxxxxxxxxx (VMFS-test-01) due to connectivity issues. Recovery attempt is in progress and outcome will be reported shortly.
2025-02-04T01:56:42.378Z Wa(164) Hostd[2098839]: [Originator@6876 sub=IoTracker] In thread 2098858, open("/vmfs/volumes/668fa89c-xxxxxxxx-xxxx-xxxxxxxxxxx/Datastore/VMFS-test-01.vmdk") took over 20 sec.
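
When a hostd.log file is large, it can help to pull out only the "Lost access to volume" events and the IoTracker slow-I/O warnings. The following is a minimal Python sketch, assuming the log lines follow the same layout as the two samples above; the regular expressions and the function name `scan_hostd` are illustrative assumptions and not part of any VMware tool, and other ESXi builds may format these messages differently.

```python
import re
from datetime import datetime

# Patterns are modeled on the sample hostd.log lines in this article only.
LOST_ACCESS = re.compile(
    r"^(?P<ts>\S+Z) .*Lost access to volume (?P<uuid>\S+) \((?P<name>[^)]+)\)"
)
SLOW_IO = re.compile(
    r"^(?P<ts>\S+Z) .*sub=IoTracker\] In thread \d+, "
    r"(?P<op>\w+)\(\"(?P<path>[^\"]+)\"\) took over (?P<sec>\d+) sec"
)


def parse_ts(text):
    """Parse the UTC timestamp at the start of an ESXi log line."""
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%S.%fZ")


def scan_hostd(lines):
    """Return (kind, timestamp, detail) tuples for the storage-related lines."""
    events = []
    for line in lines:
        m = LOST_ACCESS.search(line)
        if m:
            events.append(("lost_access", parse_ts(m["ts"]), m["name"]))
            continue
        m = SLOW_IO.search(line)
        if m:
            detail = f'{m["op"]} took over {m["sec"]} sec: {m["path"]}'
            events.append(("slow_io", parse_ts(m["ts"]), detail))
    return events
```

A typical use would be to copy hostd.log off the host and call `scan_hostd(open("hostd.log", errors="replace"))`, then compare the resulting timestamps with the crash times recorded in the guest OS.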

In the /var/log/vobd.log file, we see entries similar to:

2025-02-04T01:58:31.992Z In(14) vobd[2097766]: [vmfsCorrelator] 1718417524896us: [esx.problem.vmfs.heartbeat.timedout] 668fa89c-xxxxxxxx-xxxx-xxxxxxxxxxx VMFS-test-01
2025-02-04T01:58:56.748Z In(14) vobd[2097766]: [vmfsCorrelator] 1718442281687us: [vob.vmfs.heartbeat.recovered] Reclaimed heartbeat for volume 668fa89c-xxxxxxxx-xxxx-xxxxxxxxxxx (VMFS-test-01): [Timeout] [HB state abcdef02 offset 3538944 gen 101 stampUS 1718442281253 uuid 67873be2-xxxxxxxx-xxxx-xxxxxxxxxxxx jrnl <FB 7> drv 24.82]
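
To see how long each heartbeat outage lasted, the timedout and recovered events in vobd.log can be paired per volume. The sketch below is a hedged example in Python that assumes the line layout of the two samples above; the function name `heartbeat_outages` and the regular expressions are assumptions made for illustration.

```python
import re
from datetime import datetime

# Patterns are modeled on the sample vobd.log lines in this article only.
TIMED_OUT = re.compile(
    r"^(?P<ts>\S+Z) .*\[esx\.problem\.vmfs\.heartbeat\.timedout\] (?P<uuid>\S+)"
)
RECOVERED = re.compile(
    r"^(?P<ts>\S+Z) .*\[vob\.vmfs\.heartbeat\.recovered\] "
    r"Reclaimed heartbeat for volume (?P<uuid>\S+)"
)


def parse_ts(text):
    """Parse the UTC timestamp at the start of an ESXi log line."""
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%S.%fZ")


def heartbeat_outages(lines):
    """Pair each timedout event with the next recovered event for the same
    volume UUID and return (uuid, start_time, duration_in_seconds) tuples."""
    pending = {}   # volume UUID -> time of the first unrecovered timeout
    outages = []
    for line in lines:
        m = TIMED_OUT.search(line)
        if m:
            # Keep the earliest timeout if several are logged before recovery.
            pending.setdefault(m["uuid"], parse_ts(m["ts"]))
            continue
        m = RECOVERED.search(line)
        if m and m["uuid"] in pending:
            start = pending.pop(m["uuid"])
            duration = (parse_ts(m["ts"]) - start).total_seconds()
            outages.append((m["uuid"], start, duration))
    return outages
```

For the two sample lines above, the timeout at 01:58:31.992Z and the recovery at 01:58:56.748Z give an outage of about 24.8 seconds between the two logged events.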

Environment

VMware vSphere ESXi 6.x
VMware vSphere ESXi 7.x
VMware vSphere ESXi 8.x

Cause

ESXi marks a VMFS datastore as offline when heartbeat I/O to the volume fails to complete within 16 seconds. Typical causes are network or storage connectivity issues, high I/O workload, and scheduled tasks that degrade storage performance.

The log entries confirm heartbeat timeouts, which indicate congestion or storage/network issues. These are often triggered by I/O-intensive operations such as VM backups, disk optimizations, or API jobs, which leave the datastore unresponsive.
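
One way to check whether the timeouts line up with a scheduled job is to count them by hour of day. The following Python sketch assumes a list of timestamps already extracted from vobd.log or hostd.log; the function name `timeouts_by_hour` is an assumption, and the timestamps in the usage note, other than the one taken from the sample log, are hypothetical. ESXi log timestamps are in UTC, so convert before comparing them with a backup schedule expressed in local time.

```python
from collections import Counter
from datetime import datetime


def timeouts_by_hour(timestamps):
    """Count heartbeat-timeout timestamps per UTC hour of day.

    Returns (hour, count) pairs, most frequent hour first.
    """
    hours = Counter(
        datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ").hour for ts in timestamps
    )
    return hours.most_common()
```

For example, `timeouts_by_hour(["2025-02-04T01:58:31.992Z", "2025-02-05T01:57:10.100Z", "2025-02-05T13:02:00.000Z"])` groups two events under hour 1 and one under hour 13. A strong cluster in a single hour suggests comparing that window with backup, disk optimization, or API job schedules.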

Resolution

Verify ESXi storage connectivity (network and SAN paths), optimize workloads by adjusting high-I/O tasks, and engage the storage vendor to check the storage array logs for latency or background operations impacting I/O.

Unstable network/storage paths prevent ESXi from completing heartbeat I/Os, while high datastore traffic or resource-heavy scheduled tasks can delay responses, leading to "Lost access" alerts.

For more information on "Lost access to volume" messages in ESXi, refer to the article: Understanding lost access to volume messages in ESXi
