Virtual Machine crashes with a Blue Screen of Death (BSOD) and loss of storage volume access resulting from storage latency and connectivity failures.

Article ID: 392634

Products

VMware vSAN
VMware vSphere ESXi

Issue/Introduction

A Windows-based virtual machine (VM) unexpectedly crashes with a Blue Screen of Death (BSOD). This typically occurs after a period of storage instability. The ESXi logs show the following indicators:

1. Storage latency and I/O errors (/var/run/log/hostd.log)

Basic file operations (for example, opening the .vmx file) are significantly delayed, followed by I/O errors once storage becomes unresponsive.

[YYYY-MM-DDTHH:MM:SS] warning hostd[2104418] [Originator@6876 sub=IoTracker] In thread 2927571, fopen("/vmfs/volumes/<datastore>/<VMName>/Virtual-Machine.vmx") took over 140 sec.
[YYYY-MM-DDTHH:MM:SS] warning hostd[2285231] [Originator@6876 sub=IoTracker] In thread 2102958, fopen("/vmfs/volumes/<datastore>/<VMName>/Virtual-Machine.vmx") took over 76 sec.
.
.
[YYYY-MM-DDTHH:MM:SS] info hostd[2102535] [Originator@6876 sub=Libs opID=########-########-####-####-h5:########-##-##-##-####] DictionaryLoad: Cannot open file "/vmfs/volumes/<datastore>/<VMName>/Virtual-Machine.vmx": Input/output error.
[YYYY-MM-DDTHH:MM:SS] info hostd[2102535] [Originator@6876 sub=Libs opID=########-########-####-####-h5:########-##-##-##-####] VigorOffline_GenSecPolicy: retry reading /vmfs/volumes/<datastore>/<VMName>/Virtual-Machine.vmx
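
The IoTracker warnings above can be pulled out of a copied hostd.log with a short script. The following is a minimal sketch, not a supported tool: the function name `find_slow_fopen` and the 30-second default threshold are illustrative assumptions, and the pattern is derived only from the warning format shown in this article.

```python
import re

# Pattern mirrors the IoTracker warning format shown in the excerpt above.
SLOW_FOPEN = re.compile(
    r'sub=IoTracker\].*?fopen\("(?P<path>[^"]+)"\) took over (?P<secs>\d+) sec'
)


def find_slow_fopen(lines, threshold_sec=30):
    """Return (path, seconds) for each fopen warning at or above threshold_sec.

    threshold_sec is an arbitrary illustrative default, not a VMware value.
    """
    hits = []
    for line in lines:
        match = SLOW_FOPEN.search(line)
        if match and int(match.group("secs")) >= threshold_sec:
            hits.append((match.group("path"), int(match.group("secs"))))
    return hits
```

To use it, read a copy of the log into a list of lines, for example `find_slow_fopen(open("hostd.log", errors="replace"))`, and review the reported paths and durations.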

2. Heartbeat timeouts and object liveness (/var/run/log/vmkernel.log)

The ESXi kernel reports loss of communication with storage objects. Critical I/O errors are logged when the kernel can no longer access the .vmx or virtual disk files.

[YYYY-MM-DDTHH:MM:SS] cpu4:17153319)HBX: 294: '########-####-####-############: HB at offset ####### - Reclaimed heartbeat [Timeout]:
[YYYY-MM-DDTHH:MM:SS] cpu4:17153319)  [HB state abcdef02 offset 3248128 gen 241 stampUS ######## uuid ########-####-####-############ jrnl <FB 796920> drv 14.81 lockImpl 4 ip ##.###.###.###]
.
.
[YYYY-MM-DDTHH:MM:SS] cpu71:2098688)DOM: DOMOwner_SetLivenessState:7358: Object ########-####-####-####-############ lost liveness [########]
[YYYY-MM-DDTHH:MM:SS] cpu66:2098672)DOM: DOMOwner_SetLivenessState:7358: Object ########-####-####-####-############ lost liveness [########]

3. All Paths Down (APD) and connectivity restoration events (/var/run/log/vobd.log)

The storage stack enters an APD state, indicating that all communication paths to the storage device or filesystem have been lost. The state clears once the underlying storage connection is restored.

[YYYY-MM-DDTHH:MM:SS] In(14) vobd[2097765]: [APDCorrelator] ##############: [esx.problem.storage.apd.start] Device or filesystem with identifier [#######-#######] has entered the All Paths Down state.
.
.
[YYYY-MM-DDTHH:MM:SS] In(14) vobd[2097765]: [vmfsCorrelator] ############: [vob.vmfs.nfs.server.restored] Restored connection to the server ###.###.#.### mount point /<DatastoreName>, mounted as #-#-#-# ("<Datastore Name>")
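
To see the outage window at a glance, the APD start and connection-restored events can be extracted from a copy of vobd.log in order of appearance. This is a sketch under stated assumptions: the two event IDs are the ones shown in the excerpt above, other environments may log different IDs, and the names `WATCHED` and `storage_events` are hypothetical.

```python
import re

# Matches a bracketed event ID such as [esx.problem.storage.apd.start].
EVENT_ID = re.compile(r"\[(?P<event>(?:esx|vob)\.[A-Za-z0-9_.]+)\]")

# Event IDs taken from the excerpt above; this list is not exhaustive.
WATCHED = {
    "esx.problem.storage.apd.start": "APD start",
    "vob.vmfs.nfs.server.restored": "NFS connection restored",
}


def storage_events(lines):
    """Return a label for each watched event, in the order it was logged."""
    events = []
    for line in lines:
        match = EVENT_ID.search(line)
        if match and match.group("event") in WATCHED:
            events.append(WATCHED[match.group("event")])
    return events
```

An "APD start" entry with no later "NFS connection restored" entry suggests that storage had not recovered when the log was collected.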

4. Guest OS blue screen MSR faults (/vmfs/volumes/<datastore>/<VMName>/vmware.log)

As storage connectivity begins to recover, the VM registers Synthetic MSR (Model-Specific Register) faults, capturing the guest-side Blue Screen of Death.

[YYYY-MM-DDTHH:MM:SS] Wa(03) vcpu-1 - WinBSOD: Synthetic MSR[0x#######0] 0xef
[YYYY-MM-DDTHH:MM:SS] Wa(03)+ vcpu-1 -
[YYYY-MM-DDTHH:MM:SS] Wa(03) vcpu-1 - WinBSOD: Synthetic MSR[0x#######1] 0xffffb5096e266080
[YYYY-MM-DDTHH:MM:SS] Wa(03)+ vcpu-1 -
[YYYY-MM-DDTHH:MM:SS] Wa(03) vcpu-1 - WinBSOD: Synthetic MSR[0x#######2] 0x0
[YYYY-MM-DDTHH:MM:SS] Wa(03)+ vcpu-1 -
[YYYY-MM-DDTHH:MM:SS] Wa(03) vcpu-1 - WinBSOD: Synthetic MSR[0x#######3] 0x0
[YYYY-MM-DDTHH:MM:SS] Wa(03)+ vcpu-1 -
[YYYY-MM-DDTHH:MM:SS] Wa(03) vcpu-1 - WinBSOD: Synthetic MSR[0x#######4] 0x0
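
The values logged on the WinBSOD lines can be collected from a copy of vmware.log for comparison with the Windows crash data. The sketch below makes one assumption that this article does not confirm: the first logged value is the Windows bug check code and the remaining values are its parameters. If that holds, 0xef in the excerpt above corresponds to CRITICAL_PROCESS_DIED in Microsoft's bug check reference. The function name `winbsod_values` is hypothetical.

```python
import re

# Matches the "WinBSOD: Synthetic MSR[<index>] <hex value>" lines shown above.
# The "+" continuation lines carry no value and are skipped.
WINBSOD = re.compile(
    r"WinBSOD: Synthetic MSR\[[^\]]+\] (?P<value>0x[0-9a-fA-F]+)"
)


def winbsod_values(lines):
    """Return the logged MSR values as integers, in order of appearance."""
    values = []
    for line in lines:
        match = WINBSOD.search(line)
        if match:
            values.append(int(match.group("value"), 16))
    return values
```

Printing the first returned value with `hex()` gives a code that can be checked against the stop code shown in the guest's own crash dump or event log.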

Note: Subsequent reboots of the VM do not reproduce the guest OS crash.

Cause

This issue is caused by underlying storage latency or a complete loss of connectivity (All Paths Down, APD, or Permanent Device Loss, PDL). When the Windows guest OS cannot complete critical I/O operations within its timeout period, it triggers a kernel-level stop error (BSOD). The hypervisor records this as an MSR fault when the I/O path is partially restored or when the VM attempts to handle the exception.

Resolution

Ensure the underlying storage is fully restored before powering on or rebooting VMs.

Recommendation

  • Review the health of the physical storage path, including HBAs, switches, and backend storage configuration.
  • Engage your hardware/storage vendor to validate the physical connectivity, firmware/driver compatibility, and overall I/O path stability.