A Windows-based virtual machine (VM) unexpectedly crashes with a Blue Screen of Death (BSOD). This typically occurs following a period of storage instability. Reviewing the ESXi logs reveals the following indicators:
1. Storage latency and I/O errors (/var/run/log/hostd.log)
Significant delays in basic file operations (e.g., opening the .vmx file) and subsequent I/O errors when storage becomes unresponsive.
[YYYY-MM-DDTHH:MM:SS] warning hostd[2104418] [Originator@6876 sub=IoTracker] In thread 2927571, fopen("/vmfs/volumes/<datastore>/<VMName>/Virtual-Machine.vmx") took over 140 sec.
[YYYY-MM-DDTHH:MM:SS] warning hostd[2285231] [Originator@6876 sub=IoTracker] In thread 2102958, fopen("/vmfs/volumes/ec.<datastore>/<VMName>/Virtual-Machine.vmx") took over 76 s..
[YYYY-MM-DDTHH:MM:SS] info hostd[2102535] [Originator@6876 sub=Libs opID=########-########-####-####-h5:########-##-##-##-####] DictionaryLoad: Cannot open file "/vmfs/volumes/<datastore>/<VMName>/Virtual-Machine.vmx": Input/output error.
[YYYY-MM-DDTHH:MM:SS] info hostd[2102535] [Originator@6876 sub=Libs opID=] VigorOffline_GenSecPolicy: retry reading /vmfs/volumes/########-########-####-####-h5:########-##-##-##-####<datastore>/<VMName>/Virtual-Machine.vmx
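As a rough aid for spotting these warnings in a large hostd.log, the following Python sketch extracts the file path and reported delay from IoTracker fopen messages. The function name slow_fopen_events and the 30-second default threshold are illustrative assumptions, and the pattern is based only on the message text shown in the excerpt above.

```python
import re

# Pattern derived from the IoTracker warning text in the excerpt above.
# It assumes the delay is logged as "took over <N> sec".
SLOW_FOPEN = re.compile(r'fopen\("(?P<path>[^"]+)"\) took over (?P<secs>\d+) sec')


def slow_fopen_events(lines, threshold_secs=30):
    """Return (path, seconds) for each fopen warning at or above threshold_secs.

    threshold_secs is an arbitrary illustrative default, not a VMware value.
    """
    events = []
    for line in lines:
        match = SLOW_FOPEN.search(line)
        if match and int(match.group("secs")) >= threshold_secs:
            events.append((match.group("path"), int(match.group("secs"))))
    return events
```

The function takes any iterable of lines, so it can be fed an open file handle for a copy of hostd.log taken from the host.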
2. Heartbeat timeouts and object liveness (/var/run/log/vmkernel.log)
The ESXi kernel reports loss of communication with storage objects. Critical I/O errors are logged when the kernel can no longer access the .vmx or virtual disk files.
[YYYY-MM-DDTHH:MM:SS] cpu4:17153319)HBX: 294: '########-####-####-############: HB at offset - Reclaimed heartbeat [Timeout]:#######
[YYYY-MM-DDTHH:MM:SS] cpu4:17153319) [HB state abcdef02 offset 3248128 gen 241 stampUS ######## uuid ########-####-####-############ jrnl <FB 796920> drv 14.81 lockImpl 4 ip ##.###.###.###]
.
.
[YYYY-MM-DDTHH:MM:SS] cpu71:2098688)DOM: DOMOwner_SetLivenessState:7358: Object ########-####-####-####-############ lost liveness [########]
[YYYY-MM-DDTHH:MM:SS] cpu66:2098672)DOM: DOMOwner_SetLivenessState:7358: Object ########-####-####-####-############ lost liveness [########]
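To see how many objects were affected, the "lost liveness" messages can be tallied per object identifier. The Python sketch below is one possible way to do that. The function name count_lost_liveness is hypothetical, and the pattern assumes the message layout shown in the excerpt above (the number after DOMOwner_SetLivenessState may differ between builds, so it is matched loosely).

```python
import re
from collections import Counter

# Pattern based on the vmkernel.log excerpt above. The identifier is captured
# as any run of non-space characters, since its exact format is masked there.
LOST_LIVENESS = re.compile(
    r"DOMOwner_SetLivenessState:\d+: Object (\S+) lost liveness"
)


def count_lost_liveness(lines):
    """Count 'lost liveness' messages per object identifier."""
    counts = Counter()
    for line in lines:
        match = LOST_LIVENESS.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Calling count_lost_liveness(...).most_common() on the result lists the most frequently reported objects first.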
3. All Paths Down (APD) and connectivity restoration events (/var/run/log/vobd.log)
The storage stack enters an APD state, indicating that all communication paths to the storage device or filesystem have been lost. It later recovers once the underlying storage connection is restored.
[YYYY-MM-DDTHH:MM:SS] In(14) vobd[2097765]: [APDCorrelator] #######: [esx.problem.storage.apd.start] Device or filesystem with identifier [#######-#######] has entered the All Paths Down state.#######
.
.
[YYYY-MM-DDTHH:MM:SS] In(14) vobd[2097765]: [vmfsCorrelator] #######: [vob.vmfs.nfs.server.restored] Restored connection to the server ###.###.#.### mount point /<DatastoreName>, mounted as #-#-#-# ("<Datastore Name>")#####
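The outage window can be approximated by pairing each APD start event with the next restore event in vobd.log. The Python sketch below does this using the two event identifiers shown in the excerpt above. The function name apd_windows is hypothetical, and treating the next NFS "restored" event as the end of the preceding APD window is a simplifying assumption, since the two messages carry different identifiers and are not matched to each other here.

```python
import re

# Event identifiers taken from the vobd.log excerpt above.
EVENT_ID = re.compile(
    r"\[(esx\.problem\.storage\.apd\.start|vob\.vmfs\.nfs\.server\.restored)\]"
)


def apd_windows(lines):
    """Pair each APD start line with the next restore line.

    Returns a list of (start_line, restore_line) tuples. restore_line is
    None when the log ends while storage is still in the APD state.
    Assumption: any restore event closes the currently open APD window.
    """
    windows = []
    open_start = None
    for line in lines:
        match = EVENT_ID.search(line)
        if not match:
            continue
        if match.group(1).endswith("apd.start"):
            if open_start is None:
                open_start = line
        elif open_start is not None:
            windows.append((open_start, line))
            open_start = None
    if open_start is not None:
        windows.append((open_start, None))
    return windows
```

A window whose second element is None suggests that storage had not yet recovered at the end of the log, which is worth confirming before powering on or rebooting VMs.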
4. Guest OS blue screen MSR faults (/vmfs/volumes/<datastore>/<VMName>/vmware.log)
As storage connectivity begins to recover, the VM registers Synthetic MSR (Model-Specific Register) faults, capturing the guest-side Blue Screen of Death.
[YYYY-MM-DDTHH:MM:SS] Wa(03) vcpu-1 - WinBSOD: Synthetic MSR[0x#######0] 0xef
[YYYY-MM-DDTHH:MM:SS] Wa(03)+ vcpu-1 -
[YYYY-MM-DDTHH:MM:SS] Wa(03) vcpu-1 - WinBSOD: Synthetic MSR[0x1] 0xffffb5096e266080#######
[YYYY-MM-DDTHH:MM:SS] Wa(03)+ vcpu-1 -
[YYYY-MM-DDTHH:MM:SS] Wa(03) vcpu-1 - WinBSOD: Synthetic MSR[0x2] 0x0#######
[YYYY-MM-DDTHH:MM:SS] Wa(03)+ vcpu-1 -
[YYYY-MM-DDTHH:MM:SS] Wa(03) vcpu-1 - WinBSOD: Synthetic MSR[0x3] 0x0#######
[YYYY-MM-DDTHH:MM:SS] Wa(03)+ vcpu-1 -
[YYYY-MM-DDTHH:MM:SS] Wa(03) vcpu-1 - WinBSOD: Synthetic MSR[0x4] 0x0#######
Note: Subsequent reboots of the VM do not reproduce the guest OS crash.
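When comparing crashes across several VMs, it can help to pull the WinBSOD register values out of vmware.log in a structured form. The Python sketch below returns each register index and value as integers. The function name parse_winbsod_msrs is hypothetical, and the pattern assumes unmasked hexadecimal values in the layout shown in the excerpt above. The excerpt does not state which register carries the stop code, so the sketch returns raw pairs and does not interpret them.

```python
import re

# Pattern based on the WinBSOD lines in the vmware.log excerpt above.
# Assumes both the register index and the value are logged as 0x-prefixed hex.
WINBSOD_MSR = re.compile(
    r"WinBSOD: Synthetic MSR\[(0x[0-9a-fA-F]+)\] (0x[0-9a-fA-F]+)"
)


def parse_winbsod_msrs(lines):
    """Return a list of (msr_index, value) integer pairs, in log order."""
    pairs = []
    for line in lines:
        match = WINBSOD_MSR.search(line)
        if match:
            pairs.append((int(match.group(1), 16), int(match.group(2), 16)))
    return pairs
```

Continuation lines such as "Wa(03)+ vcpu-1 -" carry no register data and are skipped.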
This issue is caused by underlying storage latency or a complete loss of storage connectivity, such as an All Paths Down (APD) or Permanent Device Loss (PDL) condition. When the Windows guest OS cannot complete critical I/O operations within its hardcoded timeout period, it triggers a kernel-level stop error (BSOD). The hypervisor records this as an MSR fault when the I/O path is partially restored or when the VM attempts to handle the exception.
Ensure the underlying storage is restored before attempting to power on or reboot VMs.
Recommendation