Virtual machines shut down and reboot unexpectedly.

Products

VMware vSphere ESX 7.x

Issue/Introduction

Symptoms:

Virtual machines unexpectedly shut down or reboot. The guest of the VM at "/vmfs/volumes/DS1/VM1/VM1.vmx" times out and requests a hard reset. The "/vmfs/volumes/DS1/VM1/vmware.log" file shows entries similar to the following:

2026-01-18T09:39:19.451Z In(05) vmx - GuestRpcSendTimedOut: message to toolbox timed out.
2026-01-18T09:39:22.504Z In(05) vmx - Tools: [AppStatus] Last heartbeat value 10164700 (last received 20s ago)
2026-01-18T09:39:22.504Z In(05) vmx - TOOLS: appName=toolbox, oldStatus=1, status=2, guestInitiated=0.
2026-01-18T09:39:30.762Z In(05) vcpu-5 - Chipset: The guest has requested that the virtual machine be hard reset.
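
To gauge how quickly the guest went from the first VMware Tools timeout to the hard reset, the timestamps in the excerpt above can be compared. The following is a minimal sketch, assuming the log has been copied off the host and that each line starts with a UTC timestamp in the format shown; the function names are hypothetical and are not part of any VMware tooling.

```python
from datetime import datetime

# Timestamp layout used in the vmware.log excerpt above (assumed to be stable).
TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def parse_ts(line):
    """Return the leading timestamp of a log line as a datetime."""
    return datetime.strptime(line.split(" ", 1)[0], TS_FORMAT)


def seconds_to_hard_reset(lines):
    """Seconds between the first Tools RPC timeout and the guest hard reset.

    Returns None if either message is missing.
    """
    timeout_at = None
    for line in lines:
        if timeout_at is None and "GuestRpcSendTimedOut" in line:
            timeout_at = parse_ts(line)
        elif timeout_at is not None and "be hard reset" in line:
            return (parse_ts(line) - timeout_at).total_seconds()
    return None
```

For the four lines shown above, the gap between the first timeout (09:39:19.451Z) and the hard reset request (09:39:30.762Z) is roughly 11.3 seconds.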

In "/var/run/log/hostd.log", the virtual machines may stop responding within the expected time and their heartbeat status changes from green to yellow:

2026-01-18T09:39:14.239Z In(166) Hostd[2099354]: [Originator@6876 sub=Vimsvc.ha-eventmgr] Event 78993 : Issue detected on ESX# in ha-datacenter: NMI: 738: NMI IPI: PC 0x42########, SP 0x4######## (Src 0x1, CPU42)
2026-01-18T09:39:21.479Z In(166) Hostd[2099341]: [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/DS#/VM1/VM1.vmx] Setting heartbeat to yellow; Heartbeat (in 43s): expected=43 (yellow<=80%, red<=40%), actual=23 (53.4884%)
2026-01-18T09:39:22.504Z Db(167) Hostd[2099341]: [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/DS#/VM1/VM1.vmx] Updating current heartbeatStatus: green -> yellow
2026-01-18T09:39:22.504Z Db(167) Hostd[2099356]: [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/DS#/VM1/VM1.vmx. (#######-####-####-####-############)/VM1(#######-####-####-####-############).vmx] Updating current heartbeatStatus: green -> yellow
2026-01-18T09:39:27.700Z In(166) Hostd[2099358]: [Originator@6876 sub=Vimsvc.ha-eventmgr] Event 78996 : Issue detected on ESX# in ha-datacenter: NMI: 738: NMI IPI: PC 0x42########, SP 0x45############ (Src 0x1, CPU10)
2026-01-18T09:39:27.700Z In(166) Hostd[2099322]: --> (2026-01-18T09:39:12.242Z cpu10:2097661)
2026-01-18T09:39:30.750Z Wa(164) Hostd[2099372]: [Originator@6876 sub=Default] Node 0: poll(fd=41) timeout with no reply.

"/var/run/log/hostd.log" may also show lost access to volumes and VMFS heartbeat timeouts, after which the VM heartbeat status changes to red:

2026-01-18T09:39:30.751Z In(166) Hostd[2099352]: [Originator@6876 sub=Hostsvc.VmkVprobSource] VmkVprobSource::Post event: (vim.event.EventEx) {= 'vim.Datastore:#######-####-####-####-############'
2026-01-18T09:39:30.751Z In(166) Hostd[2099322]: --> },
2026-01-18T09:39:30.751Z In(166) Hostd[2099322]: --> eventTypeId = "esx.problem.vmfs.heartbeat.timedout",
2026-01-18T09:39:30.751Z In(166) Hostd[2099322]: --> name = "DS1",

2026-01-18T09:39:30.751Z In(166) Hostd[2099367]: [Originator@6876 sub=Vimsvc.ha-eventmgr] Event 78997 : Issue detected on ESX# in ha-datacenter: NMI: 738: NMI IPI: PC 0x42###########, SP 0x45########## (Src 0x4, CPU68)

2026-01-18T09:39:30.759Z In(166) Hostd[2099351]: [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/DS1/VM1/VM1.vmx] Setting heartbeat to red; Heartbeat (in 48s): expected=48 (yellow<=80%, red<=40%), actual=19 (39.5833%)
2026-01-18T09:39:30.759Z Db(167) Hostd[2099351]: [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/DS1/VM1/VM1.vmx] Updating current heartbeatStatus: green -> red
2026-01-18T09:39:30.763Z In(166) Hostd[2099370]: [Originator@6876 sub=Hostsvc.VmkVprobSource] VmkVprobSource::Post event: (vim.event.EventEx) {
2026-01-18T09:39:30.764Z In(166) Hostd[2099322]: --> name = "DS#",
2026-01-18T09:39:30.764Z In(166) Hostd[2099322]: --> datastore = 'vim.Datastore:#######-####-####-####-############'
2026-01-18T09:39:30.764Z In(166) Hostd[2099322]: --> },
2026-01-18T09:39:30.764Z In(166) Hostd[2099322]: --> eventTypeId = "esx.problem.vmfs.heartbeat.timedout",
2026-01-18T09:39:30.764Z In(166) Hostd[2099322]: --> name = "DS#",
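
The "Setting heartbeat to ..." entries above report how many guest heartbeats hostd expected and how many it received, together with the yellow and red thresholds. The following sketch reproduces that arithmetic from a log line. It assumes that the status is yellow or red when the received percentage is at or below the printed threshold, which is inferred from the log text rather than taken from hostd documentation; the function name is hypothetical.

```python
import re

# Matches e.g. "expected=43 (yellow<=80%, red<=40%), actual=23"
HEARTBEAT_RE = re.compile(
    r"expected=(\d+) \(yellow<=(\d+)%, red<=(\d+)%\), actual=(\d+)"
)


def classify_heartbeat(line):
    """Return (percent_received, status) for a hostd heartbeat line, or None."""
    match = HEARTBEAT_RE.search(line)
    if match is None:
        return None
    expected, yellow, red, actual = (int(g) for g in match.groups())
    if expected == 0:
        return None  # nothing was expected, so no ratio can be computed
    percent = 100.0 * actual / expected
    # Assumption: thresholds are inclusive upper bounds, as the "<=" suggests.
    if percent <= red:
        status = "red"
    elif percent <= yellow:
        status = "yellow"
    else:
        status = "green"
    return percent, status
```

With the values from the excerpts, 23 of 43 expected heartbeats is about 53.49% (yellow) and 19 of 48 is about 39.58% (red), which matches the status changes that hostd logged.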

Storage connectivity loss and the resulting virtual machine failures are reported in "/var/run/log/vmkernel.log":
YYYY-MM-DD THH:MM:SS Lost access to volume <volume UUID> due to connectivity issues. Recovery attempt is in progress and outcome will be reported shortly.
YYYY-MM-DD THH:MM:SS [msg.hbacommon.locklost] The lock protecting '<vm-name>.vmdk' has been lost, possibly due to underlying storage issues. PANIC: Exiting because of failed disk operation.

The ESXi host may also fail with a purple diagnostic screen (PSOD).

Environment

VMware vSphere ESXi 7.x

VMware vSphere ESXi 8.x

Cause

A memory controller failure can cause severe storage I/O exceptions, which lead directly to ESXi host instability and cause the virtual machines on the host to shut down and reboot unexpectedly.

The VMs were forced into a hard reset because the underlying hardware could not provide the CPU cycles and memory access required to maintain the guest OS state, which led to a watchdog timeout or a guest-side kernel panic.

The memory controller errors and PCPU heartbeat timeouts are reported in "/var/run/log/vmkernel.log":

2026-01-18T09:39:09.079Z In(182) vmkernel: cpu45:3448987)MCA: 202: CE Poll G0 B7 S9c00##### Aab#### M2000#### Pab#####/40 Memory Controller Read Error on Channel 0.
2026-01-18T09:39:09.079Z Wa(180) vmkwarning: cpu9:2851033)WARNING: Heartbeat: 961: PCPU 15 didn't have a heartbeat for 5 seconds, timeout is 10, 1 IPIs sent; *may* be locked up.
2026-01-18T09:39:09.079Z In(182) vmkernel: cpu9:2851033)Heartbeat: 1014: Sending timer IPI to PCPU 15
2026-01-18T09:39:09.079Z Al(177) vmkalert: cpu15:2097650)ALERT: NMI: 738: NMI IPI: PC 0x4200#####, SP 0x45##### (Src 0x1, CPU15)
2026-01-18T09:39:09.079Z In(182) vmkernel: cpu15:2097650)0x4539cf91bd80:[0x42001bc84046]MemSchedPolicy_GroupUpdateTargets@vmkernel#nover+0x4f stack: 0x43018ec01220
2026-01-18T09:39:09.079Z In(182) vmkernel: cpu15:2097650)0x4539cf91bdd0:[0x42001bc925bb]SchedTreeTopDownTraversalInt@vmkernel#nover+0x2c stack: 0x43018ec01960
2026-01-18T09:39:09.079Z In(182) vmkernel: cpu15:2097650)0x4539cf91be10:[0x42001bc92616]SchedTreeTopDownTraversalInt@vmkernel#nover+0x87 stack: 0x41ffdbcb04a8
2026-01-18T09:39:09.079Z In(182) vmkernel: cpu15:2097650)0x4539cf91be50:[0x42001bc85a0e]MemSchedReallocReallocate@vmkernel#nover+0x2db stack: 0x100000008
2026-01-18T09:39:09.079Z In(182) vmkernel: cpu15:2097650)0x4539cf91bfc0:[0x42001bc871f8]MemSchedReallocLoop@vmkernel#nover+0x41 stack: 0x4539cd39f100
2026-01-18T09:39:09.079Z In(182) vmkernel: cpu15:2097650)0x4539cf91bfe0:[0x42001bedc88e]CpuSched_StartWorld@vmkernel#nover+0xbf stack: 0x0
2026-01-18T09:39:09.079Z In(182) vmkernel: cpu15:2097650)0x4539cf91c000:[0x42001b944faf]Debug_IsInitialized@vmkernel#nover+0xc stack: 0x0
2026-01-18T09:39:10.183Z Al(177) vmkalert: cpu69:2097721)ALERT: NMI: 738: NMI IPI: PC 0x42001b96df77, SP 0x4539d1c9bef0 (Src 0x1, CPU69)
2026-01-18T09:39:10.183Z In(182) vmkernel: cpu69:2097721)0x4539d1c9bef0:[0x42001b96df76]MCSLockSpin@vmkernel#nover+0x47 stack: 0x41ffdbcb0c88
2026-01-18T09:39:10.183Z In(182) vmkernel: cpu69:2097721)0x4539d1c9bf20:[0x42001b96e5c1]MCSLockWait@vmkernel#nover+0x14a stack: 0x0
2026-01-18T09:39:10.183Z In(182) vmkernel: cpu69:2097721)0x4539d1c9bf40:[0x42001b96eb6d]MCSLockWork@vmkernel#nover+0x2a stack: 0x0
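
To check whether a host shows the same signatures, a copy of vmkernel.log (for example from a support bundle) can be scanned for the three message types in the excerpt above. This is a minimal sketch; the regular expressions are derived only from the sample lines in this article and may not match every variant of these messages, and the names `SIGNATURES` and `count_signatures` are hypothetical.

```python
import re

# Patterns derived from the sample vmkernel.log lines in this article.
SIGNATURES = {
    "memory_controller_error": re.compile(r"MCA: \d+: .*Memory Controller .*Error"),
    "pcpu_heartbeat_timeout": re.compile(
        r"PCPU \d+ didn't have a heartbeat for \d+ seconds"
    ),
    "nmi_ipi": re.compile(r"NMI: \d+: NMI IPI"),
}


def count_signatures(lines):
    """Count how many lines match each known signature."""
    counts = {name: 0 for name in SIGNATURES}
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

A possible use is `count_signatures(open("vmkernel.log", errors="replace"))` on the copied file. Non-zero counts for all three signatures around the time of the VM resets are consistent with the cause described in this article, but they do not replace a hardware diagnosis by the vendor.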

Resolution

Engage the hardware vendor to investigate the memory controller errors further.

Additional Information

About MCE errors:

ESXi Host Becomes Unresponsive Due to Memory Controller Errors Leading to Storage I/O Issues