ESXi host becomes unstable and fails to enter Maintenance Mode due to memory hardware failure i.e. "Correctable ECC logging limit reached"

Article ID: 434881


Products

VMware vSAN

Issue/Introduction

Symptoms: 

  • The ESXi host becomes highly unstable and shows unusually high CPU load. For example, the esxtop header reports a CPU load average of about 2.0 (roughly 200%):

10:14:04am up 56 days 2:42, 2379 worlds, 11 VMs, 41 vCPUs; CPU load average: 2.03, 2.02, 2.05

  • Overall vSAN cluster health shows degradation.

  • The host is unable to complete management tasks. Actions such as vMotion, VM power-off, or entering Maintenance Mode take an exceptionally long time or hang indefinitely.

  • In the host's hardware logs (SEL / IPMI), you see memory-related hardware assertions:

Record Id: 185
When: 2026-03-16T19:55:42
Event Type: 111 (Unknown)
SEL Type: 2 (System Event)
Message: Assert + Memory Correctable ECC logging limit reached 

  • In /var/run/log/vmkwarning.log or /var/run/log/vmkernel.log, you observe memory admission check failures and resource exhaustion:

WARNING: Sched: vm <ID>: 6387: could not create container group, status: Admission check failed for memory resource
WARNING: MemSchedAdmit: 1263: Group envoy: Requested memory limit 0 KB insufficient to support effective reservation 740 KB
WARNING: UserParam: 1548: sh: could not change group to <host/vim/vmvisor/backup.sh>: Admission check failed for memory resource

  • In /var/run/log/vmkernel.log, vSAN communication drops are recorded:

DOM: DOMOwner_SetLivenessState:11608: Object <UUID> lost liveness
WARNING: HBX: 3729: '<UUID>': HB at offset 4059136 - Reclaiming timed out HB failed
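
The log signatures listed above can be checked for programmatically. The following is a minimal Python sketch, not a supported tool. It assumes the SEL and vmkernel log text has already been exported to a place where Python can read it. The dictionary labels and the function name `scan_lines` are illustrative, and the patterns are copied from the excerpts in this article.

```python
import re

# Patterns are taken verbatim from the symptom excerpts in this article.
# The dictionary keys are illustrative labels, not VMware terminology.
SIGNATURES = {
    "ecc_limit": re.compile(r"Memory Correctable ECC logging limit reached"),
    "admission_failure": re.compile(r"Admission check failed for memory resource"),
    "vsan_liveness_lost": re.compile(r"lost liveness"),
    "heartbeat_timeout": re.compile(r"Reclaiming timed out"),
}


def scan_lines(lines):
    """Count how often each signature occurs across the given log lines."""
    counts = {name: 0 for name in SIGNATURES}
    for line in lines:
        for name, pattern in SIGNATURES.items():
            # findall counts every occurrence, even if several share one line.
            counts[name] += len(pattern.findall(line))
    return counts


# Sample input built from the excerpts above (placeholders left as-is).
sample = [
    "Message: Assert + Memory Correctable ECC logging limit reached",
    "WARNING: Sched: vm <ID>: 6387: could not create container group, "
    "status: Admission check failed for memory resource",
    "WARNING: UserParam: 1548: sh: could not change group to "
    "<host/vim/vmvisor/backup.sh>: Admission check failed for memory resource",
    "DOM: DOMOwner_SetLivenessState:11608: Object <UUID> lost liveness",
]

result = scan_lines(sample)
print(result)
```

A non-zero `ecc_limit` count together with admission failures or vSAN liveness messages suggests that the host matches the symptoms described here. The counts alone do not confirm a hardware fault.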

Environment

VMware vSphere ESXi 8.x 
VMware vSphere ESX 9.x
VMware vSAN 8.x 
VMware vSAN 9.x

Cause

The host has a physical memory hardware failure that has reached its correctable ECC error logging limit. This subsequently triggers kernel-level memory and CPU exhaustion.

Resolution

To resolve this issue, the impacted host must be isolated and the faulty hardware replaced. Because the host cannot process standard vMotion tasks due to memory admission failures, manual intervention is required to clear the VM locks.

  • Evacuate the virtual machines by unregistering each VM from the impacted host.
  • Place the host in Maintenance Mode. Transitioning the host to Maintenance Mode isolates the faulty memory from the cluster.
  • Replace the faulty memory hardware.
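
The first two steps can be sketched as a dry-run planner in Python. This is a hedged illustration under stated assumptions, not an official procedure. It assumes the VM inventory was captured with `vim-cmd vmsvc/getallvms` on the impacted host and that the resulting commands are reviewed and run manually from the ESXi shell. The function names, VM names, and sample inventory are hypothetical. The sketch only builds command lines and executes nothing.

```python
def parse_vmids(getallvms_output):
    """Return the numeric Vmid column from `vim-cmd vmsvc/getallvms` output.

    Assumption: each VM row starts with its numeric Vmid. Wrapped multi-line
    annotations that happen to start with a digit would be misread, so review
    the result before acting on it.
    """
    vmids = []
    for line in getallvms_output.splitlines():
        fields = line.split()
        if fields and fields[0].isdigit():
            vmids.append(int(fields[0]))
    return vmids


def plan_commands(vmids):
    """Build (but do not run) the unregister and Maintenance Mode commands."""
    commands = [["vim-cmd", "vmsvc/unregister", str(vmid)] for vmid in vmids]
    # Maintenance Mode is requested only after every VM has been unregistered.
    commands.append(
        ["esxcli", "system", "maintenanceMode", "set", "--enable", "true"]
    )
    return commands


# Hypothetical inventory listing; VM and datastore names are placeholders.
SAMPLE_OUTPUT = """Vmid   Name    File                              Guest OS          Version   Annotation
1      app01   [vsanDatastore] app01/app01.vmx   ubuntu64Guest     vmx-19
7      db01    [vsanDatastore] db01/db01.vmx     centos8_64Guest   vmx-19
"""

plan = plan_commands(parse_vmids(SAMPLE_OUTPUT))
for argv in plan:
    print(" ".join(argv))
```

Printing the plan instead of executing it lets an administrator confirm which VMs are affected before any change is made on an already unstable host.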