VMs unreachable and Host disconnects with "Lost access to volume" errors due to Memory Controller faults

Article ID: 425516

Products

VMware vSphere ESXi

Issue/Introduction

Users may experience the following symptoms:

  • Multiple Virtual Machines become unreachable, slow to respond, or lose network connectivity.

  • The ESXi host temporarily disconnects from vCenter or becomes sluggish.

  • In /var/log/vmkernel.log or vCenter Events, you see frequent "Lost access to volume" messages, even though the storage array is healthy.

  • vmkernel.log contains repeated "MCA" or "Memory Controller" errors similar to:

    MCA: 202: CE Poll ... Memory Controller Read Error on Channel 0
    MCA: 202: CE Poll ... Memory Controller Write Error on Channel 0
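
To gauge how often these messages occur and which channel they name, the matching lines can be tallied. The following is a minimal sketch, not a VMware tool: it assumes you have copied vmkernel.log off the host, and it matches only the message text quoted above. The function name count_mca_errors is hypothetical.

```python
import re
from collections import Counter

# Pattern built only from the message text quoted in this article. The rest of
# each log line (timestamp, CPU, bank details) is deliberately not parsed.
MCA_PATTERN = re.compile(
    r"MCA:.*Memory Controller (Read|Write) Error on Channel (\d+)"
)


def count_mca_errors(lines):
    """Return a Counter keyed by (operation, channel) for matching lines."""
    counts = Counter()
    for line in lines:
        match = MCA_PATTERN.search(line)
        if match:
            operation = match.group(1)      # "Read" or "Write"
            channel = int(match.group(2))   # e.g. 0
            counts[(operation, channel)] += 1
    return counts


def format_summary(counts):
    """Render one line per (operation, channel) pair, sorted for stable output."""
    return [
        f"Channel {channel} {operation}: {total}"
        for (operation, channel), total in sorted(counts.items())
    ]
```

For example, passing an open file handle for the copied log to count_mca_errors and printing format_summary of the result gives one line per channel and operation. A channel that dominates the tally is the one to report to the hardware vendor.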

Environment

  • VMware vSphere ESXi 7.x, 8.x, 9.x

  • Any server hardware platform (for example Dell, HPE, or Cisco)

Cause

The ESXi host has a hardware fault in the memory subsystem (a DIMM or the motherboard memory controller). To handle the faulty memory, the system floods the CPU with "Correctable Error" (CE) interrupts, which are reported through the Machine Check Architecture (MCA).

This flood of high-priority hardware interrupts starves the host of the CPU cycles it needs for routine operations, in particular storage heartbeats. When the host cannot process heartbeats in time, it assumes the storage is down and logs "Lost access to volume", which causes VMs to hang or disconnect.
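
One way to check whether this explanation fits a given host is to see if the two message types cluster in the same time windows. The sketch below buckets both by minute. It rests on an assumption: each log line begins with an ISO 8601 timestamp (YYYY-MM-DDTHH:MM...), so confirm that against your own log before relying on it. The function names are hypothetical, and overlap in time is only suggestive, not proof of cause.

```python
import re
from collections import defaultdict

# Assumption: each line starts with an ISO 8601 timestamp; only the
# minute-resolution prefix is captured. Lines without it are skipped.
MINUTE_PREFIX = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2})")


def events_per_minute(lines):
    """Count MCA memory-controller and 'Lost access to volume' lines per minute."""
    buckets = defaultdict(lambda: {"mca": 0, "lost_access": 0})
    for line in lines:
        stamp = MINUTE_PREFIX.match(line)
        if not stamp:
            continue
        minute = stamp.group(1)
        if "MCA:" in line and "Memory Controller" in line:
            buckets[minute]["mca"] += 1
        elif "Lost access to volume" in line:
            buckets[minute]["lost_access"] += 1
    return dict(buckets)


def overlapping_minutes(buckets):
    """Return the sorted minutes in which both kinds of message were logged."""
    return sorted(
        minute
        for minute, counts in buckets.items()
        if counts["mca"] and counts["lost_access"]
    )
```

If most minutes with "Lost access to volume" messages also show MCA errors while the storage array reports no faults, that is consistent with the cause described here.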

Resolution

This is a hardware fault. To resolve the issue:

  1. Put the host in Maintenance Mode.

  2. Export vmkernel.log and note the specific channel named in the MCA errors (e.g., Channel 0).

  3. Contact your hardware vendor (OEM), provide the exported logs, and have them run diagnostics and replace the faulty DIMM or motherboard.
