ESXi halts with a purple diagnostic screen (PSOD) referencing a Machine Check Exception (MCE)

Article ID: 372284

Updated On: 04-04-2025

Products

VMware vSphere ESXi

Issue/Introduction

  • Symptoms:

    • When a purple diagnostic screen (PSOD) occurs on an ESXi host, it may reference a "Machine Check Exception".
    • At the ESXi console, the purple diagnostic screen shows entries similar to:

      VMware ESXi 7.X.X [Releasebuild-22348816 x86_64]
      Machine Check Exception on PCPUXX in world 111111:idle51
      System has encountered a Hardware Error - Please contact the hardware vendor

      Uncorrectable/recoverable memory error in world XXXX; unable to recover in kernel context
      Data Cache DataRead Error
    • In /var/run/log/vmkernel.log, you may see entries similar to:
      YYYY-MM-DDTHH:MM:SS.114Z cpu58:40848027)ALERT: MCA: 200: SRAR Excp G7 B1 XXXXXX Cache Hierarchy: Level 0 Data Cache DataRead Error.
      YYYY-MM-DDTHH:MM:SS.114Z cpu58:40848027)MCAIntel: 1120: Force retiring MPN XXXXX to recover from MCA error detected by cpu58 in bank1.
      YYYY-MM-DDTHH:MM:SS.252Z cpu56:40848027)ALERT: MCA: 200: SRAR Excp G7 B1 XXXXXXX Cache Hierarchy: Level 0 Data Cache DataRead Error.
      YYYY-MM-DDTHH:MM:SS.252Z cpu56:40848027)MCAIntel: 1120: Force retiring MPN XXXXXX to recover from MCA error detected by cpu56 in bank1.
    • The error can also be caused by a failing hardware device. In that case, the PSOD screen may report an error similar to:
      YYYY-MM-DDTHH:MM:SS.114Z cpu58:40848027)IDT: 1895: Uncorrectable/unrecoverable machine check error
      YYYY-MM-DDTHH:MM:SS.114Z cpu58:40848027)MCA: 208: UC Excp G4 86 Sbb00002000000e0b AB M180008 P8/8 I/O error reported by PCI 0000:00:03.0.
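When many such entries are present, it can help to tally them per CPU before contacting the hardware vendor. The following is a minimal Python sketch, not an official VMware tool. It assumes the MCA line layout matches the samples above; the regular expression, function names, and sample list are illustrative, and other ESXi builds may format these lines differently.

```python
import re
from collections import Counter

# Pattern derived only from the sample lines in this article (an assumption):
# "<timestamp> cpu<N>:<world>)[ALERT: ]MCA: <line>: <class> Excp <detail>"
MCA_LINE = re.compile(
    r"^(?P<timestamp>\S+) cpu(?P<cpu>\d+):(?P<world>\d+)\)"
    r"(?:ALERT: )?MCA: \d+: (?P<kind>\S+) Excp (?P<detail>.*)$"
)


def parse_mca_lines(lines):
    """Return one dict per log line that looks like an MCA exception entry."""
    events = []
    for line in lines:
        match = MCA_LINE.match(line.strip())
        if match:
            events.append(match.groupdict())
    return events


def count_by_cpu(events):
    """Count MCA exception entries per reporting CPU number."""
    return Counter(int(event["cpu"]) for event in events)


# Sample input copied from the entries shown in this article.
SAMPLE = [
    "YYYY-MM-DDTHH:MM:SS.114Z cpu58:40848027)ALERT: MCA: 200: SRAR Excp G7 B1 XXXXXX Cache Hierarchy: Level 0 Data Cache DataRead Error.",
    "YYYY-MM-DDTHH:MM:SS.114Z cpu58:40848027)MCAIntel: 1120: Force retiring MPN XXXXX to recover from MCA error detected by cpu58 in bank1.",
    "YYYY-MM-DDTHH:MM:SS.252Z cpu56:40848027)ALERT: MCA: 200: SRAR Excp G7 B1 XXXXXXX Cache Hierarchy: Level 0 Data Cache DataRead Error.",
    "YYYY-MM-DDTHH:MM:SS.114Z cpu58:40848027)MCA: 208: UC Excp G4 86 Sbb00002000000e0b AB M180008 P8/8 I/O error reported by PCI 0000:00:03.0.",
]

if __name__ == "__main__":
    events = parse_mca_lines(SAMPLE)
    for cpu, count in sorted(count_by_cpu(events).items()):
        print(f"cpu{cpu}: {count} MCA exception entries")
```

To run it against a real log, pass the lines of a copy of /var/run/log/vmkernel.log to `parse_mca_lines` instead of `SAMPLE`. Repeated entries from the same CPU can be a useful detail to include in the vendor case.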

Cause

Machine Check Architecture (MCA) is a mechanism within the CPU that detects and reports hardware problems.

When hardware encounters a critical or fatal error, the CPU raises a machine check exception (MCE). Because such machine check exceptions are considered fatal and unrecoverable, the ESXi host is expected to halt with a PSOD.
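Labels such as SRAR and UC in the log samples correspond to error classes that Intel's machine check architecture encodes in a per-bank status register. The Python sketch below illustrates that classification. It is not ESXi's implementation, and it rests on several assumptions: an Intel CPU with software error recovery support, the bit positions documented in the Intel Software Developer's Manual, and the `S`-prefixed field in the sample log line being the raw status value. The function names are illustrative.

```python
def bit(value, position):
    """Return the single bit of value at the given position (0 or 1)."""
    return (value >> position) & 1


def classify_status(status):
    """Classify a 64-bit machine check status value (Intel layout assumed)."""
    if not bit(status, 63):   # VAL: register holds a valid error
        return "invalid"
    if not bit(status, 61):   # UC clear: corrected error
        return "CE"
    if bit(status, 57):       # PCC: processor context corrupt, fatal
        return "UC"
    signaled = bit(status, 56)         # S: error was signaled via MCE
    action_required = bit(status, 55)  # AR: software must act to recover
    if signaled and action_required:
        return "SRAR"
    if signaled:
        return "SRAO"
    if not action_required:
        return "UCNA"
    return "unknown"


def mca_error_code(status):
    """Return the architectural MCA error code held in bits 15:0."""
    return status & 0xFFFF


if __name__ == "__main__":
    # Value taken from the sample log line in this article; treating it as
    # the raw status register value is an assumption.
    sample = 0xBB00002000000E0B
    print(classify_status(sample), hex(mca_error_code(sample)))
```

The decoding only describes how the CPU categorised the error. Identifying the failing component still requires the hardware vendor's diagnostics.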

Resolution

  • Rebooting the server can restore normal operation if the error was transient.
    • Take a screenshot of the server console before restarting the server.
  • Engage the hardware vendor for a detailed investigation into the source of the MCE.

Additional Information