YYYY-MM-DDTHH:MM:SS.114Z cpu58:40848027)IDT: 1895: Uncorrectable/unrecoverable machine check error
YYYY-MM-DDTHH:MM:SS.114Z cpu58:40848027)MCA: 208: UC Excp G4 86 Sbb00002000000e0b AB M180008 P8/8 I/O error reported by PCI 0000:00:03.0.
7.x
8.x
When hardware encounters a critical or fatal error, the CPU raises a machine check exception (MCE). Because machine check exceptions are considered fatal and unrecoverable, the ESXi server is expected to crash with a PSOD.
System event log (IPMI SEL) entries, which can be retrieved using the command esxcli hardware ipmi sel list, can help confirm the cause.
Record:18318:
When: 2025-03-14T10:54:53
Event Type: 4 (Minor)
SEL Type: 2 (System Event)
Message: Assert + Processor Predictive Failure Asserted
Sensor Number: 80
Record:18321:
When: 2025-03-14T10:54:53
Event Type: 4 (Minor)
SEL Type: 2 (System Event)
Message: Assert + Processor Predictive Failure Asserted
Sensor Number: 121
On this sample hardware, Sensor Number 80 is mapped to Processor 1 P_CATERR and Sensor Number 121 is mapped to Processor 1 IERR. An IERR suggests that the error was caused by an I/O device connected to the system board.
Note: Sensor numbers can differ based on hardware vendor, model, and BIOS. Check the Sensor Data Records using the command esxcli hardware ipmi sdr list to map the sensor numbers mentioned in the error records.
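The mapping step described above can be sketched in a few lines of Python. This is a minimal illustration, not a supported tool: it assumes the SEL text has the same "Key: value" layout as the records shown in this article, and the SENSOR_NAMES table is a hand-built assumption taken from the sample hardware discussed here. On other hardware, that table would have to be filled in from the output of esxcli hardware ipmi sdr list. The function names parse_sel and label_records are hypothetical.

```python
import re

# Sample text in the same layout as the SEL records shown in this article.
SAMPLE_SEL = """\
Record:18318:
   When: 2025-03-14T10:54:53
   Event Type: 4 (Minor)
   SEL Type: 2 (System Event)
   Message: Assert + Processor Predictive Failure Asserted
   Sensor Number: 80
Record:18321:
   When: 2025-03-14T10:54:53
   Event Type: 4 (Minor)
   SEL Type: 2 (System Event)
   Message: Assert + Processor Predictive Failure Asserted
   Sensor Number: 121
"""

# Assumption: sensor names for the sample hardware only. Sensor numbers
# differ by vendor, model, and BIOS, so build this from the SDR list.
SENSOR_NAMES = {
    80: "Processor 1 P_CATERR",
    121: "Processor 1 IERR",
}


def parse_sel(text):
    """Split SEL text into one dict per record, keyed by field name."""
    records = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        header = re.fullmatch(r"Record:\s*(\d+):?", line)
        if header:
            # A "Record:<id>:" line starts a new record.
            current = {"Record": int(header.group(1))}
            records.append(current)
        elif current is not None and ":" in line:
            # Split on the first colon only, so timestamps stay intact.
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return records


def label_records(records, sensor_names):
    """Return (record id, timestamp, sensor number, sensor name) tuples."""
    labeled = []
    for rec in records:
        number = int(rec["Sensor Number"])
        name = sensor_names.get(number, "unknown sensor")
        labeled.append((rec["Record"], rec.get("When"), number, name))
    return labeled


if __name__ == "__main__":
    for entry in label_records(parse_sel(SAMPLE_SEL), SENSOR_NAMES):
        print(entry)
```

Sensor numbers missing from the table are labeled "unknown sensor" rather than guessed, which makes it obvious when the table does not match the hardware being examined.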
Note: Take a screenshot of the server console before restarting the server.
Rebooting the server can help restore normal operation if the error was transient.