The ESXi host fails with a purple diagnostic screen (PSOD) showing:
NMI IPI: Panic requested by another PCPU. PC [address], SP [address] (Src [value], CPU[number])
On the ESXi host, /var/log/vmkernel.log contains entries such as:
MCA: 202: CE Poll G0 B8 [status codes] Memory Controller Read Error on Channel [X]
WARNING: Heartbeat: [ID]: PCPU [number] didn't have a heartbeat for [X] seconds, timeout is 10, [X] IPIs sent; *may* be locked up.
Three-stage pattern (distinguishing characteristic):
ESXi: 7.0, 8.0.3
TCP: 5.0.1, 5.1
Corrected memory errors are logged in vmkernel.log. The message format is `MCA: 202: CE Poll` followed by `Memory Controller Read Error on Channel [X]`.
The host then logs `PCPU [number] didn't have a heartbeat` warnings.
`VmMemPfLockLargePage` and related functions fail. The system encounters errors accessing the degraded memory regions.
The `NMI IPI: Panic requested by another PCPU` is a protective response. It prevents potential data corruption by stopping all system activity when memory reliability is compromised.
Review vmkernel.log for Memory Errors
Review the /var/run/log/vmkernel.log file
Identify corrected memory error entries with format: MCA: 202: CE Poll G0 B8 [status] Memory Controller Read Error on Channel [X]
Identify the Failing Memory Channel
Determine which memory channel is reporting errors by examining the channel number in error messages
Note if all errors occur consistently on the same channel
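The first two steps can be sketched as a small log scan. This is a minimal illustration, not a supported tool: the regular expression is built only from the message format quoted in this article, the function name `channels_with_corrected_errors` is hypothetical, and the sample lines are invented for the example rather than taken from a real host.

```python
import re
from collections import Counter

# Pattern derived from the message format quoted in this article. The fields
# between "CE Poll" and "Memory Controller" are treated as opaque text.
CE_PATTERN = re.compile(
    r"MCA: \d+: CE Poll\b.*?Memory Controller Read Error on Channel (\d+)"
)


def channels_with_corrected_errors(lines):
    """Return a Counter mapping memory channel number to corrected error count."""
    counts = Counter()
    for line in lines:
        match = CE_PATTERN.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


# Hypothetical sample lines (timestamps, IDs, and status codes are made up).
sample = [
    "2024-01-15T10:00:01.000Z cpu4:2097152)MCA: 202: CE Poll G0 B8 S9c00 "
    "Memory Controller Read Error on Channel 2",
    "2024-01-15T10:00:07.000Z cpu4:2097152)MCA: 202: CE Poll G0 B8 S9c00 "
    "Memory Controller Read Error on Channel 2",
    "2024-01-15T10:00:09.000Z cpu9:2097200)WARNING: Heartbeat: 827: PCPU 9 "
    "didn't have a heartbeat for 7 seconds, timeout is 10, 1 IPIs sent; "
    "*may* be locked up.",
]

counts = channels_with_corrected_errors(sample)
print(dict(counts))       # {2: 2}
print(len(counts) == 1)   # True: every corrected error is on one channel
```

To run it against a real log, pass the lines of a copy of vmkernel.log instead of `sample`. A single key in the result means all corrected errors were reported on the same channel.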
Assess Error Frequency
Count the frequency of corrected errors
Critical threshold: 10 or more corrected errors (CE) on the same channel within a few minutes indicates imminent memory module failure
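The frequency check can be approximated with a sliding time window per channel. The sketch below rests on several assumptions: "a few minutes" is taken as 5 minutes, each log line is assumed to begin with an ISO-style timestamp, entries for a given channel are assumed to be in chronological order, and the names `channels_over_threshold` and `make_line` are hypothetical. Only the threshold of 10 comes from this article.

```python
import re
from collections import defaultdict, deque
from datetime import datetime, timedelta

# Assumed line layout: ISO-style timestamp first, then the CE message format
# quoted in this article.
CE_LINE = re.compile(
    r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\S*\s.*?"
    r"MCA: \d+: CE Poll\b.*?Memory Controller Read Error on Channel (\d+)"
)

THRESHOLD = 10                 # from the article
WINDOW = timedelta(minutes=5)  # assumption: "a few minutes" taken as 5


def channels_over_threshold(lines, threshold=THRESHOLD, window=WINDOW):
    """Return the set of channels with >= threshold corrected errors in any window."""
    recent = defaultdict(deque)  # channel -> timestamps still inside the window
    flagged = set()
    for line in lines:
        match = CE_LINE.match(line)
        if not match:
            continue
        logged_at = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S")
        queue = recent[int(match.group(2))]
        queue.append(logged_at)
        # Drop entries that have aged out of the window.
        while logged_at - queue[0] > window:
            queue.popleft()
        if len(queue) >= threshold:
            flagged.add(int(match.group(2)))
    return flagged


def make_line(logged_at, channel):
    """Build a hypothetical log line for the demonstration below."""
    stamp = logged_at.strftime("%Y-%m-%dT%H:%M:%S")
    return (f"{stamp}.000Z cpu4:2097152)MCA: 202: CE Poll G0 B8 S9c00 "
            f"Memory Controller Read Error on Channel {channel}")


base = datetime(2024, 1, 15, 10, 0, 0)
burst = [make_line(base + timedelta(seconds=20 * i), 2) for i in range(10)]
sparse = [make_line(base + timedelta(minutes=10 * i), 0) for i in range(3)]

print(sorted(channels_over_threshold(burst + sparse)))  # [2]
```

Channel 2 is flagged because ten errors arrive within about three minutes. Channel 0 is not flagged because its errors are ten minutes apart.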
Gather the following for your hardware vendor:
vmkernel.log file showing corrected memory error messages
Memory channel number identified in step 2 (Identify the Failing Memory Channel)
VMkernel crash dump file (vmkernel-zdump) if available
PSOD screen text or screenshot showing the backtrace
Contact Hardware Vendor
Contact your hardware vendor support with the diagnostic information collected in the previous step
Request memory diagnostics and replacement of the failing memory module on the identified channel
Replace Failing Memory Module
Follow your hardware vendor's guidance to identify the specific DIMM slot corresponding to the reported memory channel
Replace the failing memory module
Post-Replacement Monitoring
Monitor /var/log/vmkernel.log for 48-72 hours
For more information about interpreting MCA (Machine Check Architecture) error messages in ESXi logs, see Decoding Machine Check Error (MCE) output after an ESXi panic (Purple Screen).
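The post-replacement monitoring period can be checked with a short script that lists any corrected memory errors logged after the module was replaced. This is a sketch under stated assumptions: log lines are assumed to start with an ISO-style timestamp, the replacement time is assumed to be expressed in the same time zone as the log timestamps, and the function name `corrected_errors_since` and the sample lines are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Assumed line layout: ISO-style timestamp first, then the CE message format
# quoted in this article.
CE_LINE = re.compile(
    r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\S*\s.*?"
    r"MCA: \d+: CE Poll\b.*?Memory Controller Read Error on Channel (\d+)"
)


def corrected_errors_since(lines, replaced_at, monitor_hours=72):
    """Return (timestamp, channel) pairs logged inside the monitoring window."""
    window_end = replaced_at + timedelta(hours=monitor_hours)
    hits = []
    for line in lines:
        match = CE_LINE.match(line)
        if not match:
            continue
        logged_at = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S")
        if replaced_at <= logged_at <= window_end:
            hits.append((logged_at, int(match.group(2))))
    return hits


# Hypothetical replacement time and log lines, for illustration only.
replaced_at = datetime(2024, 1, 16, 9, 0, 0)
sample = [
    "2024-01-15T22:10:05.120Z cpu4:2097152)MCA: 202: CE Poll G0 B8 S9c00 "
    "Memory Controller Read Error on Channel 2",
    "2024-01-17T03:41:50.004Z cpu6:2097160)MCA: 202: CE Poll G0 B8 S9c00 "
    "Memory Controller Read Error on Channel 2",
]

hits = corrected_errors_since(sample, replaced_at)
for logged_at, channel in hits:
    print(logged_at.isoformat(), "channel", channel)
```

An empty result over the full 48-72 hour window suggests the replacement resolved the corrected errors. Any hit is worth raising with the hardware vendor.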
This article addresses ESXi hosts that experience corrected memory errors followed by an NMI IPI panic. The backtrace includes VmMemPf functions. For related scenarios, see:
This issue is distinguished by multiple corrected memory errors immediately preceding the panic. The backtrace also contains VmMemPfLockLargePage functions. Other NMI IPI panic scenarios have different root causes:
Backtrace contains HeapVSIAddChunkInfo, J6_NewOnDiskTxn. Caused by 2GB UNMAP requests. No memory errors present.
Backtrace contains FastSlabAllocSlow during memory allocation. No corrected errors present.
Backtrace contains VmMemPfCompressed functions related to memory compression during vMotion. No hardware errors present.
Backtrace contains BitVector operations during vSphere Replication. No memory errors present.
This article applies when both of the following are true:
The backtrace contains VmMemPfLockLargePage or VmMemPfRangeSetBackedByLPage functions
Corrected memory errors appear in vmkernel.log before the panic
If these conditions are not met, refer to the articles above for other NMI IPI panic scenarios.
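The distinguishing checks described above (VmMemPf large-page functions in the backtrace, plus corrected memory errors logged before the panic) can be expressed as a simple triage helper. This is only an illustrative sketch: the function name `article_applies`, the `min_errors` parameter, and the sample backtrace and log text are hypothetical, and plain substring matching is an assumption about how the backtrace text would be inspected.

```python
import re

# Function names taken from this article.
BACKTRACE_FUNCS = ("VmMemPfLockLargePage", "VmMemPfRangeSetBackedByLPage")
PANIC_TEXT = "NMI IPI: Panic requested by another PCPU"
CE_PATTERN = re.compile(
    r"MCA: \d+: CE Poll\b.*?Memory Controller Read Error on Channel \d+"
)


def article_applies(backtrace_text, log_lines, min_errors=1):
    """Return True when both distinguishing conditions are met.

    min_errors is an assumption; raise it to require multiple corrected errors.
    """
    has_func = any(name in backtrace_text for name in BACKTRACE_FUNCS)
    errors_before_panic = 0
    for line in log_lines:
        if PANIC_TEXT in line:
            break  # only entries logged before the panic count
        if CE_PATTERN.search(line):
            errors_before_panic += 1
    return has_func and errors_before_panic >= min_errors


# Hypothetical inputs, for illustration only.
memory_backtrace = "frame 0: VmMemPfLockLargePage\nframe 1: (caller)"
memory_log = [
    "2024-01-15T10:00:01.000Z cpu4:2097152)MCA: 202: CE Poll G0 B8 S9c00 "
    "Memory Controller Read Error on Channel 2",
]
other_backtrace = "frame 0: FastSlabAllocSlow\nframe 1: (caller)"

print(article_applies(memory_backtrace, memory_log))  # True
print(article_applies(other_backtrace, []))           # False
```

A `False` result points toward one of the other NMI IPI panic scenarios rather than failing memory.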