NMI IPI: Panic requested by another PCPU. PC [address], SP [address] (Src [value], CPU[number])
On the ESXi host, /var/log/vmkernel.log entries:
MCA: 202: CE Poll G0 B8 [status codes] Memory Controller Read Error on Channel [X]
WARNING: Heartbeat: [ID]: PCPU [number] didn't have a heartbeat for [X] seconds, timeout is 10, [X] IPIs sent; *may* be locked up.
Three-stage pattern (distinguishing characteristic):
ESXi 7.0 or newer
The system's memory module (DIMM) is experiencing hardware degradation. Physical memory must reliably store and retrieve data without errors. When a memory module begins to fail, the memory controller detects read/write errors.
The memory controller uses ECC (Error Correcting Code) to automatically correct these errors. These corrections are logged as CE (Corrected Error) events in vmkernel.log. The message format is `MCA: 202: CE Poll` followed by `Memory Controller Read Error on Channel [X]`.
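As an illustration of the message format described above, the following minimal Python sketch extracts the channel number from a corrected-error line. The function name `parse_ce_channel` and the sample line are hypothetical, and the sketch assumes that the `[X]` placeholder is a decimal integer in real logs.

```python
import re

# Built from the message format quoted in this article. Treating the channel
# placeholder [X] as a decimal integer is an assumption.
CE_PATTERN = re.compile(
    r"MCA: 202: CE Poll\b.*?Memory Controller Read Error on Channel (\d+)"
)


def parse_ce_channel(line):
    """Return the channel number from a corrected-error line, or None."""
    match = CE_PATTERN.search(line)
    return int(match.group(1)) if match else None


# Made-up sample line; the status fields are placeholders, not real values.
sample = (
    "cpu4:1000)MCA: 202: CE Poll G0 B8 Sxxxx Axxxx "
    "Memory Controller Read Error on Channel 1."
)
print(parse_ce_channel(sample))
print(parse_ce_channel("cpu4:1000)unrelated message"))
```

Lines that do not match the format yield `None`, so the function can be applied to every line of a log without pre-filtering.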
Individual corrected errors are handled transparently. However, a high rate of corrections indicates the memory module can no longer maintain data integrity. Dozens of errors occurring within seconds show the module is failing.
When corrected errors occur in rapid succession, handling them consumes significant CPU time. The resulting processing delays prevent heartbeat monitoring from completing within its timeout, and the system logs `PCPU [number] didn't have a heartbeat` warnings.
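To relate heartbeat warnings to specific CPUs, a small parser can pull the PCPU number, the stalled duration, and the timeout out of each warning. This is a sketch only: `HeartbeatWarning` and `parse_heartbeat` are hypothetical names, and it assumes the bracketed placeholders in the warning format are integers.

```python
import re
from typing import NamedTuple, Optional


class HeartbeatWarning(NamedTuple):
    pcpu: int      # PCPU that missed its heartbeat
    seconds: int   # how long the heartbeat was missing
    timeout: int   # timeout value reported in the message


# Follows the warning format quoted in this article; integer fields are assumed.
HEARTBEAT = re.compile(
    r"Heartbeat: [^:]+: PCPU (\d+) didn't have a heartbeat for (\d+) seconds, "
    r"timeout is (\d+)"
)


def parse_heartbeat(line: str) -> Optional[HeartbeatWarning]:
    """Return the parsed warning, or None if the line is not a heartbeat warning."""
    match = HEARTBEAT.search(line)
    if match is None:
        return None
    return HeartbeatWarning(*(int(group) for group in match.groups()))


# Made-up sample line for illustration.
sample = (
    "WARNING: Heartbeat: 827: PCPU 12 didn't have a heartbeat for 7 seconds, "
    "timeout is 10, 1 IPIs sent; *may* be locked up."
)
print(parse_heartbeat(sample))
```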
During this memory instability, the VMkernel attempts memory management operations. Specifically, large page backing operations handled by VmMemPfLockLargePage and related functions fail. The system encounters errors accessing the degraded memory regions.
The system cannot safely continue operation with unstable memory during critical tasks. It starts an NMI panic to halt operations and preserve diagnostic information. This panic with message `NMI IPI: Panic requested by another PCPU` is a protective response. It prevents potential data corruption by stopping all system activity when memory reliability is compromised.
1. Review vmkernel.log for Memory Errors
Review the /var/run/log/vmkernel.log file for corrected memory error entries:
MCA: 202: CE Poll G0 B8 [status] Memory Controller Read Error on Channel [X]
2. Identify the Failing Memory Channel
Determine which memory channel is reporting errors by examining the channel number in error messages
Note if all errors occur consistently on the same channel
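The channel check in this step can be sketched in Python as a per-channel tally. The names `count_ce_by_channel` and `all_on_one_channel` are hypothetical, and the pattern assumes the channel is logged as a decimal integer.

```python
import re
from collections import Counter

# Pattern follows the corrected-error format quoted in this article.
CE_CHANNEL = re.compile(
    r"MCA: 202: CE Poll\b.*?Memory Controller Read Error on Channel (\d+)"
)


def count_ce_by_channel(lines):
    """Tally corrected-error lines per memory channel."""
    counts = Counter()
    for line in lines:
        match = CE_CHANNEL.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


def all_on_one_channel(counts):
    """True when every corrected error was reported on the same channel."""
    return len(counts) == 1


# Made-up sample lines; status fields are placeholders.
lines = [
    "MCA: 202: CE Poll G0 B8 S0 Memory Controller Read Error on Channel 1.",
    "MCA: 202: CE Poll G0 B8 S0 Memory Controller Read Error on Channel 1.",
    "Heartbeat: 1: PCPU 3 didn't have a heartbeat for 7 seconds",
]
counts = count_ce_by_channel(lines)
print(counts, all_on_one_channel(counts))
```

On a host, the same function could be fed an open file object for the vmkernel.log file instead of a list, since it only iterates over lines.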
3. Assess Error Frequency
Count the frequency of corrected errors
Critical threshold: If you observe 10 or more corrected errors (CE) within a few minutes on the same channel, this indicates imminent memory module failure
4. Collect Diagnostic Information
Gather the following for your hardware vendor:
vmkernel.log file showing corrected memory error messages
Memory channel number identified in step 2
VMkernel crash dump file (vmkernel-zdump) if available
PSOD screen text or screenshot showing the backtrace
5. Contact Hardware Vendor
Contact your hardware vendor support with the diagnostic information collected in step 4
Request memory diagnostics and replacement of the failing memory module on the identified channel
6. Replace Failing Memory Module
Follow your hardware vendor's guidance to identify the specific DIMM slot corresponding to the reported memory channel
Replace the failing memory module
7. Post-Replacement Monitoring
Monitor /var/log/vmkernel.log for 48-72 hours for recurring corrected memory errors
Understanding Memory Errors
For more information about interpreting MCA (Machine Check Architecture) error messages in ESXi logs, see Decoding Machine Check Error (MCE) output after an ESXi panic (Purple Screen).
Related Memory Error Articles
This article addresses ESXi hosts that experience corrected memory errors followed by NMI IPI panic. The backtrace includes VmMemPf functions. For related scenarios, see:
Differentiating NMI IPI Panic Scenarios
This issue is distinguished by multiple corrected memory errors immediately preceding the panic. The backtrace also contains VmMemPfLockLargePage functions. Other NMI IPI panic scenarios have different root causes:
VMFS-related panics:
Backtrace includes functions such as HeapVSIAddChunkInfo and J6_NewOnDiskTxn. Caused by 2GB UNMAP requests. No memory errors present.
Memory allocation panics:
Backtrace includes FastSlabAllocSlow during memory allocation. No corrected errors present.
vMotion-related panics:
Backtrace includes VmMemPfCompressed functions related to memory compression during vMotion. No hardware errors present.
Replication-related panics:
Backtrace includes BitVector operations during vSphere Replication. No memory errors present.
When to use this article:
Backtrace includes VmMemPfLockLargePage or VmMemPfRangeSetBackedByLPage functions
Multiple corrected memory errors appear in vmkernel.log before the panic
If these conditions are not met, refer to the articles above for other NMI IPI panic scenarios.
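The two conditions under "When to use this article" can be expressed as a simple check. This is an illustrative sketch, not a supported diagnostic tool: `article_applies` is a hypothetical name, the caller is assumed to supply the PSOD backtrace text and the log lines that precede the panic, and "multiple" is interpreted as at least two corrected errors.

```python
import re

# Function names taken from this article.
LARGE_PAGE_FUNCS = ("VmMemPfLockLargePage", "VmMemPfRangeSetBackedByLPage")

# Corrected-error format quoted in this article.
CE_LINE = re.compile(
    r"MCA: 202: CE Poll\b.*Memory Controller Read Error on Channel"
)


def article_applies(backtrace, pre_panic_lines, min_errors=2):
    """Check both conditions: large-page function in backtrace, multiple CEs.

    min_errors=2 is an assumed reading of "multiple".
    """
    has_func = any(name in backtrace for name in LARGE_PAGE_FUNCS)
    ce_count = sum(1 for line in pre_panic_lines if CE_LINE.search(line))
    return has_func and ce_count >= min_errors


# Made-up inputs for illustration.
ce = "MCA: 202: CE Poll G0 B8 S0 Memory Controller Read Error on Channel 1."
print(article_applies("... VmMemPfLockLargePage ...", [ce, ce]))
print(article_applies("... VmMemPfCompressed ...", [ce, ce]))
```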