An ESXi host experiences a Purple Screen of Death (PSOD) resulting in an immediate system crash and service interruption. The crash occurs when a Physical CPU (PCPU) receives a panic request from another CPU via a Non-Maskable Interrupt (NMI).
The system displays a PSOD message similar to: @BlueScreen: NMI IPI: Panic requested by another PCPU. PC 0x################, SP 0x################ (Src 0x4, CPU0)
VMware vSphere ESXi: 8.0.x
The PSOD is caused by an interrupt storm originating from the Integrated Lights-Out (iLO) management processor. The CPU servicing these interrupts becomes stuck in the vmkernel interrupt path because of the overwhelming volume of interrupts from the iLO hardware. As a result, the affected CPU stops responding to heartbeat checks, and another CPU eventually requests a system-wide panic.
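The lockup sequence described above can be illustrated with a deliberately simplified Python model. It is not ESXi's implementation. The per-tick service capacity, the heartbeat timeout, and all names (`PCPU`, `find_stuck_cpus`, `simulate`) are hypothetical. The model only shows why an interrupt arrival rate that exceeds what a CPU can service leaves no time for its heartbeat, so that a peer eventually flags the CPU as stuck.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical values for illustration only; these are not ESXi settings.
HEARTBEAT_TIMEOUT = 10.0   # ticks a CPU may go without a heartbeat
SERVICE_CAPACITY = 3       # interrupts a CPU can service per tick


@dataclass
class PCPU:
    cpu_id: int
    last_heartbeat: float = 0.0
    pending_interrupts: int = 0

    def tick(self, now: float) -> None:
        # Interrupt handling runs first and preempts all other work.
        serviced = min(self.pending_interrupts, SERVICE_CAPACITY)
        self.pending_interrupts -= serviced
        # Only spare capacity lets normal work, including the heartbeat, run.
        if serviced < SERVICE_CAPACITY:
            self.last_heartbeat = now


def find_stuck_cpus(cpus: list[PCPU], now: float) -> list[int]:
    """Return IDs of CPUs whose last heartbeat is older than the timeout."""
    return [c.cpu_id for c in cpus if now - c.last_heartbeat > HEARTBEAT_TIMEOUT]


def simulate(storm_cpu: int, storm_rate: int, ticks: int,
             num_cpus: int = 4) -> Optional[tuple[int, list[int]]]:
    """Inject `storm_rate` interrupts per tick on one CPU.

    Returns (tick, stuck_cpu_ids) at the first detection, which in this model
    stands in for the point where another CPU would request a panic, or None
    if no CPU misses its heartbeat within `ticks`.
    """
    cpus = [PCPU(i) for i in range(num_cpus)]
    for now in range(ticks):
        cpus[storm_cpu].pending_interrupts += storm_rate
        for cpu in cpus:
            cpu.tick(float(now))
        stuck = find_stuck_cpus(cpus, float(now))
        if stuck:
            return now, stuck
    return None


if __name__ == "__main__":
    print(simulate(storm_cpu=0, storm_rate=5, ticks=50))  # (11, [0]): storm outpaces servicing
    print(simulate(storm_cpu=0, storm_rate=2, ticks=50))  # None: load is sustainable
```

The key design choice is that interrupt servicing always runs before anything else on the CPU, which mirrors why a sustained storm starves ordinary work rather than merely slowing it down.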
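The backtrace of the locked CPU (CPU 0) shows that it was executing ilo_isr (the iLO Interrupt Service Routine) at the time of the crash. The frames above gate_entry indicate that a further interrupt arrived while ilo_isr was running and was being dispatched to the vmkusb handler ehci_filter: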
Backtrace for Saved/Locked CPU - 0:
0x################:[0x################]ehci_filter@(vmkusb)#<None>+0x26
0x################:[0x################]####@example.com#1+0xf
0x################:[0x################]IntrCookie_DoInterrupt@vmkernel#nover+0x5a1
0x################:[0x################]IntrCookie_VmkernelInterrupt@vmkernel#nover+0x38
0x################:[0x################]IDT_IntrHandler@vmkernel#nover+0x97
0x################:[0x################]gate_entry@vmkernel#nover+0xa7
0x################:[0x################]vmk_CharDevWakePollers@vmkernel#nover+0x70
0x################:[0x################]ilo_isr@(ilo)#<None>+0x98
0x################:[0x################]IntrCookieBH@vmkernel#nover+0x170
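When triaging several crash dumps, it can help to check the locked CPU's backtrace for iLO frames programmatically. The Python sketch below parses frame lines in the format shown above. The regular expression is inferred from this single example rather than from a documented format, and the helper names (`parse_backtrace`, `summarize`) are hypothetical. It also reports whether a `gate_entry` frame sits above the iLO frame, which suggests that `ilo_isr` was itself interrupted, as in this crash.

```python
import re
from dataclasses import dataclass

# Matches frames such as "0x...:[0x...]ilo_isr@(ilo)#<None>+0x98".
# Inferred from the backtrace in this article, not from a published format.
FRAME_RE = re.compile(
    r"\[0x[0-9A-Fa-f#]+\]"          # return address (hex, or '#' when redacted)
    r"(?P<symbol>[^@\s]+)@"          # function symbol
    r"(?P<module>[^#\s]+)#"          # module, e.g. vmkernel or (ilo)
    r"(?P<version>[^+\s]*)"          # build tag, e.g. nover or <None>
    r"\+0x(?P<offset>[0-9A-Fa-f]+)"  # offset into the function
)


@dataclass(frozen=True)
class Frame:
    symbol: str
    module: str
    offset: int


def parse_backtrace(text: str) -> list[Frame]:
    """Parse frame lines, innermost (most recent) frame first; skip other lines."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.search(line)
        if m:
            module = m["module"].strip("()")  # "(ilo)" -> "ilo"
            frames.append(Frame(m["symbol"], module, int(m["offset"], 16)))
    return frames


def summarize(frames: list[Frame]) -> dict:
    """Report the top frame, the first iLO frame, and whether an interrupt
    entry (gate_entry) appears above that iLO frame."""
    symbols = [f.symbol for f in frames]
    ilo_idx = next((i for i, f in enumerate(frames) if f.module == "ilo"), None)
    nested = ilo_idx is not None and "gate_entry" in symbols[:ilo_idx]
    return {
        "top": symbols[0] if symbols else None,
        "ilo_frame": symbols[ilo_idx] if ilo_idx is not None else None,
        "nested_interrupt_above_ilo": nested,
    }


# Abbreviated copy of the backtrace above (the redacted frame is omitted).
SAMPLE = """\
Backtrace for Saved/Locked CPU - 0:
0x################:[0x################]ehci_filter@(vmkusb)#<None>+0x26
0x################:[0x################]IntrCookie_DoInterrupt@vmkernel#nover+0x5a1
0x################:[0x################]IDT_IntrHandler@vmkernel#nover+0x97
0x################:[0x################]gate_entry@vmkernel#nover+0xa7
0x################:[0x################]vmk_CharDevWakePollers@vmkernel#nover+0x70
0x################:[0x################]ilo_isr@(ilo)#<None>+0x98
0x################:[0x################]IntrCookieBH@vmkernel#nover+0x170
"""

if __name__ == "__main__":
    print(summarize(parse_backtrace(SAMPLE)))
    # {'top': 'ehci_filter', 'ilo_frame': 'ilo_isr', 'nested_interrupt_above_ilo': True}
```

Matching on the module name (`ilo`) rather than only on `ilo_isr` keeps the check useful if a dump shows a different function from the same driver.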
To resolve this issue, perform the following steps:
1. Contact your hardware vendor's support team to investigate potential hardware failures or firmware bugs affecting the iLO controller and the physical CPUs.
2. Check for conflicts between the installed iLO driver version and other management software (for example, the Agentless Management Service) that may be contributing to the interrupt storm.
3. Ensure the server is running the latest supported firmware and driver recipe recommended by the vendor.
4. Run extended hardware stress tests and diagnostics on the physical CPUs and the motherboard to rule out intermittent electrical or thermal issues.
For related hardware advisories, refer to: