ESXi Host experiences PSOD with NMI IPI: Panic requested by another PCPU with interrupt storm originating from the Integrated Lights-Out (iLO)

Article ID: 429117

Products

VMware vSphere ESXi

Issue/Introduction

An ESXi host experiences a Purple Screen of Death (PSOD), which crashes the system immediately and interrupts all services running on it. The crash occurs when a Physical CPU (PCPU) receives a panic request from another PCPU through a Non-Maskable Interrupt (NMI) sent as an inter-processor interrupt (IPI).

The system displays a PSOD message similar to:

    @BlueScreen: NMI IPI: Panic requested by another PCPU. PC 0x################, SP 0x################ (Src 0x4, CPU0)

Environment

VMware vSphere ESXi: 8.0.x

Cause

The PSOD is caused by an interrupt storm originating from the Integrated Lights-Out (iLO) management processor. The vmkernel thread running on the affected CPU becomes unresponsive (stuck) while servicing the overwhelming volume of interrupts generated by the iLO hardware. Because that CPU can no longer respond to heartbeat checks, another CPU eventually sends it a panic request through an NMI IPI, which triggers a system-wide panic.

The backtrace of the affected CPU shows that it was executing ilo_isr (the iLO Interrupt Service Routine) at the time of the crash:

Backtrace for Saved/Locked CPU - 0:

    0x################:[0x################]ehci_filter@(vmkusb)#<None>+0x26

    0x################:[0x################]####@example.com#1+0xf

    0x################:[0x################]IntrCookie_DoInterrupt@vmkernel#nover+0x5a1

    0x################:[0x################]IntrCookie_VmkernelInterrupt@vmkernel#nover+0x38

    0x################:[0x################]IDT_IntrHandler@vmkernel#nover+0x97

    0x################:[0x################]gate_entry@vmkernel#nover+0xa7

    0x################:[0x################]vmk_CharDevWakePollers@vmkernel#nover+0x70

    0x################:[0x################]ilo_isr@(ilo)#<None>+0x98

    0x################:[0x################]IntrCookieBH@vmkernel#nover+0x170
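When comparing several PSOD backtraces against this signature, it can help to check for the ilo_isr frame programmatically instead of by eye. The following is a minimal Python sketch, assuming the backtrace has been copied as plain text in the frame layout shown above (`0x<frame>:[0x<address>]<symbol>@<module>#<version>+<offset>`). The names FRAME_RE, Frame, parse_backtrace, and find_frame are hypothetical helpers written for illustration, not part of any VMware tool. Finding an ilo_isr@(ilo) frame is consistent with this issue but does not by itself prove an interrupt storm.

```python
import re
from typing import NamedTuple, Optional

# Frame layout as printed in the PSOD backtrace above (an assumption based on
# the sample in this article):
#   0x<frame>:[0x<address>]<symbol>@<module>#<version>+<offset>
# '#' is accepted inside the hex fields because the article masks addresses.
FRAME_RE = re.compile(
    r"^\s*0x[0-9a-fA-F#]+:\[0x[0-9a-fA-F#]+\]"
    r"(?P<symbol>[^@\s]+)@(?P<module>[^#\s]+)#(?P<version>[^+\s]*)"
    r"\+(?P<offset>0x[0-9a-fA-F]+)\s*$"
)


class Frame(NamedTuple):
    symbol: str
    module: str  # surrounding parentheses, e.g. "(ilo)", are stripped
    offset: int


def parse_backtrace(text: str) -> list[Frame]:
    """Return the frames found in a pasted backtrace, skipping other lines."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.match(line)
        if match:
            frames.append(
                Frame(
                    symbol=match["symbol"],
                    module=match["module"].strip("()"),
                    offset=int(match["offset"], 16),
                )
            )
    return frames


def find_frame(frames: list[Frame], symbol: str, module: str) -> Optional[int]:
    """Return the index of the first frame matching symbol@module, or None."""
    for index, frame in enumerate(frames):
        if frame.symbol == symbol and frame.module == module:
            return index
    return None


# Excerpt of the backtrace shown in this article.
SAMPLE = """\
Backtrace for Saved/Locked CPU - 0:
    0x################:[0x################]ehci_filter@(vmkusb)#<None>+0x26
    0x################:[0x################]IntrCookie_DoInterrupt@vmkernel#nover+0x5a1
    0x################:[0x################]vmk_CharDevWakePollers@vmkernel#nover+0x70
    0x################:[0x################]ilo_isr@(ilo)#<None>+0x98
    0x################:[0x################]IntrCookieBH@vmkernel#nover+0x170
"""

if __name__ == "__main__":
    frames = parse_backtrace(SAMPLE)
    index = find_frame(frames, "ilo_isr", "ilo")
    # Prints: 5 frames parsed; ilo_isr frame index: 3
    print(f"{len(frames)} frames parsed; ilo_isr frame index: {index}")
```

The header line "Backtrace for Saved/Locked CPU - 0:" is ignored because it does not match the frame pattern, so a whole PSOD screen or log excerpt can be passed in unchanged.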

Resolution

To resolve this issue, perform the following steps:

  1. Contact the hardware vendor's support team to investigate potential hardware-level failures or firmware bugs associated with the iLO controller and the Physical CPUs.

  2. Check for conflicts between the installed iLO driver version and other management software (for example, the Agentless Management Service) that may be contributing to the interrupt storm.

  3. Ensure that the server is running the latest firmware and driver recipe supported and recommended by the hardware vendor.

  4. Run extensive hardware stress tests and diagnostics on the Physical CPUs and the motherboard to rule out intermittent electrical or thermal issues.

Additional Information

For related hardware advisories, refer to:

[HPE Customer Advisory]