PSOD Triggered by CPU lock - Spin count exceeded - Possible deadlock with PCPU

Article ID: 395129

Updated On:

Products

VMware vSphere ESXi

Issue/Introduction

A host in the environment experienced a Purple Screen of Death (PSOD). Analysis showed that the PSOD was triggered when a CPU thread stopped responding while holding a lock. Multiple physical CPUs (PCPUs) were also unresponsive to non-maskable interrupts (NMIs).

The host crashed unexpectedly and displayed a PSOD.

Environment

vSphere ESXi 7.X

vSphere ESXi 8.X

Cause

The PSOD was caused by hardware-level behavior: a physical CPU was overwhelmed by platform interrupts and could not release a lock it held. This is symptomatic of a known issue often referred to as an iLO interrupt storm, which is especially common on AMD EPYC-based servers.
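To illustrate why a stalled lock holder ends in a "Spin count exceeded" PSOD, the following Python sketch models a spinlock whose waiters give up after a fixed number of attempts. It is a conceptual model only, not VMkernel code: the class `BoundedSpinLock`, the exception `SpinCountExceeded`, and the `max_spins` budget are hypothetical, and raising an exception stands in for the host panicking.

```python
import threading


class SpinCountExceeded(RuntimeError):
    """Hypothetical stand-in for the PSOD raised when a waiter's spin budget runs out."""


class BoundedSpinLock:
    """Illustrative spinlock whose waiters give up after max_spins attempts.

    Conceptual model only; this is not how VMkernel implements its locks.
    """

    def __init__(self, max_spins: int) -> None:
        self._flag = threading.Lock()  # used only for non-blocking test-and-set attempts
        self.max_spins = max_spins     # hypothetical spin budget
        self.owner = None              # which (simulated) PCPU holds the lock

    def acquire(self, cpu: str) -> None:
        for _ in range(self.max_spins):
            # acquire(blocking=False) acts like one atomic test-and-set attempt:
            # it returns True only if the lock was free.
            if self._flag.acquire(blocking=False):
                self.owner = cpu
                return
        # The holder never released the lock within the spin budget.
        raise SpinCountExceeded(
            f"Spin count exceeded - possible deadlock with {self.owner}"
        )

    def release(self) -> None:
        self.owner = None
        self._flag.release()


# Simulated scenario: "PCPU 3" takes the lock and then stalls (for example,
# while servicing an interrupt storm), so it never releases the lock.
lock = BoundedSpinLock(max_spins=10_000)
lock.acquire("PCPU 3")

try:
    lock.acquire("PCPU 5")
except SpinCountExceeded as exc:
    print(exc)  # prints: Spin count exceeded - possible deadlock with PCPU 3
```

The waiter in this model is not deadlocked by its own logic; it fails only because the holder cannot make progress. That mirrors the cause described above, where the lock holder is starved by platform interrupts rather than by a software bug on the waiting CPU.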

Resolution

This is a hardware-related issue. VMware recommends the following steps:

  1. Contact the server hardware vendor (e.g., HPE) and provide them with full logs and PSOD screenshots or dumps.

  2. Refer the hardware vendor to known issues like:

  3. Discuss firmware/BIOS or iLO updates that may help mitigate interrupt storms or improve CPU interrupt handling.

Additional Information

  • Broadcom does not have control over hardware interrupt behavior; this type of issue must be addressed at the firmware or hardware design level.

  • Ensure hosts are running the latest supported BIOS, iLO firmware, and ESXi version validated by the server vendor.