PSOD: NMI IPI: Panic requested by another PCPU. PC 0XXXXX, SP 0XXXXX (Src 0XX, CPUXX)

Article ID: 404233

Products

VMware vSphere ESXi

Issue/Introduction

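  • The ESXi host fails with a purple diagnostic screen (PSOD) and reports a crash similar to: Crash at YYYY-MM-DDTXX:XX:XX on CPU ## running world #### - numasched. VMK Uptime:XXXXX
  • The PSOD shows the following backtrace: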

@BlueScreen: NMI IPI: Panic requested by another PCPU. PC 0x42002156dcb7, SP 0x45399769bc80 (Src 0x2, CPU39)
YYYY-MM-DDTXX:XX:XX cpuxx:25135316)0x4539f791bf40:[0x420021a98105]SyscallUWVMK64@vmkernel#nover+0xxx
YYYY-MM-DDTXX:XX:XX cpuxx:2097901)0x45399769bc80:[0x42002156dcb6]MCSLockSpin@vmkernel#nover+0xxx
YYYY-MM-DDTXX:XX:XX cpuxx:2097901)0x45399769bcb0:[0x42002156e301]MCSLockWait@vmkernel#nover+0xxx
YYYY-MM-DDTXX:XX:XX cpuxx:2097901)0x45399769bcd0:[0x42002156e813]MCSLockIRQWork@vmkernel#nover+0xxx
YYYY-MM-DDTXX:XX:XX cpuxx:2097901)0x45399769bcf0:[0x420021887a1c]NUMASchedSnapshotNodes@vmkernel#nover+0xxx
YYYY-MM-DDTXX:XX:XX cpuxx:2097901)0x45399769be20:[0x4200218878b0]NUMASchedSnapshotNodes@vmkernel#nover+0xxx
YYYY-MM-DDTXX:XX:XX cpuxx:2097901)0x45399769bf50:[0x420021889284]NUMASchedLoop@vmkernel#nover+0xxx
YYYY-MM-DDTXX:XX:XX cpuxx:2097901)0x45399769bfe0:[0x420021ad67b2]CpuSched_StartWorld@vmkernel#nover+0xxx
YYYY-MM-DDTXX:XX:XX cpuxx:2097901)0x45399769c000:[0x420021544cef]Debug_IsInitialized@vmkernel#nover+0xxx

  • The vmkernel logs may show the following pattern:

    YYYY-MM-DDTXX:XX:XX cpuxx:2097901)0xxxxxxx:[0xxxxxxx]MCSLockIRQWork@vmkernel#nover+0x40 stack: 0x4539ec01f100
    YYYY-MM-DDTXX:XX:XX cpuxx:2097901)0xxxxxxx:[0xxxxxxx]NUMASchedSnapshotNodes@vmkernel#nover+0x1c5 stack: 0x925c47
    YYYY-MM-DDTXX:XX:XX cpuxx:2097901)0xxxxxxx:[0xxxxxxx]NUMASchedSnapshotNodes@vmkernel#nover+0x59 stack: 0x103
    YYYY-MM-DDTXX:XX:XX cpuxx:2097901)0xxxxxxx:[0xxxxxxx]NUMASchedLoop@vmkernel#nover+0x2f9 stack: 0x2
    YYYY-MM-DDTXX:XX:XX cpuxx:2097901)0xxxxxxx:[0xxxxxxx]CpuSched_StartWorld@vmkernel#nover+0xbf stack: 0x0
    YYYY-MM-DDTXX:XX:XX cpuxx:2097901)0xxxxxxx:[0xxxxxxx]Debug_IsInitialized@vmkernel#nover+0xc stack: 0x0
    YYYY-MM-DDTXX:XX:XX cpuxx:2097901)base fs=0x0 gs=0xxxxxxxc Kgs=0x0
    YYYY-MM-DDTXX:XX:XX cpuxx:2097901)1 other PCPU is in panic.
    YYYY-MM-DDTXX:XX:XX cpuxx:2097901)NMI: 738: NMI IPI: PC 0xxxxxxx, SP 0xxxxxxx (Src 0xx, cpuxx)
    YYYY-MM-DDTXX:XX:XX cpuxx:2097901)NMI: 738: NMI IPI: PC 0xxxxxxx, SP 0xxxxxxx (Src 0xx, cpuxx)
    YYYY-MM-DDTXX:XX:XX cpuxx:2097901)NMI: 738: NMI IPI: PC 0xxxxxxx, SP 0xxxxxxx (Src 0xx, cpuxx)
    YYYY-MM-DDTXX:XX:XX cpuxx:2097898)NMI: 738: NMI IPI: PC 0xxxxxxx, SP 0xxxxxxx (Src 0xx, CPU44)
    YYYY-MM-DDTXX:XX:XX cpuxx:25135316)NMI: 738: NMI IPI: PC 0xxxxxxx, SP 0xxxxxxx (Src 0xx, cpuxx)
    YYYY-MM-DDTXX:XX:XX cpuxx:25136450)NMI: 738: NMI IPI: PC 0xxxxxxx, SP 0xxxxxxx (Src 0xx, cpuxx)
    YYYY-MM-DDTXX:XX:XX cpuxx:2097901)NMI: 738: NMI IPI: PC 0xxxxxxx, SP 0xxxxxxx (Src 0xx, cpuxx)
    YYYY-MM-DDTXX:XX:XX cpuxx:2101689)NMI: 738: NMI IPI: PC 0xxxxxxx, SP 0xxxxxxx (Src 0xx, cpuxx)
    YYYY-MM-DDTXX:XX:XX cpuxx:2100159)NMI: 738: NMI IPI: PC 0xxxxxxx, SP 0xxxxxxx (Src 0xx, cpuxx)
    YYYY-MM-DDTXX:XX:XX cpuxx:25605071)NMI: 738: NMI IPI: PC 0xxxxxxx, SP 0xxxxxxx (Src 0xx, cpuxx)
    YYYY-MM-DDTXX:XX:XX cpuxx:2097901)Backtrace for current CPU #XX, worldID=XXXXX, fp=0x0
    .....
    ......
    @BlueScreen: NMI IPI: Panic requested by another PCPU. PC 0x42002156dcb7, SP 0x45399769bc80 (Src 0x2, CPU39)
    YYYY-MM-DDTXX:XX:XX cpuxx:25135316)0xxxxxxx:[0xxxxxxx]SyscallUWVMK64@vmkernel#nover+0xxx
    YYYY-MM-DDTXX:XX:XX cpuxx:2097901)0xxxxxxx:[0xxxxxxx]MCSLockSpin@vmkernel#nover+0xxx
    YYYY-MM-DDTXX:XX:XX cpuxx:2097901)0xxxxxxx:[0xxxxxxx]MCSLockWait@vmkernel#nover+0xxx
    YYYY-MM-DDTXX:XX:XX cpuxx:2097901)0xxxxxxx:[0xxxxxxx]MCSLockIRQWork@vmkernel#nover+0xxx
    YYYY-MM-DDTXX:XX:XX cpuxx:2097901)0xxxxxxx:[0xxxxxxx]NUMASchedSnapshotNodes@vmkernel#nover+0xxx
    YYYY-MM-DDTXX:XX:XX cpuxx:2097901)0xxxxxxx:[0xxxxxxx]NUMASchedSnapshotNodes@vmkernel#nover+0xxx
    YYYY-MM-DDTXX:XX:XX cpuxx:2097901)0xxxxxxx:[0xxxxxxx]NUMASchedLoop@vmkernel#nover+0xxx
    YYYY-MM-DDTXX:XX:XX cpuxx:2097901)0xxxxxxx:[0xxxxxxx]CpuSched_StartWorld@vmkernel#nover+0xxx
    YYYY-MM-DDTXX:XX:XX cpuxx:2097901)0xxxxxxx:[0xxxxxxx]Debug_IsInitialized@vmkernel#nover+0xxx
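
When comparing a host's logs against the signature above, a small script can help pull out the fields of the panic line and confirm that the same lock-wait frames are present. The following is a minimal Python sketch, not a supported tool. The regular expressions are assumptions derived only from the sample lines in this article, and the function names (parse_panic_line, has_signature) are hypothetical. Real log lines may differ in spacing or content.

```python
import re

# Assumed layout of the panic headline, copied from the sample line above.
PANIC_RE = re.compile(
    r"NMI IPI: Panic requested by another PCPU\. "
    r"PC (?P<pc>0x[0-9a-fA-F]+), SP (?P<sp>0x[0-9a-fA-F]+) "
    r"\(Src (?P<src>0x[0-9a-fA-F]+), CPU(?P<cpu>\d+)\)"
)

# Assumed layout of a backtrace frame: "...]Symbol@vmkernel#nover+0x..."
FRAME_RE = re.compile(r"\](\w+)@vmkernel#nover")

# Frames taken from the backtrace in this article, innermost first.
SIGNATURE_FRAMES = [
    "MCSLockSpin",
    "MCSLockWait",
    "MCSLockIRQWork",
    "NUMASchedSnapshotNodes",
    "NUMASchedLoop",
]


def parse_panic_line(text):
    """Return the PC, SP, Src and CPU fields of the first panic line, or None."""
    match = PANIC_RE.search(text)
    return match.groupdict() if match else None


def has_signature(text):
    """Return True if the signature frames appear in the text in order."""
    frames = iter(FRAME_RE.findall(text))
    # "name in frames" advances the iterator, so this is an ordered
    # subsequence check rather than a plain membership test.
    return all(name in frames for name in SIGNATURE_FRAMES)


if __name__ == "__main__":
    sample = (
        "@BlueScreen: NMI IPI: Panic requested by another PCPU. "
        "PC 0x42002156dcb7, SP 0x45399769bc80 (Src 0x2, CPU39)\n"
        "0x45399769bc80:[0x42002156dcb6]MCSLockSpin@vmkernel#nover+0xxx\n"
        "0x45399769bcb0:[0x42002156e301]MCSLockWait@vmkernel#nover+0xxx\n"
        "0x45399769bcd0:[0x42002156e813]MCSLockIRQWork@vmkernel#nover+0xxx\n"
        "0x45399769bcf0:[0x420021887a1c]NUMASchedSnapshotNodes@vmkernel#nover+0xxx\n"
        "0x45399769bf50:[0x420021889284]NUMASchedLoop@vmkernel#nover+0xxx\n"
    )
    print(parse_panic_line(sample))
    print(has_signature(sample))
```

A match only indicates that the log resembles the pattern in this article. It does not identify the failing component, which still requires vendor diagnostics.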

Environment

  • VMware vSphere ESXi 7.x
  • VMware vSphere ESXi 8.x

Cause

Any hardware component can be the origin of this issue. In one prior case, for instance, the root cause was traced to a specific CPU.
That CPU became unresponsive while other CPUs were waiting on its response. In the backtrace above, the numasched world is spinning in MCSLockSpin while waiting for a lock. The stalled CPUs caused a cascading failure that ended in a system-wide panic.

Resolution

Contact the hardware vendor.
Ask the vendor to run detailed hardware diagnostics, including a memory scan, looking specifically for Machine Check Exceptions (MCEs), correctable memory errors, or other anomalies on the reported CPU.