ESXi 8.0 U3 Host PSOD with "NMI IPI: Panic requested by another PCPU. PC %#lx, SP %#lx (Src %#x, CPU%u)" due to Race Condition During IO Completion

Article ID: 403884

Updated On:

Products

VMware vSphere ESXi

Issue/Introduction

ESXi 8.0 Update 3 hosts may experience a Purple Screen of Death (PSOD) with the error: NMI IPI: Panic requested by another PCPU. PC %#lx, SP %#lx (Src %#x, CPU%u)

The host PSODs with the following backtrace:

Panic Details: Crash at 2025-05-25T20:23:41.244Z on CPU 39 running world 2098794 - iscsi-rx-world-39. VMK Uptime:69:15:18:25.600
Panic Message: @BlueScreen: NMI IPI: Panic requested by another PCPU. PC 0x420017f6e8ef, SP 0x453bee81b8e8 (Src 0x2, CPU39)

Backtrace:

  0x452b80272d30:[0x420017f7bbc0]PanicvPanicInt@vmkernel#nover+0x20c stack: 0x3339, 0x420017f7bbc0, 0x0, 0x420000000001, 0x420017f7bbc0
  0x452b80272de0:[0x420017f7c396]Panic_WithBacktrace@vmkernel#nover+0x57 stack: 0x452b80272e50, 0x452b80272e00, 0x453bee81f000, 0x452b80272eaf, 0x420017f6e8ef
  0x452b80272e50:[0x420017f782a1]NMI_Interrupt@vmkernel#nover+0x516 stack: 0x0, 0x0, 0x0, 0x0, 0x0
  0x452b80272f10:[0x4200184a0404]IDTNMIWork@vmkernel#nover+0x95 stack: 0x0, 0x4200184a186d, 0x0, 0x42001849b0c7, 0x750
  0x452b80272f30:[0x4200184a186c]Int2_NMI@vmkernel#nover+0x9 stack: 0x750, 0x750, 0x0, 0x0, 0x45bc88810000
  0x452b80272f40:[0x42001849b0c6]gate_entry@vmkernel#nover+0xa7 stack: 0x0, 0x0, 0x45bc88810000, 0x88810000, 0x0
  0x453bee81b8e8:[0x420017f6e8ef]MCSUnlockIRQWork@vmkernel#nover+0x2b stack: 0x453bee81b8f0, 0x430ce26ca6a8, 0x420019394ef7, 0x8d78, 0x1829c847
  0x453bee81b8f0:[0x4200182706fb]EventQ_WakeupCount@vmkernel#nover+0x24 stack: 0x430ce26ca6a8, 0x420019394ef7, 0x8d78, 0x1829c847, 0x45bc3690c3c0
  0x453bee81b910:[0x420019394ef6]Fil6_NoTxnIOSM@esx#nover+0x9b stack: 0x45bc3690c3c0, 0x420019394fdc, 0x453bee81b940, 0x1, 0x45bc3690c3c0
  0x453bee81b930:[0x420019394fdb]Fil6_IOCompleteNoTxn@esx#nover+0xb8 stack: 0x45bc3690c3c0, 0x45bc00afaa00, 0x1, 0x4200184c1ebf, 0x41ffd82b0f40
  0x453bee81b960:[0x4200184c1ebe]AsyncPopCallbackFrameInt@vmkernel#nover+0xe3 stack: 0x45bc311e24c0, 0x45bc3690c3c0, 0x0, 0x4200182f982f, 0x45bc00000008
  0x453bee81b990:[0x4200182f982e]PsaScsi_AsyncTokenIODone@vmkernel#nover+0xaf stack: 0x3129eb407b2d60, 0x45bc311e2680, 0x430ce26ca680, 0x45bc311e24c0, 0x0
  0x453bee81b9d0:[0x4200182d8273]SCSIDeviceCmdCompleteInt@vmkernel#nover+0xa0 stack: 0x430ce26ca680, 0x80000000, 0x45bc311e2718, 0x420017f84df3, 0x80000000
  0x453bee81ba40:[0x4200182df5c2]SCSIDeviceCmdCompleteCB@vmkernel#nover+0x12db stack: 0x0, 0x42001849b0c7, 0x750, 0x750, 0x0
  0x453bee81bb10:[0x4200182e07d2]SCSICompleteDeviceCommand@vmkernel#nover+0x193 stack: 0x420049c06840, 0x4200184d8b85, 0x453bfc51f100, 0x17f8c1ae, 0x453bfc51f940
  0x453bee81bc40:[0x420018e06c43][email protected]#v2_13_0_0+0x3c stack: 0x432016839370, 0x4200184d502c, 0x453bee81bc60, 0x453bee81bc60, 0x748
  0x453bee81bca0:[0x420018e07177][email protected]#v2_13_0_0+0x8c stack: 0x7fffffffffffffff, 0x1, 0x45dc4ee09870, 0x453bfc51f100, 0x0
  0x453bee81bdb0:[0x420018e82795]psp_rrCommandComplete@(vmw_psp_rr)#<None>+0xda stack: 0x430ce250e9b0, 0x45bc311e28e0, 0x45bc311e2810, 0x430ce264eb00, 0x1
  0x453bee81be30:[0x4200183917c2]SCSICompletePathCommand@vmkernel#nover+0x20b stack: 0x8000000000000006, 0x0, 0x1000000000000, 0x6, 0x0
  0x453bee81bef0:[0x42001837bc90]SCSICompleteAdapterCommand@vmkernel#nover+0x5d5 stack: 0x0, 0x3129eafa70c9d2, 0x3129eb406c1b08, 0x3129eb406c192e, 0x3129eb407b2130
  0x453bee81bfa0:[0x4200191b13d6]iscsivmk_RxWorld@(iscsi_vmk)#<None>+0xa7 stack: 0x453bee81f000, 0x453be619f100, 0x453bee81f100, 0x0, 0x0
  0x453bee81bfe0:[0x4200184d67b2]CpuSched_StartWorld@vmkernel#nover+0xbf stack: 0x0, 0x420017f44cf0, 0x0, 0x0, 0x0
  0x453bee81c000:[0x420017f44cef]Debug_IsInitialized@vmkernel#nover+0xc stack: 0x0, 0x0, 0x0, 0x0, 0x0
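
To check whether a collected backtrace matches the one above, the frame symbols can be extracted and compared programmatically. The following Python sketch is illustrative only: the regular expression is inferred from the frame layout shown in this article, and treating the four innermost frames as the identifying signature is an assumption, not an official matching rule. The names FRAME_RE, SIGNATURE, frame_symbols and matches_signature are hypothetical.

```python
import re

# One backtrace frame as printed above:
#   0x<stack addr>:[0x<pc>]<symbol>@<module>#<version>+0x<offset> stack: ...
# The pattern is inferred from the sample output in this article.
FRAME_RE = re.compile(
    r"0x[0-9a-f]+:\[0x[0-9a-f]+\](?P<symbol>[^@\s]+)@(?P<module>[^#\s]+)#"
)

# Assumption: the four frames at the point of the lockup identify this issue.
SIGNATURE = [
    "MCSUnlockIRQWork",
    "EventQ_WakeupCount",
    "Fil6_NoTxnIOSM",
    "Fil6_IOCompleteNoTxn",
]


def frame_symbols(text):
    """Return the symbol name of every frame found in text, in order."""
    return [m.group("symbol") for m in FRAME_RE.finditer(text)]


def matches_signature(text, signature=SIGNATURE):
    """True if the signature symbols appear as consecutive frames in text."""
    symbols = frame_symbols(text)
    n = len(signature)
    return any(symbols[i:i + n] == signature for i in range(len(symbols) - n + 1))
```

Passing the full PSOD backtrace text to matches_signature() returns True only when all four frames appear consecutively and in the order listed.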


Before the host PSODs, the following messages are observed in vmkernel.log:

YYYY-MM-DDTHH:MM:SSZ cpu52:2100401)WARNING: Heartbeat: 961: PCPU 39 didn't have a heartbeat for 6 seconds, timeout is 10, 1 IPIs sent; *may* be locked up.
YYYY-MM-DDTHH:MM:SSZ cpu52:2100401)Heartbeat: 1014: Sending timer IPI to PCPU 39
YYYY-MM-DDTHH:MM:SSZ cpu39:2098794)ALERT: NMI: 738: NMI IPI: PC 0x420017f6e8ed, SP 0x453bee81b8e8 (Src 0x1, CPU39)
YYYY-MM-DDTHH:MM:SSZ cpu39:2098794)0x453bee81b8e8:[0x420017f6e8ec]MCSUnlockIRQWork@vmkernel#nover+0x29 stack: 0x453bee81b8f0
YYYY-MM-DDTHH:MM:SSZ cpu39:2098794)0x453bee81b8f0:[0x4200182706fb]EventQ_WakeupCount@vmkernel#nover+0x24 stack: 0x430ce26ca6a8
YYYY-MM-DDTHH:MM:SSZ cpu39:2098794)0x453bee81b910:[0x420019394ef6]Fil6_NoTxnIOSM@esx#nover+0x9b stack: 0x45bc3690c3c0
YYYY-MM-DDTHH:MM:SSZ cpu39:2098794)0x453bee81b930:[0x420019394fdb]Fil6_IOCompleteNoTxn@esx#nover+0xb8 stack: 0x45bc3690c3c0
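
When reviewing vmkernel.log after the fact, the precursor messages above can be located with a small script. This is a hedged sketch: the patterns are derived only from the sample lines in this article, the numeric source-line fields (961, 738) are matched loosely because they may differ between builds, and the names HEARTBEAT_RE, NMI_RE and lockup_precursors are hypothetical.

```python
import re

# "WARNING: Heartbeat: <n>: PCPU <id> didn't have a heartbeat for <s> seconds"
HEARTBEAT_RE = re.compile(
    r"WARNING: Heartbeat: \d+: PCPU (?P<pcpu>\d+) "
    r"didn't have a heartbeat for (?P<secs>\d+) seconds"
)

# "ALERT: NMI: <n>: NMI IPI: PC 0x..., SP 0x... (Src 0x.., CPU<id>)"
NMI_RE = re.compile(
    r"ALERT: NMI: \d+: NMI IPI: PC 0x[0-9a-f]+, SP 0x[0-9a-f]+ "
    r"\(Src 0x[0-9a-f]+, CPU(?P<pcpu>\d+)\)"
)


def lockup_precursors(lines):
    """Return the sorted PCPU numbers that logged both a missed-heartbeat
    warning and an NMI IPI alert."""
    missed, nmi = set(), set()
    for line in lines:
        m = HEARTBEAT_RE.search(line)
        if m:
            missed.add(int(m.group("pcpu")))
        m = NMI_RE.search(line)
        if m:
            nmi.add(int(m.group("pcpu")))
    return sorted(missed & nmi)
```

A typical use is to pass an open log file, for example lockup_precursors(open(path)) where path points to a copy of vmkernel.log. A non-empty result only indicates that a PCPU was reported as possibly locked up; the backtrace still has to be compared with the one in this article.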

Environment

ESXi 8.0 Update 3

Cause

The issue is caused by a rare race between the thread that waits for an IO and the thread that wakes it up when the IO completes.
A spurious wakeup of the waiting thread caused it to free the IO context (ioCtx) while the completion thread was still in progress.
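
The general shape of this kind of race can be illustrated outside the VMkernel. The Python sketch below is not ESXi code and does not describe the eventual fix; IoContext, complete and wait_and_free are hypothetical names, and the freed flag merely stands in for releasing the context. It shows a waiter that re-checks a completion flag after every wakeup, so a wakeup that arrives before completion does not lead to the context being released early.

```python
import threading


class IoContext:
    """Hypothetical stand-in for an IO context shared by a waiter and a
    completion thread."""

    def __init__(self):
        self.cond = threading.Condition()
        self.done = False    # set by the completion path
        self.freed = False   # stands in for releasing the context

    def complete(self):
        # Completion path: publish completion and wake the waiter while
        # holding the lock, so the waiter cannot observe a half-finished state.
        with self.cond:
            self.done = True
            self.cond.notify_all()

    def wait_and_free(self, timeout=None):
        # Waiter: wait_for() re-evaluates the predicate after every wakeup,
        # so a wakeup without completion just goes back to waiting.
        with self.cond:
            finished = self.cond.wait_for(lambda: self.done, timeout=timeout)
            if finished:
                self.freed = True
            return finished
```

A waiter that released the context on any wakeup, without checking done, would be exposed to the scenario described above.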

Resolution

Broadcom has acknowledged the issue. A permanent fix is being worked on and is expected to be included in a future release of ESXi.
This Knowledge Base article will be updated once the exact version with the fix is confirmed.

Workaround:
There is no workaround available at this time.

  • The issue is caused by a race condition, which is non-deterministic by nature and occurs very rarely.

  • No configuration change can completely prevent the issue, but it is expected to occur extremely rarely.

Additional Information

For regular updates on this issue, please subscribe to this KB or monitor release notes of future ESXi versions.