PSOD: NMI: 710: NMI IPI: RIPOFF(base):RBP:CS [0x105483(0x420002200000):0x45d9c81ba8c0:0xf48] (Src 0x1, CPU120)
search cancel

PSOD: NMI: 710: NMI IPI: RIPOFF(base):RBP:CS [0x105483(0x420002200000):0x45d9c81ba8c0:0xf48] (Src 0x1, CPU120)

book

Article ID: 376511

calendar_today

Updated On:

Products

VMware vSphere ESXi VMware vSphere ESXi 8.0 VMware vSphere ESXi 7.0

Issue/Introduction

ESXi 7.0 or 8.0 experienced a PSOD.
In the vmkernel log the backtrace is similar to :

[YYYY-MM-DDTHH:MM:SS] cpu70:2097222)WARNING: Heartbeat: 827: PCPU 120 didn't have a heartbeat for 7 seconds, timeout is 14, 1 IPIs sent; *may* be locked up.
[YYYY-MM-DDTHH:MM:SS] cpu120:2099575)ALERT: NMI: 710: NMI IPI: RIPOFF(base):RBP:CS [0x105483(0x420002200000):0x45d9c81ba8c0:0xf48] (Src 0x1, CPU120)
[YYYY-MM-DDTHH:MM:SS] cpu120:2099575)0x45398819bd48:[0x420002305482]MCSUnlockWork@vmkernel#nover+0x2b stack: 0x45d9c81ba750
[YYYY-MM-DDTHH:MM:SS] cpu120:2099575)0x45398819bd50:[0x420002670693]SCSIHandleRmwReadComplete@vmkernel#nover+0x4e0 stack: 0x1dc31a
[YYYY-MM-DDTHH:MM:SS] cpu120:2099575)0x45398819bdf0:[0x42000266f396]SCSICompletePathCommand@vmkernel#nover+0x287 stack: 0x0
[YYYY-MM-DDTHH:MM:SS] cpu120:2099575)0x45398819beb0:[0x42000265b0fe]SCSICompleteAdapterCommand@vmkernel#nover+0x46b stack: 0x0
[YYYY-MM-DDTHH:MM:SS] cpu120:2099575)0x45398819bf60:[0x4200030972bb]ioComp_wld@(lsi_mr3)#<None>+0x5c stack: 0x431162c03900
[YYYY-MM-DDTHH:MM:SS] cpu120:2099575)0x45398819bf90:[0x42000231dfd0]vmkWorldFunc@vmkernel#nover+0x49 stack: 0x42000231dfcc
[YYYY-MM-DDTHH:MM:SS] cpu120:2099575)0x45398819bfe0:[0x4200025b33e1]CpuSched_StartWorld@vmkernel#nover+0x86 stack: 0x0
[YYYY-MM-DDTHH:MM:SS] cpu120:2099575)0x45398819c000:[0x4200022c4b4f]Debug_IsInitialized@vmkernel#nover+0xc stack: 0x0
[YYYY-MM-DDTHH:MM:SS] cpu187:2105893)WARNING: Lock: 1658: (held by 120: Spin count exceeded 1 time(s) - possible deadlock.
[YYYY-MM-DDTHH:MM:SS] cpu187:2105893)0x45399339ad60:[0x42000221b945]Lock_CheckSpinCount@vmkernel#nover+0x23d stack: 0x0
[YYYY-MM-DDTHH:MM:SS] cpu187:2105893)0x45399339adb0:[0x420002304838]MCSLockSpin@vmkernel#nover+0x69 stack: 0x43097c2314c8
[YYYY-MM-DDTHH:MM:SS] cpu187:2105893)0x45399339ade0:[0x420002304e6e]MCSLockWait@vmkernel#nover+0x153 stack: 0x43097c176440
[YYYY-MM-DDTHH:MM:SS] cpu187:2105893)0x45399339ae00:[0x4200023053bd]MCSLockWork@vmkernel#nover+0x2a stack: 0x431162e8c000
[YYYY-MM-DDTHH:MM:SS] cpu187:2105893)0x45399339ae10:[0x42000266dbc4]SCSIStartPathCommands@vmkernel#nover+0x4b9 stack: 0x43097c231ca0
[YYYY-MM-DDTHH:MM:SS] cpu187:2105893)0x45399339af40:[0x42000267378a]SCSIIssueAsyncPathCommandDirect@vmkernel#nover+0x2ab stack: 0x43097c2bcd00
[YYYY-MM-DDTHH:MM:SS] cpu187:2105893)0x45399339b040:[0x4200026756ff]vmk_ScsiIssueAsyncPathCommandDirect@vmkernel#nover+0x1c stack: 0x45d9c80ac3c0
[YYYY-MM-DDTHH:MM:SS] cpu187:2105893)0x45399339b060:[0x420002fedb43]HppIssueCommand@(hpp)#<None>+0x108 stack: 0x45396f81f000
[YYYY-MM-DDTHH:MM:SS] cpu187:2105893)0x45399339b0f0:[0x420002fee0cd]HppDeviceStartCommandProcessing@(hpp)#<None>+0x18a stack: 0x42005f800080
[YYYY-MM-DDTHH:MM:SS] cpu187:2105893)0x45399339b560:[0x420002608d3e]ScsiPluginDeviceStartCommand@vmkernel#nover+0x4f stack: 0x28
[YYYY-MM-DDTHH:MM:SS] cpu187:2105893)0x45399339b5a0:[0x42000262d922]SCSIStartDeviceInt@vmkernel#nover+0x37 stack: 0x45d9c80abf88
[YYYY-MM-DDTHH:MM:SS] cpu187:2105893)0x45399339b5c0:[0x420002634b4a]SCSIDirectSubmitOrQueueCommand@vmkernel#nover+0x19f stack: 0xaaaaaaaaaaab
[YYYY-MM-DDTHH:MM:SS] cpu187:2105893)0x45399339b630:[0x4200026353b8]SCSIAsyncDeviceCommand@vmkernel#nover+0x295 stack: 0x109
[YYYY-MM-DDTHH:MM:SS] cpu187:2105893)0x45399339b700:[0x42000261ea7a]PsaScsi_AsyncIO@vmkernel#nover+0x43 stack: 0x43097c32ad10
[YYYY-MM-DDTHH:MM:SS] cpu187:2105893)0x45399339b740:[0x42000224c3b6]FDS_AsyncIO@vmkernel#nover+0x6a3 stack: 0x45d9919e5a58
[YYYY-MM-DDTHH:MM:SS] cpu187:2105893)0x45399339b7e0:[0x4200033eae0c]Fil6_HandleReadIO@esx#nover+0x135 stack: 0x100000
[YYYY-MM-DDTHH:MM:SS] cpu187:2105893)0x45399339b840:[0x4200033edb7b]Fil6_FileIOInt@esx#nover+0xe9c stack: 0x43139e1401f0
[YYYY-MM-DDTHH:MM:SS] cpu187:2105893)0x45399339b9b0:[0x4200033eef50]Fil6_FileIOIntWithRetry@esx#nover+0x55 stack: 0x1
[YYYY-MM-DDTHH:MM:SS] cpu187:2105893)0x45399339ba50:[0x4200033ef2ab]Fil6_FileIO@esx#nover+0x138 stack: 0x453994c1f000
[YYYY-MM-DDTHH:MM:SS] cpu187:2105893)0x45399339bb50:[0x42000223b5a7]FSSVec_FileIO@vmkernel#nover+0x20 stack: 0x430bdf360550
[YYYY-MM-DDTHH:MM:SS] cpu120:2099575)ALERT: NMI: 710: NMI IPI: RIPOFF(base):RBP:CS [0x105483(0x420002200000):0x45d9c81ba8c0:0xf48] (Src 0x4, CPU120)
[YYYY-MM-DDTHH:MM:SS] cpu120:2099575)0x45398819bd48:[0x420002305482]MCSUnlockWork@vmkernel#nover+0x2b stack: 0x45d9c81ba750
[YYYY-MM-DDTHH:MM:SS] cpu120:2099575)0x45398819bd50:[0x420002670693]SCSIHandleRmwReadComplete@vmkernel#nover+0x4e0 stack: 0x1dc31a
[YYYY-MM-DDTHH:MM:SS] cpu120:2099575)0x45398819bdf0:[0x42000266f396]SCSICompletePathCommand@vmkernel#nover+0x287 stack: 0x0
[YYYY-MM-DDTHH:MM:SS] cpu120:2099575)0x45398819beb0:[0x42000265b0fe]SCSICompleteAdapterCommand@vmkernel#nover+0x46b stack: 0x0
[YYYY-MM-DDTHH:MM:SS] cpu120:2099575)0x45398819bf60:[0x4200030972bb]ioComp_wld@(lsi_mr3)#<None>+0x5c stack: 0x431162c03900
[YYYY-MM-DDTHH:MM:SS] cpu120:2099575)0x45398819bf90:[0x42000231dfd0]vmkWorldFunc@vmkernel#nover+0x49 stack: 0x42000231dfcc
[YYYY-MM-DDTHH:MM:SS] cpu120:2099575)0x45398819bfe0:[0x4200025b33e1]CpuSched_StartWorld@vmkernel#nover+0x86 stack: 0x0
[YYYY-MM-DDTHH:MM:SS] cpu120:2099575)0x45398819c000:[0x4200022c4b4f]Debug_IsInitialized@vmkernel#nover+0xc stack: 0x0

Cause

In PSA RMW code, the write operation is submitted as part of read completion.
When submitting the write request, if we have other requests that we service in issue path or there is a delay due to lock contention, the write operation can be completed before the controller returns to the issuer and here when we access the frame that could have got freed or reused, we can hit into PSOD.
The fix is to not reuse the IO frame as the IO can complete before the control returns back to the user.

Resolution

There is no workaround.

This issue is fixed in the following version of ESXi:

Version Release Name Release Date Build Number
ESXi 7.0 P09 ESXi 7.0 Update 3q 2024/05/21 23794027
ESXi 8.0 U2 P03 ESXi 8.0 Update 2b 2024/02/29 23305546