ESXi PSOD (purple screen of death) Panic from another CPU (tcbinfo lock)

Article ID: 385615

Updated On:

Products

VMware vSphere ESXi

Issue/Introduction

An ESXi host may experience a PSOD during a high number of concurrent vMotions (for example, when a host running many VMs enters Maintenance Mode).

The backtrace may look similar to the following:

<YYYY-MM-DD>T<time>Z cpu78:2098897)Panic: 589: Panic from another CPU (cpu 78, world 2098897): ip=0x420002796b20 randomOff=0x2400000:
PCPU 118: no heartbeat (3/3 IPIs received)
<YYYY-MM-DD>T<time>Z cpu78:2098897)Panic: 767: Saved backtrace: pcpu 118 Heartbeat NMI
<YYYY-MM-DD>T<time>Z cpu78:2098897)pcpu 118 Heartbeat NMI: 0x453bb891bda8:[0x420002512369]Timer_GetCycles@vmkernel#nover+0x2 stack: 0x95d6ead3470ec8, 0x431c228173a8, 0x431c228173ac, 0x0, 0x41ffd4814b00
<YYYY-MM-DD>T<time>Z cpu78:2098897)pcpu 118 Heartbeat NMI: 0x453bb891bdb0:[0x42000250913c]SP_WaitReadLock@vmkernel#nover+0x9d stack: 0x431c228173a8, 0x431c228173ac, 0x0, 0x41ffd4814b00, 0x0
<YYYY-MM-DD>T<time>Z cpu78:2098897)pcpu 118 Heartbeat NMI: 0x453bb891bdf0:[0x4200025091f6]SPAcqWriteLockWork@vmkernel#nover+0x33 stack: 0x41ffd480b760, 0x420003b6e915, 0x431c228123a0, 0x420003bca136, 0x45cbd479ed00
<YYYY-MM-DD>T<time>Z cpu78:2098897)pcpu 118 Heartbeat NMI: 0x453bb891be10:[0x420003b6e914]rw_wlock@(tcpip4)#<None>+0x8d stack: 0x45cbd479ed00, 0x431c23640545, 0x45dbdcfde900, 0x45cbd479ed02, 0x232f42c0
<YYYY-MM-DD>T<time>Z cpu78:2098897)pcpu 118 Heartbeat NMI: 0x453bb891be20:[0x420003bca135]tcp_input@(tcpip4)#<None>+0x1f6 stack: 0x45dbdcfde900, 0x45cbd479ed02, 0x232f42c0, 0x45dbdcfde980, 0x0
<YYYY-MM-DD>T<time>Z cpu78:2098897)pcpu 118 Heartbeat NMI: 0x453bb891bf60:[0x420003b70703]netisr_check@(tcpip4)#<None>+0x14c stack: 0x431c228123a0, 0x100000002, 0x1, 0x431c22801dd0, 0x1
<YYYY-MM-DD>T<time>Z cpu78:2098897)pcpu 118 Heartbeat NMI: 0x453bb891bfc0:[0x420003b5bed7]TcpipIsrCheckWorld@(tcpip4)#<None>+0x58 stack: 0x453ba779f140, 0x4200027b4d56, 0x0, 0x0, 0x0
<YYYY-MM-DD>T<time>Z cpu78:2098897)pcpu 118 Heartbeat NMI: 0x453bb891bfe0:[0x4200027b4d55]CpuSched_StartWorld@vmkernel#nover+0x86 stack: 0x0, 0x4200024c4de0, 0x0, 0x0, 0x0
<YYYY-MM-DD>T<time>Z cpu78:2098897)pcpu 118 Heartbeat NMI: 0x453bb891c000:[0x4200024c4ddf]Debug_IsInitialized@vmkernel#nover+0xc stack: 0x0, 0x0, 0x0, 0x0, 0x0
<YYYY-MM-DD>T<time>Z cpu78:2098897)Panic: 727: Halting PCPU 78.
<YYYY-MM-DD>T<time>Z cpu118:2099147)Panic: 767: Saved backtrace: pcpu 118 Heartbeat NMI
<YYYY-MM-DD>T<time>Z cpu118:2099147)pcpu 118 Heartbeat NMI: 0x453bb891bda8:[0x420002512369]Timer_GetCycles@vmkernel#nover+0x2 stack: 0x95d6ead3470ec8, 0x431c228173a8, 0x431c228173ac, 0x0, 0x41ffd4814b00
<YYYY-MM-DD>T<time>Z cpu118:2099147)pcpu 118 Heartbeat NMI: 0x453bb891bdb0:[0x42000250913c]SP_WaitReadLock@vmkernel#nover+0x9d stack: 0x431c228173a8, 0x431c228173ac, 0x0, 0x41ffd4814b00, 0x0
<YYYY-MM-DD>T<time>Z cpu118:2099147)pcpu 118 Heartbeat NMI: 0x453bb891bdf0:[0x4200025091f6]SPAcqWriteLockWork@vmkernel#nover+0x33 stack: 0x41ffd480b760, 0x420003b6e915, 0x431c228123a0, 0x420003bca136, 0x45cbd479ed00
<YYYY-MM-DD>T<time>Z cpu118:2099147)pcpu 118 Heartbeat NMI: 0x453bb891be10:[0x420003b6e914]rw_wlock@(tcpip4)#<None>+0x8d stack: 0x45cbd479ed00, 0x431c23640545, 0x45dbdcfde900, 0x45cbd479ed02, 0x232f42c0
<YYYY-MM-DD>T<time>Z cpu118:2099147)pcpu 118 Heartbeat NMI: 0x453bb891be20:[0x420003bca135]tcp_input@(tcpip4)#<None>+0x1f6 stack: 0x45dbdcfde900, 0x45cbd479ed02, 0x232f42c0, 0x45dbdcfde980, 0x0
<YYYY-MM-DD>T<time>Z cpu118:2099147)pcpu 118 Heartbeat NMI: 0x453bb891bf60:[0x420003b70703]netisr_check@(tcpip4)#<None>+0x14c stack: 0x431c228123a0, 0x100000002, 0x1, 0x431c22801dd0, 0x1
<YYYY-MM-DD>T<time>Z cpu118:2099147)pcpu 118 Heartbeat NMI: 0x453bb891bfc0:[0x420003b5bed7]TcpipIsrCheckWorld@(tcpip4)#<None>+0x58 stack: 0x453ba779f140, 0x4200027b4d56, 0x0, 0x0, 0x0
<YYYY-MM-DD>T<time>Z cpu118:2099147)pcpu 118 Heartbeat NMI: 0x453bb891bfe0:[0x4200027b4d55]CpuSched_StartWorld@vmkernel#nover+0x86 stack: 0x0, 0x4200024c4de0, 0x0, 0x0, 0x0
<YYYY-MM-DD>T<time>Z cpu118:2099147)pcpu 118 Heartbeat NMI: 0x453bb891c000:[0x4200024c4ddf]Debug_IsInitialized@vmkernel#nover+0xc stack: 0x0, 0x0, 0x0, 0x0, 0x0
<YYYY-MM-DD>T<time>Z cpu118:2099147)VMware ESXi 7.0.3 [Releasebuild-23794027 x86_64]

or

<YYYY-MM-DD>T<time>Z In(14) LogEFI[2098642]: Panic from another CPU (cpu 70, world 10353420): ip=0x420022e59e33 randomOff=0x22a00000:
PCPU 0: no heartbeat (3/3 IPIs received)
Halting PCPU 70.
2025-01-08T20:22:10.757Z cpu0:2098501)VMware ESXi 8.0.3 [Releasebuild-24280767 x86_64]
<YYYY-MM-DD>T<time>Z In(14) LogEFI[2098642]: NMI IPI: Panic requested by another PCPU. PC 0x420022b85402, SP 0x4539e511bba0 (Src 0x1, CPU0)
<YYYY-MM-DD>T<time>Z In(14) LogEFI: cpu0:2098501)cr0=0x8001003d cr2=0x7fe732c209a0 cr3=0x307000 cr4=0x14216c
<YYYY-MM-DD>T<time>Z In(14) LogEFI: cpu0:2098501)FMS=06/8f/8 uCode=0x2b0005c0
<YYYY-MM-DD>T<time>Z In(14) LogEFI[2098642]: *PCPU0:2098501/vmk0-rx-1
<YYYY-MM-DD>T<time>Z In(14) LogEFI[2098642]: PCPU  0: SSSSSSVSSSSSUSSSVSSSSSVSSSVSVUSSSVSSSSSSUUSSSSSSSSSSSSUSSSSSUSSS
<YYYY-MM-DD>T<time>Z In(14) LogEFI[2098642]: PCPU 64: UUSSSSSSSSISSSSSSVSSSSSSSSSVSSSSSSSSVSUVSSSSVUSSSUSSSSSSSSSUSSSS
<YYYY-MM-DD>T<time>Z In(14) LogEFI: cpu0:2098501)Code start: 0x420022a00000 VMK uptime: 109:14:50:01.958
<YYYY-MM-DD>T<time>Z In(14) LogEFI: cpu0:2098501)Saved backtrace from: pcpu 0 Heartbeat NMI
<YYYY-MM-DD>T<time>Z In(14) LogEFI: cpu0:2098501)0x4539e511bba0:[0x420022b85401]SP_WaitReadLock@vmkernel#nover+0xbe stack: 0x431b3e237b48
<YYYY-MM-DD>T<time>Z In(14) LogEFI: cpu0:2098501)0x4539e511bbe0:[0x420022b8548a]SPAcqWriteLockWork@vmkernel#nover+0x33 stack: 0x41ffd5014cc0
<YYYY-MM-DD>T<time>Z In(14) LogEFI: cpu0:2098501)0x4539e511bc00:[0x4200240307ad]rw_wlock@(tcpip4)#<None>+0x4e stack: 0x420040000000
<YYYY-MM-DD>T<time>Z In(14) LogEFI: cpu0:2098501)0x4539e511bc10:[0x42002409337e]tcp_input@(tcpip4)#<None>+0x253 stack: 0x100000004
<YYYY-MM-DD>T<time>Z In(14) LogEFI: cpu0:2098501)0x4539e511bd40:[0x420024084db5]ip_input@(tcpip4)#<None>+0x12a stack: 0x431b3e2439e0
<YYYY-MM-DD>T<time>Z In(14) LogEFI: cpu0:2098501)0x4539e511bdc0:[0x420024032662]netisr_dispatch@(tcpip4)#<None>+0x17 stack: 0x100000000
<YYYY-MM-DD>T<time>Z In(14) LogEFI: cpu0:2098501)0x4539e511bdd0:[0x42002405ff85]ether_demux@(tcpip4)#<None>+0x206 stack: 0x41ffd500b860
<YYYY-MM-DD>T<time>Z In(14) LogEFI: cpu0:2098501)0x4539e511be00:[0x420024060208]ether_input@(tcpip4)#<None>+0x1e5 stack: 0x1
<YYYY-MM-DD>T<time>Z In(14) LogEFI: cpu0:2098501)0x4539e511be40:[0x42002402be01]if_vmk_rx@(tcpip4)#<None>+0x16e stack: 0x0
<YYYY-MM-DD>T<time>Z In(14) LogEFI: cpu0:2098501)0x4539e511bec0:[0x42002410834f]TcpipRxFastPath@(tcpip4)#<None>+0x78 stack: 0x4539e511bf80
<YYYY-MM-DD>T<time>Z In(14) LogEFI: cpu0:2098501)0x4539e511bef0:[0x4200241085fe]TcpipRx@(tcpip4)#<None>+0x14b stack: 0x0
<YYYY-MM-DD>T<time>Z In(14) LogEFI: cpu0:2098501)0x4539e511bf60:[0x4200241045f4]TcpipDispatchWorld@(tcpip4)#<None>+0x38d stack: 0x0
<YYYY-MM-DD>T<time>Z In(14) LogEFI: cpu0:2098501)0x4539e511bfe0:[0x4200230d67b2]CpuSched_StartWorld@vmkernel#nover+0xbf stack: 0x0
<YYYY-MM-DD>T<time>Z In(14) LogEFI: cpu0:2098501)0x4539e511c000:[0x420022b44cef]Debug_IsInitialized@vmkernel#nover+0xc stack: 0x0
<YYYY-MM-DD>T<time>Z In(14) LogEFI: cpu0:2098501)0x4539e511c000:[0x420022b44cef]Debug_IsInitialized@vmkernel#nover+0xc stack: 0x0
<YYYY-MM-DD>T<time>Z In(14) LogEFI: cpu0:2098501)base fs=0x0 gs=0x420040000000 Kgs=0x0
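
Both sample backtraces above share the same innermost frames: SP_WaitReadLock, SPAcqWriteLockWork, rw_wlock@(tcpip4) and tcp_input@(tcpip4). As a rough triage aid, the following Python sketch checks whether a saved log contains the panic header followed by that frame sequence. The function names and the choice of exactly these four frames are assumptions made for illustration, not part of any VMware tool, and a match is only a hint that the log resembles this article's signature.

```python
import re

# Frames (innermost first) that both sample backtraces in this article share.
# Choosing these four as the "signature" is an assumption of this sketch.
SIGNATURE = [
    "SP_WaitReadLock@vmkernel",
    "SPAcqWriteLockWork@vmkernel",
    "rw_wlock@(tcpip4)",
    "tcp_input@(tcpip4)",
]

# Matches "[0x<address>]<symbol>+0x<offset>" and captures the symbol.
FRAME_RE = re.compile(r"\[0x[0-9a-fA-F]+\](\S+?)\+0x[0-9a-fA-F]+")


def extract_frames(log_text):
    """Return the symbol of every backtrace frame, in the order logged."""
    return FRAME_RE.findall(log_text)


def matches_tcbinfo_signature(log_text):
    """True if the panic header and the four signature frames appear in a row."""
    if "Panic from another CPU" not in log_text:
        return False
    frames = extract_frames(log_text)
    n = len(SIGNATURE)
    for i in range(len(frames) - n + 1):
        window = frames[i:i + n]
        if all(f.startswith(s) for f, s in zip(window, SIGNATURE)):
            return True
    return False
```

Passing the text of a log extracted from a support bundle to matches_tcbinfo_signature returns True only when all four frames appear consecutively, so an unrelated "Panic from another CPU" backtrace is not flagged.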

Environment

ESXi 7.0.3
ESXi 8.0.3

Cause

The issue is caused by heavy contention on the tcbinfo lock in the TCP/IP stack during periods of high network activity.

Resolution

Engineering has implemented fixes for this issue in ESX 9.1. We highly recommend upgrading to that release.

Workarounds (if unable to upgrade):

  • Reduce parallel vMotions.

  • Configure a dedicated vMotion network stack.

  • Limit the host to 8 concurrent vMotions by adding this advanced vCenter setting:

config.vpxd.ResourceManager.costPerVmotionESX6x = 8
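
Where migrations are started from a script rather than by Maintenance Mode, the "reduce parallel vMotions" workaround can be approximated on the client side by capping how many migrations run at once. The Python sketch below shows that pattern only. The names evacuate and migrate_vm are hypothetical, migrate_vm stands in for whatever blocking call the script already uses to move one VM, and the default of 2 is an arbitrary illustration, not a recommended value.

```python
from concurrent.futures import ThreadPoolExecutor


def evacuate(vm_names, migrate_vm, max_parallel=2):
    """Run migrate_vm(name) for every VM, at most max_parallel at a time.

    migrate_vm is a hypothetical caller-supplied function that blocks until
    one migration finishes; it is not a VMware API.
    Returns a dict mapping each VM name to its result or raised exception.
    """
    if max_parallel < 1:
        raise ValueError("max_parallel must be at least 1")
    results = {}
    # The pool size is the concurrency cap: extra submissions wait in a queue.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        futures = {name: pool.submit(migrate_vm, name) for name in vm_names}
        for name, future in futures.items():
            try:
                results[name] = future.result()
            except Exception as exc:
                # Record the failure and keep collecting the other migrations.
                results[name] = exc
    return results
```

This only throttles migrations issued by the script itself. It does not change the limits vCenter applies to migrations started elsewhere.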

Additional Information

vCenter Limits for Concurrent vMotion