An ESXi host may fail with a purple diagnostic screen (PSOD) while a large number of vMotions run concurrently, for example when a host running many VMs enters Maintenance Mode.
The backtrace may look similar to the following:
<YYYY-MM-DD>T<time>Z cpu78:2098897)Panic: 589: Panic from another CPU (cpu 78, world 2098897): ip=0x420002796b20 randomOff=0x2400000:
PCPU 118: no heartbeat (3/3 IPIs received)
<YYYY-MM-DD>T<time>Z cpu78:2098897)Panic: 767: Saved backtrace: pcpu 118 Heartbeat NMI
<YYYY-MM-DD>T<time>Z cpu78:2098897)pcpu 118 Heartbeat NMI: 0x453bb891bda8:[0x420002512369]Timer_GetCycles@vmkernel#nover+0x2 stack: 0x95d6ead3470ec8, 0x431c228173a8, 0x431c228173ac, 0x0, 0x41ffd4814b00
<YYYY-MM-DD>T<time>Z cpu78:2098897)pcpu 118 Heartbeat NMI: 0x453bb891bdb0:[0x42000250913c]SP_WaitReadLock@vmkernel#nover+0x9d stack: 0x431c228173a8, 0x431c228173ac, 0x0, 0x41ffd4814b00, 0x0
<YYYY-MM-DD>T<time>Z cpu78:2098897)pcpu 118 Heartbeat NMI: 0x453bb891bdf0:[0x4200025091f6]SPAcqWriteLockWork@vmkernel#nover+0x33 stack: 0x41ffd480b760, 0x420003b6e915, 0x431c228123a0, 0x420003bca136, 0x45cbd479ed00
<YYYY-MM-DD>T<time>Z cpu78:2098897)pcpu 118 Heartbeat NMI: 0x453bb891be10:[0x420003b6e914]rw_wlock@(tcpip4)#<None>+0x8d stack: 0x45cbd479ed00, 0x431c23640545, 0x45dbdcfde900, 0x45cbd479ed02, 0x232f42c0
<YYYY-MM-DD>T<time>Z cpu78:2098897)pcpu 118 Heartbeat NMI: 0x453bb891be20:[0x420003bca135]tcp_input@(tcpip4)#<None>+0x1f6 stack: 0x45dbdcfde900, 0x45cbd479ed02, 0x232f42c0, 0x45dbdcfde980, 0x0
<YYYY-MM-DD>T<time>Z cpu78:2098897)pcpu 118 Heartbeat NMI: 0x453bb891bf60:[0x420003b70703]netisr_check@(tcpip4)#<None>+0x14c stack: 0x431c228123a0, 0x100000002, 0x1, 0x431c22801dd0, 0x1
<YYYY-MM-DD>T<time>Z cpu78:2098897)pcpu 118 Heartbeat NMI: 0x453bb891bfc0:[0x420003b5bed7]TcpipIsrCheckWorld@(tcpip4)#<None>+0x58 stack: 0x453ba779f140, 0x4200027b4d56, 0x0, 0x0, 0x0
<YYYY-MM-DD>T<time>Z cpu78:2098897)pcpu 118 Heartbeat NMI: 0x453bb891bfe0:[0x4200027b4d55]CpuSched_StartWorld@vmkernel#nover+0x86 stack: 0x0, 0x4200024c4de0, 0x0, 0x0, 0x0
<YYYY-MM-DD>T<time>Z cpu78:2098897)pcpu 118 Heartbeat NMI: 0x453bb891c000:[0x4200024c4ddf]Debug_IsInitialized@vmkernel#nover+0xc stack: 0x0, 0x0, 0x0, 0x0, 0x0
<YYYY-MM-DD>T<time>Z cpu78:2098897)Panic: 727: Halting PCPU 78.
<YYYY-MM-DD>T<time>Z cpu118:2099147)Panic: 767: Saved backtrace: pcpu 118 Heartbeat NMI
<YYYY-MM-DD>T<time>Z cpu118:2099147)pcpu 118 Heartbeat NMI: 0x453bb891bda8:[0x420002512369]Timer_GetCycles@vmkernel#nover+0x2 stack: 0x95d6ead3470ec8, 0x431c228173a8, 0x431c228173ac, 0x0, 0x41ffd4814b00
<YYYY-MM-DD>T<time>Z cpu118:2099147)pcpu 118 Heartbeat NMI: 0x453bb891bdb0:[0x42000250913c]SP_WaitReadLock@vmkernel#nover+0x9d stack: 0x431c228173a8, 0x431c228173ac, 0x0, 0x41ffd4814b00, 0x0
<YYYY-MM-DD>T<time>Z cpu118:2099147)pcpu 118 Heartbeat NMI: 0x453bb891bdf0:[0x4200025091f6]SPAcqWriteLockWork@vmkernel#nover+0x33 stack: 0x41ffd480b760, 0x420003b6e915, 0x431c228123a0, 0x420003bca136, 0x45cbd479ed00
<YYYY-MM-DD>T<time>Z cpu118:2099147)pcpu 118 Heartbeat NMI: 0x453bb891be10:[0x420003b6e914]rw_wlock@(tcpip4)#<None>+0x8d stack: 0x45cbd479ed00, 0x431c23640545, 0x45dbdcfde900, 0x45cbd479ed02, 0x232f42c0
<YYYY-MM-DD>T<time>Z cpu118:2099147)pcpu 118 Heartbeat NMI: 0x453bb891be20:[0x420003bca135]tcp_input@(tcpip4)#<None>+0x1f6 stack: 0x45dbdcfde900, 0x45cbd479ed02, 0x232f42c0, 0x45dbdcfde980, 0x0
<YYYY-MM-DD>T<time>Z cpu118:2099147)pcpu 118 Heartbeat NMI: 0x453bb891bf60:[0x420003b70703]netisr_check@(tcpip4)#<None>+0x14c stack: 0x431c228123a0, 0x100000002, 0x1, 0x431c22801dd0, 0x1
<YYYY-MM-DD>T<time>Z cpu118:2099147)pcpu 118 Heartbeat NMI: 0x453bb891bfc0:[0x420003b5bed7]TcpipIsrCheckWorld@(tcpip4)#<None>+0x58 stack: 0x453ba779f140, 0x4200027b4d56, 0x0, 0x0, 0x0
<YYYY-MM-DD>T<time>Z cpu118:2099147)pcpu 118 Heartbeat NMI: 0x453bb891bfe0:[0x4200027b4d55]CpuSched_StartWorld@vmkernel#nover+0x86 stack: 0x0, 0x4200024c4de0, 0x0, 0x0, 0x0
<YYYY-MM-DD>T<time>Z cpu118:2099147)pcpu 118 Heartbeat NMI: 0x453bb891c000:[0x4200024c4ddf]Debug_IsInitialized@vmkernel#nover+0xc stack: 0x0, 0x0, 0x0, 0x0, 0x0
<YYYY-MM-DD>T<time>Z cpu118:2099147)VMware ESXi 7.0.3 [Releasebuild-23794027 x86_64]
or
<YYYY-MM-DD>T<time>Z In(14) LogEFI[2098642]: Panic from another CPU (cpu 70, world 10353420): ip=0x420022e59e33 randomOff=0x22a00000:PCPU 0: no heartbeat (3/3 IPIs received)Halting PCPU 70.2025-01-08T20:22:10.757Z cpu0:2098501)VMware ESXi 8.0.3 [Releasebuild-24280767 x86_64]
<YYYY-MM-DD>T<time>Z In(14) LogEFI[2098642]: NMI IPI: Panic requested by another PCPU. PC 0x420022b85402, SP 0x4539e511bba0 (Src 0x1, CPU0)
<YYYY-MM-DD>T<time>Z In(14) LogEFI: cpu0:2098501)cr0=0x8001003d cr2=0x7fe732c209a0 cr3=0x307000 cr4=0x14216c
<YYYY-MM-DD>T<time>Z In(14) LogEFI: cpu0:2098501)FMS=06/8f/8 uCode=0x2b0005c0
<YYYY-MM-DD>T<time>Z In(14) LogEFI[2098642]: *PCPU0:2098501/vmk0-rx-1
<YYYY-MM-DD>T<time>Z In(14) LogEFI[2098642]: PCPU 0: SSSSSSVSSSSSUSSSVSSSSSVSSSVSVUSSSVSSSSSSUUSSSSSSSSSSSSUSSSSSUSSS
<YYYY-MM-DD>T<time>Z In(14) LogEFI[2098642]: PCPU 64: UUSSSSSSSSISSSSSSVSSSSSSSSSVSSSSSSSSVSUVSSSSVUSSSUSSSSSSSSSUSSSS
<YYYY-MM-DD>T<time>Z In(14) LogEFI: cpu0:2098501)Code start: 0x420022a00000 VMK uptime: 109:14:50:01.958
<YYYY-MM-DD>T<time>Z In(14) LogEFI: cpu0:2098501)Saved backtrace from: pcpu 0 Heartbeat NMI
<YYYY-MM-DD>T<time>Z In(14) LogEFI: cpu0:2098501)0x4539e511bba0:[0x420022b85401]SP_WaitReadLock@vmkernel#nover+0xbe stack: 0x431b3e237b48
<YYYY-MM-DD>T<time>Z In(14) LogEFI: cpu0:2098501)0x4539e511bbe0:[0x420022b8548a]SPAcqWriteLockWork@vmkernel#nover+0x33 stack: 0x41ffd5014cc0
<YYYY-MM-DD>T<time>Z In(14) LogEFI: cpu0:2098501)0x4539e511bc00:[0x4200240307ad]rw_wlock@(tcpip4)#<None>+0x4e stack: 0x420040000000
<YYYY-MM-DD>T<time>Z In(14) LogEFI: cpu0:2098501)0x4539e511bc10:[0x42002409337e]tcp_input@(tcpip4)#<None>+0x253 stack: 0x100000004
<YYYY-MM-DD>T<time>Z In(14) LogEFI: cpu0:2098501)0x4539e511bd40:[0x420024084db5]ip_input@(tcpip4)#<None>+0x12a stack: 0x431b3e2439e0
<YYYY-MM-DD>T<time>Z In(14) LogEFI: cpu0:2098501)0x4539e511bdc0:[0x420024032662]netisr_dispatch@(tcpip4)#<None>+0x17 stack: 0x100000000
<YYYY-MM-DD>T<time>Z In(14) LogEFI: cpu0:2098501)0x4539e511bdd0:[0x42002405ff85]ether_demux@(tcpip4)#<None>+0x206 stack: 0x41ffd500b860
<YYYY-MM-DD>T<time>Z In(14) LogEFI: cpu0:2098501)0x4539e511be00:[0x420024060208]ether_input@(tcpip4)#<None>+0x1e5 stack: 0x1
<YYYY-MM-DD>T<time>Z In(14) LogEFI: cpu0:2098501)0x4539e511be40:[0x42002402be01]if_vmk_rx@(tcpip4)#<None>+0x16e stack: 0x0
<YYYY-MM-DD>T<time>Z In(14) LogEFI: cpu0:2098501)0x4539e511bec0:[0x42002410834f]TcpipRxFastPath@(tcpip4)#<None>+0x78 stack: 0x4539e511bf80
<YYYY-MM-DD>T<time>Z In(14) LogEFI: cpu0:2098501)0x4539e511bef0:[0x4200241085fe]TcpipRx@(tcpip4)#<None>+0x14b stack: 0x0
<YYYY-MM-DD>T<time>Z In(14) LogEFI: cpu0:2098501)0x4539e511bf60:[0x4200241045f4]TcpipDispatchWorld@(tcpip4)#<None>+0x38d stack: 0x0
<YYYY-MM-DD>T<time>Z In(14) LogEFI: cpu0:2098501)0x4539e511bfe0:[0x4200230d67b2]CpuSched_StartWorld@vmkernel#nover+0xbf stack: 0x0
<YYYY-MM-DD>T<time>Z In(14) LogEFI: cpu0:2098501)0x4539e511c000:[0x420022b44cef]Debug_IsInitialized@vmkernel#nover+0xc stack: 0x0
<YYYY-MM-DD>T<time>Z In(14) LogEFI: cpu0:2098501)base fs=0x0 gs=0x420040000000 Kgs=0x0
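Both backtraces above share the same run of frames: SP_WaitReadLock, SPAcqWriteLockWork, rw_wlock@(tcpip4) and tcp_input@(tcpip4), together with a "no heartbeat" panic message. The following is a minimal Python sketch for checking whether a collected log excerpt shows that run of frames. It is not a VMware tool; the names SIGNATURE, FRAME_RE, extract_frames and matches_signature are hypothetical, and the pattern is derived only from the two samples shown in this article, so a match should be treated as a hint rather than a confirmed diagnosis.

```python
import re

# Frame names taken from the sample backtraces in this article, innermost first.
# Assumption: other builds may show different offsets but the same symbol names.
SIGNATURE = (
    "SP_WaitReadLock@vmkernel",
    "SPAcqWriteLockWork@vmkernel",
    "rw_wlock@(tcpip4)",
    "tcp_input@(tcpip4)",
)

# Captures "Symbol@module" from the "[0xaddr]Symbol@module#version+0xoff" part of a frame line.
FRAME_RE = re.compile(r"\[0x[0-9a-fA-F]+\]([^\s#]+)")


def extract_frames(log_text):
    """Return the symbol@module names of all backtrace frames, in order of appearance."""
    return [m.group(1) for m in FRAME_RE.finditer(log_text)]


def matches_signature(log_text):
    """Return True if the four signature frames appear consecutively
    and the log also contains a 'no heartbeat' panic message."""
    frames = extract_frames(log_text)
    n = len(SIGNATURE)
    has_run = any(
        tuple(frames[i:i + n]) == SIGNATURE
        for i in range(len(frames) - n + 1)
    )
    return has_run and "no heartbeat" in log_text
```

To use it, read the text of the panic log (for example from an extracted core dump or a transcribed PSOD screenshot) into a string and pass it to matches_signature. Where that text comes from is up to the reader; no particular file path is assumed here.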
Environment:
ESXi 7.0.3
ESXi 8.0.3
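The issue is caused by heavy contention on the tcbinfo lock in the ESXi TCP/IP stack (tcpip4) during periods of high network activity. As the backtraces show, a PCPU spins in SP_WaitReadLock (called from tcp_input through rw_wlock) long enough to miss its heartbeat, which triggers the panic.
Engineering is working on a fix for this issue, to be delivered in a future release.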
Workaround:
To reduce the maximum number of concurrent vMotions on a host to 8 at a time, add the following advanced setting in vCenter:
config.vpxd.ResourceManager.costPerVmotionESX6x = 8