WARNING: Heartbeat: ###: PCPU ## didn't have a heartbeat for 7 seconds, timeout is 21, 1 IPIs sent; *may* be locked up.

Article ID: 389673

Updated On:

Products

VMware vSphere ESXi

Issue/Introduction

You might see this message in the vmkernel logs of the host. The condition can also cause a PSOD with a backtrace similar to the following:

 

Backtrace for current CPU #20, worldID=#####, fp=0x42002b4f4600
0x4529c03e2cf0:[0x420029eff107]PanicvPanicInt@vmkernel#nover+0x327 stack: 0x4529c03e2dc8, 0x4303b6a09088, 0x420029eff107, 0x42002a3f4600, 0x4529c03e2cf0
0x4529c03e2dc0:[0x420029eff6b9]Panic_WithBacktrace@vmkernel#nover+0x56 stack: 0x4529c03e2e30, 0x4529c03e2de0, 0x4529c03e2e40, 0x4529c03e2df0, 0x84814
0x4529c03e2e30:[0x420029efbe0c]NMI_Interrupt@vmkernel#nover+0x561 stack: 0x0, 0xf48, 0x0, 0x0, 0x0
0x4529c03e2f00:[0x420029f53392]IDTNMIWork@vmkernel#nover+0x7f stack: 0x42004f800000, 0x420029f546dd, 0x0, 0x4529c03e2fd0, 0x0
0x4529c03e2f20:[0x420029f546dc]Int2_NMI@vmkernel#nover+0x19 stack: 0x0, 0x420029f4e068, 0xf50, 0xf50, 0x0
0x4529c03e2f40:[0x420029f4e067]gate_entry@vmkernel#nover+0x68 stack: 0x0, 0x0, 0x0, 0x0, 0x1
0x453a01f1bdc8:[0x420029e84814]Power_ArchPerformWait@vmkernel#nover+0x70 stack: 0x42004f800980, 0x0, 0x0, 0x42004f800000, 0x0
0x453a01f1bdd0:[0x420029e84982]Power_ArchSetCState@vmkernel#nover+0x8f stack: 0x0, 0x0, 0x42004f800000, 0x0, 0x42004f800000
0x453a01f1be20:[0x42002a1af368]CpuSchedIdleLoopInt@vmkernel#nover+0x275 stack: 0x42004f800100, 0x202, 0x453a01f1be58, 0x1, 0x453a01f1be60
0x453a01f1be90:[0x42002a1b3e85]CpuSched_IdleLoop@vmkernel#nover+0x16 stack: 0x3e, 0x420029e7772c, 0x0, 0x0, 0x0
0x453a01f1bea0:[0x420029edd8a4]Init_SlaveIdle@vmkernel#nover+0x4d stack: 0x0, 0x0, 0x0, 0x0, 0x0
0x453a01f1beb0:[0x420029e7772b]SMPSlaveIdle@vmkernel#nover+0x26c stack: 0x0, 0x0, 0x0, 0x0, 0x0

Environment

VMware vSphere ESXi 7.x
VMware vSphere ESXi 8.x

Resolution

  • Identify which CPUs (PCPU numbers) the messages are coming from.
  • If they come from a single core, swap the suspect CPU with a known-good CPU from the same ESXi host.
  • Check whether the same messages now come from a different set of CPUs.
  • If they do, engage the hardware vendor to replace the hardware.
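
The first and third checks above amount to counting the heartbeat warnings per PCPU before and after the swap. The following is a minimal sketch of that tally, not a VMware-provided tool. It assumes the warning text matches the message quoted in this article; the function name count_heartbeat_warnings and the regular expression are illustrative, and the field shown as ### in the article is matched loosely because its exact format is not given here.

```python
import re
from collections import Counter

# Pattern modelled on the warning quoted in this article. The field between
# "Heartbeat:" and "PCPU" is masked as ### in the article, so it is matched
# loosely as "anything up to the next colon" (an assumption).
HEARTBEAT_RE = re.compile(
    r"Heartbeat: [^:]*: PCPU (\d+) didn't have a heartbeat for (\d+) seconds"
)


def count_heartbeat_warnings(lines):
    """Return a Counter mapping PCPU number to the number of warnings seen."""
    counts = Counter()
    for line in lines:
        match = HEARTBEAT_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts
```

To use it, open a copy of the host's vmkernel log and pass the file object to count_heartbeat_warnings, once before and once after the swap. If a single PCPU dominates the first tally and a different set of PCPUs appears in the second, that matches the pattern described in the last two steps.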