VMkernel might shut down virtual machines due to a vCPU timer issue


Article ID: 335063


Products

VMware vSphere ESXi

Issue/Introduction

Symptoms:

In the vmkernel.log file, you see messages such as:

2021-05-28T21:39:59.895Z cpu68:1001449770)ALERT: Heartbeat: HandleLockup:827: PCPU 8 didn't have a heartbeat for 5 seconds, timeout is 14, 1 IPIs sent; *may* be locked up.
2021-05-28T21:39:59.895Z cpu8:1001449713)WARNING: World: vm 1001449713: PanicWork:8430: vmm3:VM_NAME:vcpu-3:Received VMkernel NMI IPI, possible CPU lockup while executing HV VT VM
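To check whether a host has logged these messages, a search along the following lines may help. This is a minimal sketch, not an official diagnostic: the function name find_heartbeat_lockups is hypothetical, the default path /var/log/vmkernel.log is an assumption about where the log resides on the host, and the patterns omit the source line numbers (such as 827) because those may differ between builds.

```shell
#!/bin/sh
# Print log lines that match the two symptom messages shown above.
# Usage: find_heartbeat_lockups [logfile]
find_heartbeat_lockups() {
    # Default path is an assumption; pass another file as the first argument.
    log="${1:-/var/log/vmkernel.log}"
    grep -E "Heartbeat: HandleLockup:|Received VMkernel NMI IPI, possible CPU lockup" "$log"
}

# Run against the default log only if it is readable on this system.
if [ -r /var/log/vmkernel.log ]; then
    find_heartbeat_lockups || echo "No matching messages found."
fi
```

A match on its own does not prove that this specific issue is the cause; compare the ESXi version against the Environment section as well.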


The issue is due to a rare race condition in vCPU timers. Because the race is per-vCPU, larger VMs are more exposed to the issue.


Environment

VMware vSphere ESXi 7.0.0

Cause

On rare occasions, the VMkernel might consider a virtual machine unresponsive because PCPU heartbeats are not sent properly, and shut the virtual machine down.

Resolution

This issue is resolved in VMware ESXi 7.0 Update 3d.

For details, see the VMware ESXi 7.0 Update 3d Release Notes.


Workaround:

Disable the PCPU heartbeat by running the following command in the ESXi shell:
vsish -e set /reliability/heartbeat/status 0


Note:
This change is not persistent. The PCPU heartbeat remains disabled only until the next reboot of the host.
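Because the setting is lost at reboot, it can be useful to confirm its current value before and after applying the workaround. The sketch below assumes that the same VSI node can be read with vsish -e get, which is the usual counterpart of set but is not confirmed by this article. The function names and the VSISH variable are hypothetical helpers introduced for illustration only.

```shell
#!/bin/sh
# VSISH can be overridden for testing; on an ESXi host it is simply "vsish".
VSISH="${VSISH:-vsish}"

# Print the current PCPU heartbeat status as reported by the VSI node.
# Assumption: the node supports "get" as well as "set".
heartbeat_status() {
    "$VSISH" -e get /reliability/heartbeat/status
}

# Disable the PCPU heartbeat (same command as the workaround above).
disable_heartbeat() {
    "$VSISH" -e set /reliability/heartbeat/status 0
}
```

After sourcing these definitions, run heartbeat_status to inspect the value, then disable_heartbeat to apply the workaround again after a reboot if the host has not yet been updated.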