Symptoms:
In vmkernel.log, you see messages such as:
YYYY-MM-DDTHH:MM:SS5Z cpu68:1001449770)ALERT: Heartbeat: HandleLockup:827: PCPU 8 didn't have a heartbeat for 5 seconds, timeout is 14, 1 IPIs sent; *may* be locked up.
YYYY-MM-DDTHH:MM:SS5Z cpu8:1001449713)WARNING: World: vm 1001449713: PanicWork:8430: vmm3:VM_NAME:vcpu-3:Received VMkernel NMI IPI, possible CPU lockup while executing HV VT VM
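To check whether a host has logged these messages, the log can be scanned with a short script. The following is a minimal sketch, not an official VMware tool: the function names scan_vmkernel_log and scan_file are hypothetical, and the patterns are derived only from the two sample messages above, so other builds may word the messages differently.

```python
import re

# Patterns derived from the sample messages in this article (assumption:
# the wording is the same on the host being checked).
LOCKUP_RE = re.compile(
    r"Heartbeat: HandleLockup:\d+: PCPU (?P<pcpu>\d+) didn't have a heartbeat "
    r"for (?P<seconds>\d+) seconds, timeout is (?P<timeout>\d+)"
)
NMI_RE = re.compile(r"Received VMkernel NMI IPI, possible CPU lockup")


def scan_vmkernel_log(lines):
    """Return (lockups, nmi_lines) found in an iterable of log lines.

    lockups   -- list of dicts with integer keys pcpu, seconds, timeout
    nmi_lines -- raw lines containing the NMI IPI warning
    """
    lockups, nmi_lines = [], []
    for line in lines:
        match = LOCKUP_RE.search(line)
        if match:
            lockups.append({k: int(v) for k, v in match.groupdict().items()})
        elif NMI_RE.search(line):
            nmi_lines.append(line.rstrip("\n"))
    return lockups, nmi_lines


def scan_file(path):
    """Hypothetical convenience wrapper: scan a log file on disk."""
    with open(path, errors="replace") as log:
        return scan_vmkernel_log(log)
```

For example, scan_file("/var/log/vmkernel.log") could be run on the host, assuming the log is in its usual location. An empty result only means the patterns did not match; it does not prove that the host is unaffected.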
The issue is due to a rare race condition in vCPU timers. Because the race is per-vCPU, larger VMs are more exposed to the issue.
VMware vSphere ESXi 7.0.0
On rare occasions, the VMkernel might consider a virtual machine unresponsive because PCPU heartbeats are not sent properly, and shut the VM down.
This issue is resolved in 7.0 U3d. Refer to vsphere-vcenter-server-70u3d-release-notes.pdf.
Workaround:
Disable the PCPU heartbeat by running the following command on the ESXi host:
vsish -e set /reliability/heartbeat/status 0
Note:
This change is non-persistent. Disabling the PCPU heartbeat setting is effective only until the next reboot.