VMkernel might shut down virtual machines due to a vCPU timer issue

Article ID: 335063

Updated On:

Products

VMware vSphere ESXi

Issue/Introduction

Symptoms:

In vmkernel.log, you see messages such as:

YYYY-MM-DDTHH:MM:SS5Z cpu68:1001449770)ALERT: Heartbeat: HandleLockup:827: PCPU 8 didn't have a heartbeat for 5 seconds, timeout is 14, 1 IPIs sent; *may* be locked up.
YYYY-MM-DDTHH:MM:SS5Z cpu8:1001449713)WARNING: World: vm 1001449713: PanicWork:8430: vmm3:VM_NAME:vcpu-3:Received VMkernel NMI IPI, possible CPU lockup while executing HV VT VM
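To check a host for these symptoms, the log can be searched for both messages at once. The following is a minimal sketch, not an official tool: the function name find_heartbeat_symptoms is hypothetical, the default path /var/log/vmkernel.log is an assumption about where the log resides on the host, and the patterns are derived only from the two sample messages above.

```shell
#!/bin/sh
# Hypothetical helper: print vmkernel log lines that match either symptom.
# The default log path is an assumption; pass another path as the first argument.
find_heartbeat_symptoms() {
    log_file="${1:-/var/log/vmkernel.log}"
    # Pattern 1: PCPU heartbeat lockup alert (the number after HandleLockup: may vary).
    # Pattern 2: vCPU world that received the VMkernel NMI IPI.
    grep -E 'Heartbeat: HandleLockup:.*heartbeat for [0-9]+ seconds|Received VMkernel NMI IPI' "$log_file"
}
```

Running find_heartbeat_symptoms with no argument scans the assumed default path. Like grep, the function returns a non-zero exit status when no matching lines are found.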

The issue is due to a rare race condition in vCPU timers. Because the race is per-vCPU, larger VMs are more exposed to the issue.

Environment

VMware vSphere ESXi 7.0.0

Cause

On rare occasions, the VMkernel might consider a virtual machine unresponsive because it fails to send PCPU heartbeats properly, and shut the virtual machine down.

Resolution

This issue is resolved in 7.0 U3d. Refer to vsphere-vcenter-server-70u3d-release-notes.pdf.

Workaround:

Disable the PCPU heartbeat by running the following command on the ESXi host:

vsish -e set /reliability/heartbeat/status 0


Note:
This change is non-persistent. Disabling the PCPU heartbeat setting is effective only until the next reboot.