Virtual machine stun or brief hang when performing Hot Add of CPU or memory resources

Article ID: 432796

Updated On:

Products

VMware vCenter Server
VMware vSphere ESXi

Issue/Introduction

When increasing virtual machine (VM) compute resources (CPU or RAM) while the guest is in a powered-on state (Hot Add), the VM may experience a prolonged "stun" period or become temporarily unresponsive.

In some environments, this stun can last several minutes, during which time the application or guest OS may appear to hang.

The VM's log file, /vmfs/volumes/<Datastore>/<VM-Folder>/vmware.log, shows the VM entering a "stun" state at the same time as the resource modification, followed by a "Checkpoint_Unstun" event once the operation completes.

YYYY-MM-DDThh:mm:ss.073Z In(05) vmx - memoryHotplug: Change of memory size from 32768 to 65536 was requested
YYYY-MM-DDThh:mm:ss.551Z In(05) vmx - Migrate: VM starting stun, waiting 100 seconds for go/no-go message.
YYYY-MM-DDThh:mm:ss.626Z In(05) vcpu-0 - Migrate: VM successfully stunned
YYYY-MM-DDThh:mm:ss.835Z Wa(03) vmx - VMX has left the building: 0.
YYYY-MM-DDThh:mm:ss.536Z In(05) vcpu-0 - Checkpoint_Unstun: vm stopped for 18500 us
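As a minimal sketch, the stun duration reported on the Checkpoint_Unstun line can be extracted from vmware.log with a short script. The function names and the 1000 ms threshold are illustrative assumptions and not part of any VMware tooling; only the message text "Checkpoint_Unstun: vm stopped for N us" is taken from the log excerpt above.

```python
import re

# Matches the unstun message as it appears in the sample log excerpt.
UNSTUN_RE = re.compile(r"Checkpoint_Unstun: vm stopped for (\d+) us")


def stun_durations_ms(log_text):
    """Return every reported stun duration in milliseconds, in log order."""
    return [int(m.group(1)) / 1000.0 for m in UNSTUN_RE.finditer(log_text)]


def report(path, threshold_ms=1000.0):
    """Print each stun duration found in the log at `path`.

    `threshold_ms` is an arbitrary, hypothetical cut-off used only to
    flag long stuns; pick a value that suits the workload.
    """
    with open(path, encoding="utf-8", errors="replace") as fh:
        durations = stun_durations_ms(fh.read())
    for d in durations:
        flag = "LONG" if d >= threshold_ms else "ok"
        print(f"{d:.1f} ms  {flag}")
    return durations
```

For example, calling report() on a copy of the VM's vmware.log lists one line per unstun event; the sample entry above (18500 us) would be reported as 18.5 ms.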

Cause

This behavior is by design. Whenever compute resources are hot added, ESXi initiates an internal migration task on the same host to reconfigure the VM's hardware backing.

During this process:

The VM transfers its memory state to a new configuration instance.
The VMkernel must "close" the virtual disks on the source instance so they can be "opened" by the destination instance.
If the disk has a high volume of pending I/O commands, the VMkernel may take a significant amount of time to quiesce and close the disks.
The VM is held in a "stun" state throughout this closing period to ensure data integrity, resulting in the observed hang.

Resolution

This is expected behavior: the Hot Add functionality is performing as intended. To reduce the impact of VM stuns during resource modification, consider the following:

Perform resource changes during maintenance windows: Avoid Hot Add operations during periods of peak I/O or high application load to minimize the time required for the VMkernel to close disks.
Optimize storage performance: Keep the underlying storage latency low, as faster disk acknowledgment reduces the time the VM remains in the stun phase.
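To check whether these measures shorten the hang, one option is to compare the wall-clock gap between the "Migrate: VM starting stun" and "Checkpoint_Unstun" lines in vmware.log before and after a change. The sketch below assumes each log line begins with a UTC timestamp of the form YYYY-MM-DDThh:mm:ss.mmmZ, as suggested by the placeholders in the log excerpt; the function names are hypothetical, and the gap it reports covers the whole operation between the two messages, not only the disk close.

```python
import re
from datetime import datetime

# Assumed line layout: "<timestamp>Z <rest of message>".
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3})Z\s+(.*)$")


def parse_ts(text):
    """Parse the assumed timestamp format (millisecond precision)."""
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%S.%f")


def stun_windows(lines):
    """Pair each 'VM starting stun' line with the next 'Checkpoint_Unstun'
    line and return the elapsed wall-clock time of each pair in seconds."""
    windows = []
    start = None
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip lines without a leading timestamp
        ts, msg = parse_ts(m.group(1)), m.group(2)
        if "Migrate: VM starting stun" in msg:
            start = ts
        elif "Checkpoint_Unstun" in msg and start is not None:
            windows.append((ts - start).total_seconds())
            start = None
    return windows
```

Running stun_windows() over the lines of a vmware.log captured during a Hot Add at peak load and again during a quiet period gives a simple before/after comparison.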

Additional Information
