VM went offline during a DRS-initiated vMotion event in which the VM was migrated from one ESXi host to another.



Article ID: 427587


Updated On:

Products

VMware vSphere ESXi

Issue/Introduction

In environments where DRS is enabled, a virtual machine may experience a brief stun during a DRS-initiated vMotion, even though the migration itself completes successfully at the ESXi level.

The VM's log file (/vmfs/volumes/vmfolder/vmware.log) shows the following sequence.

The vMotion workflow was initiated successfully, as shown by the VM entering the migration state machine:

YYYY-MM-DDT00:00:00.000Z In(05) vmx - MigratePlatformInitMigration: init migration data, is_source: 0
YYYY-MM-DDT00:00:00.000Z In(05) vmx - MigrateSetState: Transitioning from state MIGRATE_FROM_VMX_INIT (8) to MIGRATE_FROM_VMX_WAITING (9)
YYYY-MM-DDT00:00:00.000Z In(05) vmx - MigrateSetState: Transitioning from state MIGRATE_FROM_VMX_WAITING (9) to MIGRATE_FROM_VMX_PRECOPY (10)
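In a long vmware.log, these transitions are easier to review if they are extracted programmatically. The following is a minimal Python sketch and not a VMware-provided tool. It assumes the MigrateSetState line format shown in the excerpt, which may differ between ESXi builds. The function name migration_transitions is hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Assumes the MigrateSetState line format shown in the excerpt; other ESXi builds may differ.
TRANSITION_RE = re.compile(
    r"MigrateSetState: Transitioning from state (\w+) \((\d+)\) to (\w+) \((\d+)\)"
)


def migration_transitions(lines: Iterable[str]) -> List[Tuple[str, int, str, int]]:
    """Return (from_state, from_id, to_state, to_id) for each state transition, in log order."""
    transitions = []
    for line in lines:
        match = TRANSITION_RE.search(line)
        if match:
            transitions.append(
                (match.group(1), int(match.group(2)), match.group(3), int(match.group(4)))
            )
    return transitions


if __name__ == "__main__":
    sample = [
        "YYYY-MM-DDT00:00:00.000Z In(05) vmx - MigratePlatformInitMigration: init migration data, is_source: 0",
        "YYYY-MM-DDT00:00:00.000Z In(05) vmx - MigrateSetState: Transitioning from state MIGRATE_FROM_VMX_INIT (8) to MIGRATE_FROM_VMX_WAITING (9)",
        "YYYY-MM-DDT00:00:00.000Z In(05) vmx - MigrateSetState: Transitioning from state MIGRATE_FROM_VMX_WAITING (9) to MIGRATE_FROM_VMX_PRECOPY (10)",
    ]
    for transition in migration_transitions(sample):
        print(transition)
```

To scan a real log, pass an open file object instead of the sample list. Files iterate line by line, so the whole log does not need to be held in memory.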


The VM then completed the memory pre-copy phase and entered the checkpoint and restore phase, during which a brief stun and suspend/resume operation occurs:

YYYY-MM-DDT00:00:00.000Z In(05) vmx - MigrateWaitForData: Waited for 29.93 seconds
YYYY-MM-DDT00:00:00.000Z In(05) vmx - MigrateSetState: Transitioning from state MIGRATE_FROM_VMX_PRECOPY (10) to MIGRATE_FROM_VMX_CHECKPT (11)
YYYY-MM-DDT00:00:00.000Z In(05) vmx - DUMPER: Restoring checkpoint version 8


Finally, the migration completed with result 0x0 (success). The log reports 72007 microseconds (about 72 ms) to complete the restore:

YYYY-MM-DDT00:00:00.000Z In(05) vmx - MigrateSetStateFinished: type=2 new state=MIGRATE_FROM_VMX_FINISHED
YYYY-MM-DDT00:00:00.000Z In(05) vcpu-0 - Migration took 72007 micro secs to complete restore with result 0x0, migration status complete
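The timing and result fields in these lines can be pulled out with a short script. The following minimal Python sketch assumes the MigrateWaitForData and "Migration took ... micro secs" line formats shown in the excerpts, which may differ between ESXi builds. The function name migration_summary and its field names are hypothetical. The script converts the restore time from microseconds to milliseconds and parses the hexadecimal result code, so the 0x0 success result becomes the integer 0.

```python
import re
from typing import Dict, Iterable, Optional, Union

# Assumes the line formats shown in the excerpts; other ESXi builds may differ.
WAIT_RE = re.compile(r"MigrateWaitForData: Waited for ([\d.]+) seconds")
RESTORE_RE = re.compile(
    r"Migration took (\d+) micro secs to complete restore "
    r"with result (0x[0-9a-fA-F]+), migration status (\w+)"
)


def migration_summary(lines: Iterable[str]) -> Dict[str, Optional[Union[float, int, str]]]:
    """Collect the wait-for-data time, restore time, result code and status from log lines."""
    summary = {"wait_for_data_s": None, "restore_ms": None, "result": None, "status": None}
    for line in lines:
        wait = WAIT_RE.search(line)
        if wait:
            summary["wait_for_data_s"] = float(wait.group(1))
            continue
        restore = RESTORE_RE.search(line)
        if restore:
            summary["restore_ms"] = int(restore.group(1)) / 1000  # microseconds -> milliseconds
            summary["result"] = int(restore.group(2), 16)  # "0x0" -> 0
            summary["status"] = restore.group(3)
    return summary


if __name__ == "__main__":
    sample = [
        "YYYY-MM-DDT00:00:00.000Z In(05) vmx - MigrateWaitForData: Waited for 29.93 seconds",
        "YYYY-MM-DDT00:00:00.000Z In(05) vcpu-0 - Migration took 72007 micro secs to complete restore with result 0x0, migration status complete",
    ]
    print(migration_summary(sample))
```

Any field whose line is missing from the log stays None. This makes an incomplete migration easy to distinguish from one that reported a non-zero result.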

These log entries confirm that:

The DRS-initiated vMotion started successfully.

The VM progressed through the pre-copy and checkpoint (stun) phases.

The migration completed successfully.
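The three checks listed above can be expressed as a small function. This Python sketch is illustrative only and makes several assumptions:

- You have already collected, in log order, the migration state names seen in vmware.log. This includes MIGRATE_FROM_VMX_FINISHED from the MigrateSetStateFinished line.
- You have the restore result code.
- The state names match the ones in this article's excerpt. They may differ in other logs or ESXi builds.

The function names confirm_vmotion and _seen_in_order are hypothetical.

```python
from typing import Dict, Sequence


def _seen_in_order(states: Sequence[str], *wanted: str) -> bool:
    """True if every state in `wanted` appears in `states`, in that relative order."""
    remaining = iter(states)
    # any() consumes the shared iterator up to each match, so each later target
    # must appear after the previous one in the log.
    return all(any(state == target for state in remaining) for target in wanted)


def confirm_vmotion(states: Sequence[str], restore_result: int) -> Dict[str, bool]:
    """Map the visited migration states and restore result to the three checks."""
    return {
        "started": _seen_in_order(states, "MIGRATE_FROM_VMX_INIT", "MIGRATE_FROM_VMX_PRECOPY"),
        "precopy_and_checkpoint": _seen_in_order(
            states, "MIGRATE_FROM_VMX_PRECOPY", "MIGRATE_FROM_VMX_CHECKPT"
        ),
        "completed": "MIGRATE_FROM_VMX_FINISHED" in states and restore_result == 0,
    }
```

Order matters for the first two checks, so a log in which the checkpoint state appears before pre-copy is not reported as a normal progression.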

Environment

VMware vSphere ESXi 7.x
VMware vSphere ESXi 8.x

Cause

When a virtual machine is migrated by DRS using vMotion, the ESXi host performs a brief stun and fast suspend/resume operation as part of the checkpoint and restore phase, which is required to safely transfer the running workload between physical hosts.

While this stun interval is typically very short and transparent to most Guest Operating Systems and applications, certain highly sensitive workloads may experience application interruptions, crashes, or persistent performance impact if they are unable to tolerate brief pauses in CPU or network processing during the migration window.

Resolution

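  1. Exclude the sensitive virtual machines from DRS, for example by setting their DRS automation level to Disabled or Manual under the cluster's VM Overrides. Alternatively, use VM/Host affinity rules to pin the affected VMs to a specific ESXi host so that DRS does not migrate them.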
  2. Contact the guest OS or application vendor for guidance on tuning the workload to tolerate the brief stun that occurs during vMotion.