Understanding and troubleshooting vMotion

Article ID: 321009

Updated On:

Products

VMware vCenter Server
VMware vSphere ESXi

Issue/Introduction

This article provides information on troubleshooting the VMware vMotion process.

Symptoms:

  1. vMotion is stuck.
  2. vMotion fails at 10%.
  3. vMotion times out.
  4. In vCenter Server, the following errors are present:
    • Migration will cause the virtual machine's configuration to be modified to preserve the CPU feature requirements for its guest operating system.
      Operation timed out
    • A general system error occurred:
      Failed waiting for data. Error 16. Invalid argument
    • A general system error occurred: failed to look up vMotion destination resource pool object



Environment

VMware vSphere ESXi

Resolution

Introduction

When a virtual machine is live-migrated from one ESXi host to another, vMotion consists of these steps:

vMotion request is sent to the vCenter Server

During this stage, a call is sent to vCenter Server requesting the live migration of a virtual machine to another host. This call may be issued through the VMware vSphere Web Client, VMware vSphere Client or through an API call.

vCenter Server sends the vMotion request to the destination ESXi host

During this stage, vCenter Server sends a request to the destination ESXi host to notify the host of an incoming vMotion. This step also validates whether the host can receive a vMotion. If vMotion is allowed on the host, the host replies to the request, allowing the vMotion to continue. If the host is not configured for vMotion, the host replies to the request disallowing the vMotion, and the vMotion fails.
 
Common issues:

vCenter Server computes the specifications of the virtual machine to migrate

During this stage, details of the virtual machine are queried so that the source and destination hosts can be notified of the vMotion task details. These details may include any Fault Tolerance settings, disk size, vMotion IP address streams, and the source and destination virtual machine configuration file locations.
 

vCenter Server sends the vMotion request to the source ESXi host to prepare the virtual machine for migration

During this stage, vCenter Server sends a request to the source ESXi host to notify the host of the vMotion. This step validates whether the host can send a vMotion. If vMotion is allowed on the host, the host replies to the request, allowing the vMotion to continue. If the host is not configured for vMotion, the host replies to the request disallowing the vMotion, and the vMotion fails.

Once the vMotion task has been validated, the configuration file for the virtual machine is placed into read-only mode and closed with a 90-second protection timer. This prevents changes to the virtual machine while the vMotion task is in progress.
 
Common issues during this stage are:

vCenter Server initiates the destination host virtual machine

During this stage, the destination host creates, registers and powers on a new virtual machine. The virtual machine is powered on to a state that allows the virtual machine to consume resources and prepares it to receive the virtual machine state from the source host. During this time a world ID is generated that is sent to the source host as the target virtual machine for the vMotion.
 
Common issues:

vCenter Server initiates the source host virtual machine

During this stage, the source host begins to migrate the memory and running state of the source virtual machine to the destination virtual machine. This information is transferred using VMkernel ports configured for vMotion. Additional resources are allocated for the destination virtual machine and additional helper worlds are created. The memory of the source virtual machine is transferred using checkpoints.
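
Because this stage depends on the VMkernel ports configured for vMotion, a connectivity test between those ports is a reasonable first check (see "Testing VMkernel network connectivity with the vmkping command" under Additional Information). The sketch below only assembles a vmkping command line to run in the ESXi shell of the source host; it does not execute anything. The helper name `build_vmkping`, the interface name `vmk1`, and the address `192.0.2.20` are illustrative assumptions, not values from this article.

```python
from typing import List, Optional


def build_vmkping(interface: str, dest_ip: str, count: int = 3,
                  payload_size: Optional[int] = None,
                  dont_fragment: bool = False) -> List[str]:
    """Assemble (but do not run) a vmkping command for the ESXi shell.

    Hypothetical helper: interface and dest_ip must be replaced with the
    vMotion VMkernel port and the destination host's vMotion IP address.
    """
    cmd = ["vmkping", "-I", interface, "-c", str(count)]
    if payload_size is not None:
        cmd += ["-s", str(payload_size)]   # ICMP payload size in bytes
    if dont_fragment:
        cmd.append("-d")                   # set the don't-fragment bit
    cmd.append(dest_ip)
    return cmd


if __name__ == "__main__":
    # Basic reachability over the assumed vMotion interface.
    print(" ".join(build_vmkping("vmk1", "192.0.2.20")))
    # Jumbo-frame check, assuming an MTU of 9000 on the vMotion network.
    print(" ".join(build_vmkping("vmk1", "192.0.2.20",
                                 payload_size=8972, dont_fragment=True)))
```

The 8972-byte payload in the second call is 9000 bytes minus the 20-byte IP header and the 8-byte ICMP header; it applies only if the vMotion network is configured for jumbo frames.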
 
After the transfer of the memory and virtual machine state is complete, the source virtual machine is stunned so that any remaining changes made during the last checkpoint copy can be copied. Once this is complete, the destination virtual machine resumes as the primary machine for the virtual machine that is being migrated.

If the vMotion memory pre-copy cannot converge, the vMotion fails. To prevent this, SDPS (Stun During Page Send) intentionally stuns (slows down) the vCPUs so that the virtual machine's memory modification rate stays below the vMotion network transmit rate, which forces the memory pre-copy to converge. Each stun lasts on the order of tens of microseconds. SDPS consists of many such small vCPU stuns, spaced over time so that in most cases the guest operating system and its applications do not notice the impact. In some cases, however, the total SDPS stun time can cause the guest operating system or application to run slower than expected. It may also affect timekeeping in the guest operating system.
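
The convergence condition described above can be illustrated with a toy model. This is a simplified sketch, not the actual vMotion algorithm: it assumes each pre-copy pass resends everything dirtied during the previous pass, that the dirty rate and transmit rate are constant, and that the migration can switch over once the remaining dirty memory drops below a chosen threshold. All names, units (MB and MB/s), and numbers are assumptions for illustration.

```python
from typing import Optional


def precopy_passes(memory_mb: float, dirty_rate: float, transmit_rate: float,
                   switchover_mb: float = 64.0,
                   max_passes: int = 50) -> Optional[int]:
    """Return the number of pre-copy passes until the remaining dirty memory
    is at most switchover_mb, or None if that never happens in max_passes."""
    remaining = float(memory_mb)
    for n in range(1, max_passes + 1):
        # Sending `remaining` MB takes remaining / transmit_rate seconds;
        # the guest dirties dirty_rate MB per second during that time.
        remaining = min(float(memory_mb), remaining * dirty_rate / transmit_rate)
        if remaining <= switchover_mb:
            return n
    return None


def throttled_dirty_rate(dirty_rate: float, transmit_rate: float,
                         headroom: float = 0.5) -> float:
    """SDPS-like idea in this toy model: slow the guest so its dirty rate
    stays below the transmit rate (headroom is an assumed fraction)."""
    return min(dirty_rate, transmit_rate * headroom)


if __name__ == "__main__":
    memory, dirty, transmit = 8192.0, 1500.0, 1000.0
    # Dirty rate above transmit rate: the remaining set never shrinks.
    print("unthrottled:", precopy_passes(memory, dirty, transmit))
    # Throttled dirty rate: each pass halves the remaining set.
    slowed = throttled_dirty_rate(dirty, transmit)
    print("throttled:  ", precopy_passes(memory, slowed, transmit))
```

In this model the remaining dirty memory shrinks by the ratio of dirty rate to transmit rate on every pass, so pre-copy converges only when that ratio is below 1. Slowing the vCPUs lowers the ratio, at the cost of guest performance during the migration.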
 
Common issues:

vCenter Server switches the virtual machine's ESXi host from the source to destination

During this stage, the virtual machine is running on the destination ESXi host. vCenter Server reflects this change by updating the virtual machine's host and pool to the destination host. The source virtual machine is powered down and unregistered from the source host. Any resources that were in use are released back to the host.

vCenter Server completes the vMotion task

During this stage, the vMotion task is marked as complete.
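
When troubleshooting, it can help to think of the steps above as an ordered sequence and to ask which stage a failure belongs to. The sketch below models only what this article states: the eight stages in order, and the two validation steps that fail when a host is not configured for vMotion. The `Stage`, `Host`, and `first_failing_stage` names are hypothetical and do not correspond to any vSphere API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Stage(Enum):
    """The vMotion stages, in the order this article describes them."""
    REQUEST_SENT_TO_VCENTER = 1
    DESTINATION_HOST_NOTIFIED = 2
    VM_SPEC_COMPUTED = 3
    SOURCE_HOST_PREPARED = 4
    DESTINATION_VM_INITIATED = 5
    SOURCE_VM_INITIATED = 6
    VM_SWITCHED_TO_DESTINATION = 7
    TASK_COMPLETED = 8


@dataclass
class Host:
    name: str
    vmotion_enabled: bool  # assumed flag: host is configured for vMotion


def first_failing_stage(source: Host, destination: Host) -> Optional[Stage]:
    """Walk the stages in order and return the first one whose validation
    fails in this simplified model, or None if every stage passes."""
    for stage in Stage:  # Enum members iterate in definition order
        if stage is Stage.DESTINATION_HOST_NOTIFIED and not destination.vmotion_enabled:
            return stage
        if stage is Stage.SOURCE_HOST_PREPARED and not source.vmotion_enabled:
            return stage
    return None


if __name__ == "__main__":
    src = Host("esxi-a", vmotion_enabled=True)
    dst = Host("esxi-b", vmotion_enabled=False)
    print(first_failing_stage(src, dst))
```

Because the destination host is validated before the source host, a destination that is not configured for vMotion is reported first, even if the source host has the same problem.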
 
Common issues during this stage are:



Additional Information

VMware Skyline Health Diagnostics for vSphere - FAQ
vMotion fails with network errors
Testing network connectivity with the ping command
Restarting the Management agents in ESXi
Investigating disk space on an ESX or ESXi host
Identifying Fibre Channel, iSCSI, and NFS storage issues on ESX/ESXi hosts
Testing VMkernel network connectivity with the vmkping command
Identifying issues with and setting up name resolution on ESX/ESXi Server
Verifying time synchronization across an ESX/ESXi host environment
VMware vMotion fails if target host does not meet reservation requirements
vMotion fails at 82% with the hostd log error: "Source detected that destination failed to resume"
vMotion fails at 90% with the error: A general system error occurred: failed to resume on destination message
VMotion fails on a NFS datastore
vMotion fails with the error: Migration to host <> failed with error
vMotion fails at 10% with the error: Operation timed out