When performing a live migration of a virtual machine from one ESXi host to another, vMotion consists of these steps:
vMotion request is sent to the vCenter Server
During this stage, a call is sent to vCenter Server requesting the live migration of a virtual machine to another host. This call may be issued through the VMware vSphere Web Client, VMware vSphere Client or through an API call.
vCenter Server sends the vMotion request to the destination ESXi host
During this stage, vCenter Server sends a request to the destination ESXi host to notify it of an incoming vMotion. This step also validates whether the host can receive a vMotion. If vMotion is allowed on the host, the host replies to the request allowing the vMotion to continue. If the host is not configured for vMotion, the host replies to the request disallowing the vMotion, resulting in a vMotion failure.
Common issues:
vCenter Server computes the specifications of the virtual machine to migrate
During this stage, details of the virtual machine are queried so that the source and destination hosts can be notified of the vMotion task details. These may include any Fault Tolerance settings, disk size, vMotion IP address streams, and the source and destination virtual machine configuration file locations.
Common issues during this stage are:
vCenter Server sends the vMotion request to the source ESXi host to prepare the virtual machine for migration
During this stage, vCenter Server sends a request to the source ESXi host to notify it of the pending vMotion. This step validates whether the host can send a vMotion. If vMotion is allowed on the host, the host replies to the request allowing the vMotion to continue. If the host is not configured for vMotion, the host replies to the request disallowing the vMotion, resulting in a vMotion failure.
Once the vMotion task has been validated, the configuration file for the virtual machine is placed into read-only mode and closed with a 90-second protection timer. This prevents changes to the virtual machine while the vMotion task is in progress.
Common issues during this stage are:
vCenter Server initiates the destination host virtual machine
During this stage, the destination host creates, registers and powers on a new virtual machine. The virtual machine is powered on to a state that allows the virtual machine to consume resources and prepares it to receive the virtual machine state from the source host. During this time a world ID is generated that is sent to the source host as the target virtual machine for the vMotion.
Common issues:
vCenter Server initiates the source host virtual machine
During this stage, the source host begins to migrate the memory and running state of the source virtual machine to the destination virtual machine. This information is transferred using VMkernel ports configured for vMotion. Additional resources are allocated for the destination virtual machine and additional helper worlds are created. The memory of the source virtual machine is transferred using checkpoints.
After the transfer of the memory and virtual machine state is complete, the source virtual machine is stunned so that any remaining changes made during the last checkpoint copy can be copied. Once this is complete, the destination virtual machine resumes as the primary machine for the virtual machine that is being migrated.
When vMotion memory pre-copy cannot converge, the vMotion fails. To prevent this, SDPS (Stun During Page Send) intentionally stuns (slows down) the vCPUs to keep the virtual machine's memory modification rate below the vMotion network transmit rate, forcing memory pre-copy to converge. Each stun lasts on the order of tens of microseconds. SDPS consists of many such short vCPU stuns, spaced over time so that in most cases neither the guest operating system nor its applications notice the impact. In some cases, however, the total SDPS stun time can cause the guest operating system or application to run slower than expected. It may also affect the time on the guest OS.
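The convergence condition described above can be illustrated with a small model. This is only a sketch under simplifying assumptions: each pre-copy pass resends whatever memory was modified during the previous pass, the rates are constant, and the function names, the 16 MB switchover threshold, and the 0.5 throttle factor are hypothetical values chosen for illustration, not VMware's actual algorithm or parameters.

```python
from typing import Optional


def precopy_passes(memory_mb: float, transmit_mb_s: float, dirty_mb_s: float,
                   switchover_mb: float = 16.0, max_passes: int = 50) -> Optional[int]:
    """Count pre-copy passes until the remaining dirty memory is small enough
    to copy during the final stun. Returns None if it never gets that small."""
    if transmit_mb_s <= 0:
        raise ValueError("transmit rate must be positive")
    remaining = memory_mb
    for pass_number in range(1, max_passes + 1):
        seconds = remaining / transmit_mb_s          # time to send this pass
        # Memory modified while this pass was on the wire must be resent.
        remaining = min(memory_mb, dirty_mb_s * seconds)
        if remaining <= switchover_mb:
            return pass_number
    return None


def throttled_dirty_rate(dirty_mb_s: float, transmit_mb_s: float,
                         headroom: float = 0.5) -> float:
    """SDPS-style throttle (illustrative): cap the memory modification rate
    at a fraction of the transmit rate by slowing the vCPUs."""
    return min(dirty_mb_s, headroom * transmit_mb_s)


if __name__ == "__main__":
    # Dirty rate below the transmit rate: each pass shrinks, so pre-copy converges.
    print(precopy_passes(8192, 1000, 200))
    # Dirty rate above the transmit rate: the remaining memory never shrinks.
    print(precopy_passes(8192, 1000, 1200))
    # Same workload with the dirty rate throttled below the transmit rate.
    print(precopy_passes(8192, 1000, throttled_dirty_rate(1200, 1000)))
```

In this model the remaining memory shrinks by the ratio of the modification rate to the transmit rate on every pass, which is why pre-copy converges only when that ratio is below 1.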
Common issues:
- If Jumbo Frames are enabled (MTU of 9000), test with a payload of 8972 bytes (9000 minus 8 bytes for the ICMP header and 20 bytes for the IP header) by running vmkping with the command:
vmkping -d -s 8972 destinationIPaddress
Note: You may experience problems if the trunk between two physical switches has been misconfigured with an MTU of 1500.
- Verify that valid limits are set for the virtual machine being vMotioned. For more information, see VMware vMotion fails if target host does not meet reservation requirements (1003791).
- Verify the virtual hardware is not out of date. For more information, see vMotion fails at 82% with the hostd log error: Source detected that destination failed to resume (1006052).
- Verify the SAN configuration. vMotion may fail if zoning is set up differently on different servers in the same cluster.
- Verify and ensure that the log.rotateSize parameter in the virtual machine's configuration file is not set to a very low value. For more information, see vMotion fails at 10% with the error: Operation timed out (2007343).
- If you are migrating a 64-bit virtual machine, verify that the VT option is enabled on both the source and destination hosts. For more information, see vMotion fails at 90% with the error: A general system error occurred: failed to resume on destination message (1008735).
- Verify that there are no issues with the shared storage or networking. For more information, see Identifying Fibre Channel, iSCSI, and NFS storage issues on ESX/ESXi hosts (1003659) and vMotion fails with the error: Migration to host <> failed with error (1030267).
- If you are using NFS storage, verify that the NFS datastore containing the VMDK file of the virtual machine being migrated is mounted identically on both the source and destination hosts. For more information, see vMotion fails on a NFS datastore (1023230).
- If you are using VMware vShield Endpoint, verify the vShield Endpoint LKM is installed on the ESX/ESXi hosts to which you are trying to vMotion the virtual machine. For more information, see Trying to vMotion or power on a virtual machine being protected by vShield Endpoint fails (1030463).
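The payload arithmetic in the Jumbo Frames item above can be sketched as follows. The helper names are hypothetical, and the sketch assumes IPv4 with a 20-byte header and no IP options; it only builds the command string shown in the list and does not run it.

```python
ICMP_HEADER_BYTES = 8    # ICMP echo header
IPV4_HEADER_BYTES = 20   # IPv4 header without options (assumption)


def max_ping_payload(mtu: int) -> int:
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    payload = mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES
    if payload < 0:
        raise ValueError(f"MTU {mtu} is too small for the IPv4 and ICMP headers")
    return payload


def vmkping_command(mtu: int, destination: str) -> str:
    """Build the vmkping command with fragmentation disabled (-d) and the
    payload size (-s) that exactly fills the given MTU."""
    return f"vmkping -d -s {max_ping_payload(mtu)} {destination}"


if __name__ == "__main__":
    print(vmkping_command(9000, "destinationIPaddress"))
    # prints: vmkping -d -s 8972 destinationIPaddress
    print(vmkping_command(1500, "destinationIPaddress"))
    # prints: vmkping -d -s 1472 destinationIPaddress
```

If the 8972-byte test fails while the 1472-byte test succeeds, a device in the path is likely still using an MTU of 1500.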
vCenter Server switches the virtual machine's ESXi host from the source to destination
During this stage, the virtual machine is running on the destination ESXi host. vCenter Server reflects this change by updating the virtual machine's host and pool to those of the destination host. The source virtual machine is powered down and unregistered from the source host. Any resources that were in use are released back to the host.
vCenter Server completes the vMotion task
During this stage, the vMotion task is marked as complete.
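As a summary, the stage order and the two host validation points described above can be modeled with a short sketch. This is a toy model, not vCenter Server code: the stage names and the run_vmotion function are hypothetical, and only the two validation failures described in the text are represented.

```python
from enum import Enum
from typing import List, Optional, Tuple


class Stage(Enum):
    """vMotion stages in the order described in this article (names are illustrative)."""
    REQUEST_TO_VCENTER = 1
    NOTIFY_DESTINATION_HOST = 2
    COMPUTE_VM_SPECIFICATIONS = 3
    PREPARE_SOURCE_HOST = 4
    START_DESTINATION_VM = 5
    TRANSFER_MEMORY_AND_STATE = 6
    SWITCH_HOST = 7
    COMPLETE_TASK = 8


def run_vmotion(destination_allows: bool,
                source_allows: bool) -> Tuple[List[Stage], Optional[Stage]]:
    """Walk the stages in order. Returns the completed stages and the stage
    at which validation failed, or None if the task completed."""
    completed: List[Stage] = []
    for stage in Stage:  # Enum members iterate in definition order
        if stage is Stage.NOTIFY_DESTINATION_HOST and not destination_allows:
            return completed, stage
        if stage is Stage.PREPARE_SOURCE_HOST and not source_allows:
            return completed, stage
        completed.append(stage)
    return completed, None


if __name__ == "__main__":
    done, failed_at = run_vmotion(destination_allows=True, source_allows=False)
    print([s.name for s in done], failed_at)
```

The model makes one point from the text explicit: the destination host is validated before the source host, so a destination that is not configured for vMotion fails the task before the source host is ever contacted.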
Common issues during this stage are: