Troubleshooting vMotion fails with network errors


Article ID: 318636


Products

VMware vSphere ESXi

Issue/Introduction

Symptoms:

  • You see errors similar to:

    Example 1:

    The VMotion failed because the ESX hosts were not able to connect over the VMotion network. Please check your VMotion network settings and physical network configuration.
    VMotion [184329483:1276605211167987] failed to create connection with remote host <xxx.xxx.xxx.xxx>: The ESX hosts failed to connect over the VMotion network
    Migration [184329483:1276605211167987] failed to connect to remote host <xxx.xxx.xxx.xxx>: Timeout


    Example 2:

    The VMotion failed because the ESX hosts were not able to connect over the VMotion network. Please check your VMotion network settings and physical network configuration.
    VMotion [-1408237366:1279683851917265] failed to create connection with remote host <xxx.xxx.xxx.xxx>: The ESX hosts failed to connect over the VMotion network
    Migration [-1408237366:1279683851917265] failed to connect to remote host <xxx.xxx.xxx.xxx>: Connection refused


    Example 3:

    The vMotion failed because the destination host did not receive data from the source host on the vMotion network. Please check your vMotion network settings and physical network configuration and ensure they are correct

  • In the /var/log/hostd.log file, you see entries similar to:

    [27703B90 verbose 'Vmsvc.vm:/vmfs/volumes/5538de90-249141cc-a57c-40f2e9638530/TEMP-TEMP/TEMP-TEMP.vmx'] Handling message _vmx1: Migration [a991333:1451374787315531] failed to connect to remote host from host : Timeout

Note: The "xxx.xxx.xxx.xxx" text stands in for IP addresses, which are anonymized here for security reasons.
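
To locate such entries on a host, the log can be searched for the common part of the message. The following is a minimal sketch, assuming shell access to the ESXi host. The function name find_vmotion_connect_errors is hypothetical, and the search string is taken from the example entry above.

```shell
# Sketch only: print hostd.log lines that report a failed vMotion connection.
# The default path is the one named in this article; pass another path to override it.
find_vmotion_connect_errors() {
    log="${1:-/var/log/hostd.log}"
    grep "failed to connect to remote host" "$log"
}
```

Calling find_vmotion_connect_errors with no argument searches /var/log/hostd.log. The reason at the end of each matching line (for example, Timeout) is the part to compare against the examples in this article.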

Environment

VMware vSphere ESXi 8.0.x
VMware vSphere ESXi 7.0.x
VMware vSphere ESXi 6.7
VMware vSphere ESXi 6.5
VMware vSphere ESXi 6.0
VMware vSphere ESXi 5.5
VMware vSphere ESXi 5.1

Cause

The virtual or physical network configuration used for vMotion is invalid.

Resolution

To resolve this issue:

  1. Check for IP address conflicts on the vMotion network.

    Note: Each host in the cluster should have a vMotion VMkernel adapter (vmknic) with a unique IP address.
     
  2. Check for packet loss over the vMotion network. From the source host, use vmkping to ping the destination host's vMotion vmknic IP address for the duration of the vMotion.

    For example:

    vmkping -I vmkX <IP of destination host vMotion vmk>

    Where vmkX is the name of the vMotion VMkernel adapter on the source host (for example, vmk1).

    For more information, see Testing VMkernel network connectivity with the vmkping command.
     
  3. Check the driver/firmware of the physical adapter used for vMotion.

    Note: If the driver or firmware is not up to date, vMotion operations can fail with a "timeout" or "waiting for data" error. For more information, see Determining Network/Storage firmware and driver version in ESXi.
     
  4. Ensure the ESXi firewall ruleset for vMotion is enabled and that no conflicting rules block the traffic.

  5. Check for firewall hardware or software that prevents connectivity between the source and destination hosts on TCP port 8000.
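
The checks in the steps above can be sketched as shell functions run from the source host. This is an illustrative sketch under stated assumptions, not a supported script: the function names are hypothetical, and values such as vmk1, vmnic0, and 192.0.2.10 in the usage note are placeholders to replace with values from your environment. If the vMotion vmknic uses the dedicated vMotion TCP/IP stack, vmkping may additionally need the -S vmotion option, and nc may not exercise the same network path.

```shell
# Sketch only. Functions are defined here and called manually (see usage note).

# Step 1: show the vmknic's IPv4 settings so they can be compared across hosts.
show_vmk_ipv4() {
    esxcli network ip interface ipv4 get -i "$1"
}

# Step 2: ping the destination vMotion IP through the given vmknic.
# $1 = vmknic name, $2 = destination IP, $3 = packet count (defaults to 20).
ping_vmotion_peer() {
    vmkping -I "$1" -c "${3:-20}" "$2"
}

# Step 3: show driver and firmware details for a physical adapter.
show_nic_driver() {
    esxcli network nic get -n "$1"
}

# Step 4: list the firewall rulesets and keep the vMotion entry.
show_vmotion_ruleset() {
    esxcli network firewall ruleset list | grep -i vmotion
}

# Step 5: test whether TCP port 8000 on the destination accepts connections.
check_vmotion_port() {
    if nc -z "$1" 8000; then
        echo "port 8000 reachable on $1"
    else
        echo "port 8000 NOT reachable on $1"
    fi
}
```

For example, ping_vmotion_peer vmk1 192.0.2.10 sends 20 pings through vmk1, and check_vmotion_port 192.0.2.10 reports whether the destination accepts connections on the vMotion port. Packet loss reported in the vmkping summary, or an unreachable port, points back to the corresponding step above.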




Additional Information

For more information on vMotions and troubleshooting vMotion issues, see Understanding and troubleshooting vMotion.

Other examples:

  • Migration [184329483:1276605211167987] failed to connect to remote host <xxx.xxx.xxx.xxx>: Timeout.

    This error indicates that the remote host did not accept the connection within the allowed time limit.
     
  • Migration [-1408237366:1279683851917265] failed to connect to remote host <xxx.xxx.xxx.xxx>: Connection refused.

    This error indicates that the remote host is not listening on the vMotion port (TCP 8000). See also Port requirements for VMware vSphere ESXi.
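
The distinction between the two reasons can be sketched as a small shell function that reads the end of an error line and prints the interpretation given above. This is a hedged illustration only: the function name classify_vmotion_error and its output wording are hypothetical, and it recognizes only the two reasons shown in this article.

```shell
# Sketch only: map the reason at the end of a
# "failed to connect to remote host" message to its meaning.
classify_vmotion_error() {
    case "$1" in
        *"Connection refused"*)
            echo "refused: remote host is not listening on the vMotion port (TCP 8000)" ;;
        *Timeout*)
            echo "timeout: remote host did not accept the connection in time" ;;
        *)
            echo "unknown: reason not covered by this sketch" ;;
    esac
}
```

A "refused" result suggests starting with the firewall ruleset and port checks, while a "timeout" result suggests starting with the connectivity and packet-loss checks in the Resolution section.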


Impact/Risks:
Network misconfiguration can cause intermittent vMotion failures. Retrying the vMotion operation may succeed, but VMware recommends that you follow this article to isolate and correct the problem.