vMotion tasks fail at 20% after a reboot or ESXi upgrade

Article ID: 410758

Products

VMware vSphere ESXi

Issue/Introduction

vMotion tasks fail at 20% with the below error:

The vMotion failed because the destination host did not receive data from the source host on the vMotion network. Please check your vMotion network settings and physical network configuration and ensure they are correct.
YYYY-MM-DDTHH:MM:SS.ssssssZ Migration [#########:###################] failed to connect to remote host <###.###.###.###> from host <###.###.###.###>: Host is down.
vMotion migration [#########:###################] failed to create a connection with remote host <###.###.###.###>: The ESX hosts failed to connect over the VMotion network
The vMotion migrations failed because the ESX hosts were not able to connect over the vMotion network. Check the vMotion network settings and physical network configuration.

vMotion operations worked before the host was rebooted or upgraded (an upgrade includes a reboot).

The vMotion adapter (e.g. vmk1) is on a virtual portgroup using a load balancing policy other than "Route based on IP hash".

Pinging between the source and destination hosts' VMkernel adapters with the vmkping command, using the correct MTU, fails with 100% packet loss. See "Testing VMkernel network connectivity with the vmkping command".
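
As a rough sketch of the connectivity test described above: the payload size passed to vmkping is the MTU minus 28 bytes (20-byte IPv4 header plus 8-byte ICMP header), and the -d option sets the don't-fragment bit so that an MTU mismatch shows up as packet loss. The helper function names are hypothetical, and the interface name, MTU, and destination address are placeholders to be replaced with the values from your environment.

```shell
# Largest ICMP payload that fits in one unfragmented IPv4 packet for a given MTU.
# 28 = 20-byte IPv4 header + 8-byte ICMP header.
vmkping_payload() {
    echo $(( $1 - 28 ))
}

# Print (do not run) the vmkping command for a VMkernel interface, MTU, and peer IP.
# -I selects the VMkernel interface, -d sets the don't-fragment bit,
# -s sets the ICMP payload size.
vmkping_cmd() {
    echo "vmkping -I $1 -d -s $(vmkping_payload "$2") $3"
}
```

For example, vmkping_cmd vmk1 9000 followed by the destination host's vMotion IP prints a command with -s 8972, while an MTU of 1500 gives -s 1472. If the vMotion adapter sits on the dedicated vMotion TCP/IP stack, the -S vmotion option would also be needed.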

 

Cause

Because of the load balancing policy, the physical NIC used by the vMotion adapter may change following a reboot. If the NIC now in use is unable to communicate on the vMotion network, vMotion fails.

 

Resolution

Change the NIC that the vMotion adapter is using to restore connectivity between the hosts over the vMotion network.

The in-use NIC can be verified by opening an SSH session to the host, running esxtop, and pressing the "n" key to view the networking page (press "q" to exit esxtop).
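
Alongside esxtop, it may help to see which uplinks the portgroup is allowed to use and whether their links are up. The following is a minimal sketch that assumes the vMotion adapter is on a standard vSwitch portgroup; the wrapper function names are hypothetical and the portgroup name is a placeholder. It shows the configured policy only, so esxtop remains the way to see which NIC is currently in use.

```shell
# List physical NICs with their link state, speed, and driver.
list_physical_nics() {
    esxcli network nic list
}

# Show the teaming policy of a standard vSwitch portgroup ($1 = portgroup name):
# load balancing mode, active and standby uplinks, and the failback setting.
show_teaming_policy() {
    esxcli network vswitch standard portgroup policy failover get --portgroup-name "$1"
}
```

For example, show_teaming_policy "vMotion" lists the active and standby uplinks for a portgroup named vMotion. In the esxtop networking view, the TEAM-PNIC column on the row for the vMotion adapter (e.g. vmk1) shows the uplink currently in use.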

The in-use NIC can then be replaced by doing one of the following:

  • Change the Teaming and Failover policy for the vMotion portgroup so that the current NIC is no longer used by the portgroup.
  • Bring down the in-use NIC to force a failover to another Active or Standby uplink using the following command, replacing vmnic# with the NIC in question, e.g. vmnic2:
    • esxcli network nic down -n vmnic#
    • NOTE: This command brings the NIC down on the ESXi host. Anything else using that NIC will fail over if another Active or Standby NIC is available, or be disconnected if none is available.
    • NOTE: If failback is enabled on the vMotion portgroup, bringing the NIC back up will cause vMotion to resume using the NIC that does not work. Either leave the NIC down until the underlying issue is addressed, or disable failback on the vMotion portgroup before bringing the NIC back up.
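
The two options above can be sketched from the ESXi shell as follows. This assumes a standard vSwitch portgroup (distributed switch portgroups are edited from vCenter instead); the wrapper function names are hypothetical, and the portgroup and vmnic names are placeholders.

```shell
# Option 1: set the portgroup's active uplinks to known-good NICs.
# $1 = portgroup name, $2 = comma-separated uplink list (e.g. vmnic3)
set_active_uplinks() {
    esxcli network vswitch standard portgroup policy failover set \
        --portgroup-name "$1" --active-uplinks "$2"
}

# Option 2: bring the failing NIC down to force a failover ($1 = vmnic name).
down_nic() {
    esxcli network nic down -n "$1"
}

# Disable failback on the portgroup so traffic does not return to the
# failing NIC when it comes back up ($1 = portgroup name).
disable_failback() {
    esxcli network vswitch standard portgroup policy failover set \
        --portgroup-name "$1" --failback false
}

# Bring the NIC back up once failback is disabled or the path is fixed.
up_nic() {
    esxcli network nic up -n "$1"
}
```

A possible sequence for the second option is down_nic vmnic2, then disable_failback "vMotion", then up_nic vmnic2. After changing the policy, it is worth confirming the resulting active and standby uplink lists with esxcli network vswitch standard portgroup policy failover get before retrying the vMotion.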

Alternatively, the in-use physical NIC/datapath can be investigated and corrected by the physical network team or vendor to allow communication on the vMotion network with that NIC.