HCX - Migration switchover failed with the error "Relocate task failed"

Article ID: 369919

Products

VMware HCX

Issue/Introduction

  • HCX vMotion, Replication Assisted vMotion (RAV), or Cold migrations fail during the switchover.
  • In the HCX Manager UI, under Migration -> Switchover events for the failed migration, the following errors are observed: "Relocate task failed on source side" and "Relocate task failed on target side".
  • The following error is observed in /common/logs/admin/app.log:
ERROR c.v.h.s.v.j.MonitorSourceSideProgressWorkflow- [migId=#######] Source side relocate 'task-###' failed for the virtual machine. Error is A fatal internal error occurred. See the virtual machine's log for more details. msg.svmotion.fail.internal:A fatal internal error occurred. See the virtual machine's log for more details. msg.svmotion.disk.copyphase.failed:Failed to copy one or more disks. msg.mirror.disk.copyfailed:Failed to copy source (/vmfs/volumes/<datastore-uuid/<vm-name-folder_###>/<VM-name>.vmdk) to destination (/vmfs/volumes/<datastore-uuid>/<VM-name-folder>/<vm-name>.vmdk): Timeout.  vob.vmotion.stream.check.block.mem.timed.out:VMotionStream [167772258:3406475827015006463] timed out while waiting for disk 0's queue count to drop below the maximum limit of 32768 blocks. This could indicate either network or storage problems preventing proper block transfer. faultTime:2024-05-20T05:14:36.332619Z. Total progress % is 'null'.com.vmware.vim.binding.vim.fault.GenericVmConfigFault: A fatal internal error occurred. See the virtual machine's log for more details.

 

ERROR c.v.h.s.v.j.MonitorTargetSideProgressWorkflow- [migId= #######] Target side relocate 'task-###' failed for the virtual machine. Error is A general system error occurred: vMotion failed: unknown error msg.migrate.waitdata.platform:Failed waiting for data. Error 195887167. Connection closed by remote host, possibly due to timeout.  vob.vmotion.recv.connection.closed:vMotion migration [-1062666551:2166393136529672478] the remote host closed the connection unexpectedly and migration has stopped. The closed connection probably results from a migration failure detected on the remote host. faultTime:2024-05-20T05:14:18.925234Z. Total progress % is null.com.vmware.vim.binding.vmodl.fault.SystemError: A general system error occurred: vMotion failed: unknown error.

 

vMotion failed. System Error. Source side error is: Source side relocate failed for the virtual machine. Migration to host failed with error Timeout (195887137). msg.checkpoint.precopyfailure:Migration to host failed with error Timeout (195887137). vob.vmotion.net.send.start.failed:vMotion migration [180539780:3187926962485084139] failed to send init message to the remote host vob.vmotion.net.sbwait.timeout:vMotion migration timed out waiting 20001 ms to transmit data. Target side error is: A general system error occurred: vMotion failed: unknown error msg.checkpoint.migration.nodata: The vMotion failed because the destination host did not receive data from the source host on the vMotion network. Please check your vMotion network settings and physical network configuration and ensure they are correct.

Environment

HCX

Cause

This issue is typically caused by a network configuration problem or an underlying network fault during the HCX migration. During the migration, the source ESXi host transmits data to the source-site IX appliance, which acts as a Mobility Agent (MA) host. The source IX (IX-I) appliance then sends the data over the HCX tunnel to the IX appliance on the remote site (IX-R). Finally, the Mobility Agent host on the target site opens another connection to the target ESXi host. Because this process involves multiple connections, any network issue along the path, such as blocked ports, MTU misconfiguration, retransmissions/packet drops, or insufficient bandwidth, can disrupt the communication and cause the task to fail. The underlay network should be investigated.

Resolution

  • Navigate to HCX Manager UI -> Interconnect -> Service Mesh -> Run Diagnostics and review the results for any errors. The diagnostics test connectivity from the IX appliance to the required components (e.g., vCenter, ESXi hosts) and identify issues related to network communication. If there are errors related to closed ports, review the network and firewall configuration. For more information on the required ports, refer to VMware Ports and Protocols and Network Diagrams for VMware HCX. This issue can occur if port 8000, which is required for vMotion, is blocked.
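    As a quick manual check of the vMotion port, a small shell sketch can probe TCP 8000 from a Linux jump host with network access to the IX uplink (a sketch, assuming bash and timeout are available; the target IP below is a placeholder you must replace):

    ```shell
    #!/usr/bin/env bash
    # Hypothetical value -- replace with the remote IX uplink IP from the Service Mesh.
    TARGET_IX_IP="192.0.2.10"
    VMOTION_PORT=8000   # TCP 8000 must be open for HCX vMotion traffic

    # Probe the port using bash's /dev/tcp pseudo-device; timeout keeps the
    # check from hanging if packets are silently dropped by a firewall.
    if timeout 3 bash -c "exec 3<>/dev/tcp/${TARGET_IX_IP}/${VMOTION_PORT}" 2>/dev/null; then
      RESULT="reachable"
    else
      RESULT="blocked or unreachable"
    fi
    echo "TCP ${VMOTION_PORT} on ${TARGET_IX_IP}: ${RESULT}"
    ```

    A "blocked or unreachable" result here points at the firewall or routing between the sites rather than at HCX itself.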

  • In the HCX Manager UI, navigate to Transport Analytics to verify underlay network performance for Service Mesh uplinks and ensure you meet the minimum network underlay requirements for HCX Migrations. For more details, visit Network Underlay Minimum Requirements.
    For detailed performance diagnostics, run 'perftest all' from the HCX Central CLI (ccli). For details on running perftest and interpreting the results, refer to Network Underlay Characterization and HCX Performance Outcomes, specifically pages 11 to 13.

  • To review historical data (e.g., throughput, latency, loss), use the Transport Monitor. From the Transport Analytics page in the HCX Manager UI, select Transport Monitor.

  • MTU issues are a common cause of performance or migration failures.

    • To test the MTU, perform the following steps:
      • Log in to the HCX Manager as the admin user.
      • Run the command 'ccli', type 'list', and make a note of the IX appliance node ID.
      • Navigate to the IX appliance node using the command 'go <IX appliance node ID from the above step>'.
      • Run the command 'pmtu'.

        Alternatively, use the following command to test the MTU:

      • ping -M do -s <MTU minus 28 bytes> <Target-IX-uplink-ip>
      • For example, if the MTU is 1500, subtract 28 bytes (20-byte IP header plus 8-byte ICMP header) and use a payload of 1472:
      • ping -M do -s 1472 <Target-IX-uplink-ip>

    • From the results, ensure there is no MTU mismatch between the sites. Manually review the network profile settings for the uplink; both MTU values should match.
    • Please note: If there is an MTU mismatch between the uplinks, update the network profile and resync the Service Mesh. If WAN Optimization is deployed, ensure the IX and WO appliances are redeployed together.
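    The payload-size arithmetic above can be sketched as a small shell snippet (a sketch; the MTU value is a placeholder, read the real value from the uplink network profile):

    ```shell
    #!/usr/bin/env bash
    # Hypothetical uplink MTU -- use the value from your uplink network profile.
    MTU=1500
    ICMP_OVERHEAD=28                 # 20-byte IP header + 8-byte ICMP header
    PAYLOAD=$((MTU - ICMP_OVERHEAD)) # 1472 for a 1500-byte MTU

    # -M do sets the Don't Fragment bit, so an undersized path MTU makes the
    # ping fail instead of silently fragmenting the packet.
    echo "ping -M do -s ${PAYLOAD} <Target-IX-uplink-ip>"
    ```

    If the ping succeeds at this payload size on both uplinks but fails at a larger one, the path MTU, not HCX, is the limiting factor.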

For more information on HCX best practices and health checks, please refer to HCX - Health Check and Best Practices.