vMotion Failure Occurs on Switchover When Performing RAV Migration of Multiple VMs



Article ID: 411826


Updated On:

Products

VMware HCX

Issue/Introduction

  • When performing a Replication Assisted vMotion (RAV) migration of 20 or more VMs, a vMotion failure may occur on switchover.
  • The following error is observed when reviewing the migration in the HCX Manager UI:
    vMotion failed. System Error. Source side error is : Source side relocate failed for the virtual machine. A specified parameter was not correct: Target side error is :
    Could not complete network copy for file [ma-ds-########-########-####-############] ###############/###############.nvram

  • On the source IX appliance, </var/log/messages> shows the following messages, which indicate extremely poor throughput on the replication workstream:
    <132>1 <timestamps> +00:00 <Service-Mesh_name>-IX-I1 lwdproxy 887 - - connection 410 client errored after 3595807 bytes (~ 0.417 Mbps).  err: readfrom tcp <tunnel IP>:49866-><tunnel IP>:31031: read tcp <MA_replication_IP>:31031-><Source_ESX_Replication_IP>:61052: read: connection reset by peer
    <133>1 <timestamps> +00:00 <Service-Mesh_name>-IX-I1 lwdproxy 887 - - connection 410 server finished after 15276 bytes (~ 0.002 Mbps)
    <133>1 <timestamps> +00:00 <Service-Mesh_name>-IX-I1 lwdproxy 887 - - connection 410 end

    NOTE: The "connection reset by peer" message does not indicate an issue. It simply means that this connection is being torn down gracefully.

  • Deploying UPSA appliances on each respective HCX Uplink network and running iperf3 with the command <iperf3 -c <target_IP> -t 10 -i .5 -P 5 -V -p 4500> shows varying throughput across the 5 parallel streams.
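
The lwdproxy messages quoted above end each connection summary with a byte count and an approximate rate in Mbps. As a minimal sketch, assuming the log lines keep the shape shown in this article, the following Python snippet extracts those figures so slow connections can be listed quickly. The function name, the sample list, and the 1.0 Mbps default cut-off are hypothetical and chosen only for illustration; they are not part of HCX.

```python
import re

# Matches the per-connection summary seen in the article's log excerpt, e.g.
# "connection 410 client errored after 3595807 bytes (~ 0.417 Mbps)".
# Assumption: other lwdproxy lines follow the same layout.
SUMMARY_RE = re.compile(
    r"lwdproxy \d+ - - connection (?P<conn>\d+) (?P<side>client|server) "
    r"(?P<status>\w+) after (?P<bytes>\d+) bytes \(~\s*(?P<mbps>[\d.]+) Mbps\)"
)


def slow_connections(lines, threshold_mbps=1.0):
    """Return (connection id, side, Mbps) for summaries below threshold_mbps.

    threshold_mbps is an arbitrary illustrative cut-off, not an HCX value.
    """
    hits = []
    for line in lines:
        m = SUMMARY_RE.search(line)
        if m and float(m["mbps"]) < threshold_mbps:
            hits.append((int(m["conn"]), m["side"], float(m["mbps"])))
    return hits


# Shortened copies of the lines quoted in this article (placeholders kept).
SAMPLE_LINES = [
    "<132>1 <timestamps> +00:00 <Service-Mesh_name>-IX-I1 lwdproxy 887 - - "
    "connection 410 client errored after 3595807 bytes (~ 0.417 Mbps).",
    "<133>1 <timestamps> +00:00 <Service-Mesh_name>-IX-I1 lwdproxy 887 - - "
    "connection 410 server finished after 15276 bytes (~ 0.002 Mbps)",
    "<133>1 <timestamps> +00:00 <Service-Mesh_name>-IX-I1 lwdproxy 887 - - "
    "connection 410 end",
]

for conn, side, mbps in slow_connections(SAMPLE_LINES):
    print(f"connection {conn} ({side}): {mbps} Mbps")
```

Lines without a throughput summary, such as the "connection 410 end" message, are skipped.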


Environment

VMware HCX utilizing Replication Assisted vMotion migration technology. 

Cause

  • This behavior suggests an intermittent connectivity issue in the underlay network.
  • The iperf3 tests run 5 streams in parallel, and the streams show varying throughput. On a stable path, all streams should report throughput relatively close to one another.
  • vMotion has a throughput requirement of 150 Mbps.
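
The two checks described above (streams close to one another, and throughput against the 150 Mbps vMotion requirement) can be sketched in Python once the per-stream rates have been read off the iperf3 output. This is an illustration only: the function name, the coefficient-of-variation cut-off of 0.25, and the choice to compare the summed stream rate with 150 Mbps are assumptions, since the article gives no numeric definition of "relatively close".

```python
from statistics import mean, pstdev

VMOTION_MIN_MBPS = 150.0  # throughput requirement stated in this article


def assess_streams(stream_mbps, max_cv=0.25):
    """Summarise per-stream iperf3 throughput (values in Mbps).

    max_cv is a hypothetical limit on the coefficient of variation
    (standard deviation divided by mean); it is not an HCX value.
    """
    if not stream_mbps:
        raise ValueError("at least one stream rate is required")
    total = sum(stream_mbps)
    avg = mean(stream_mbps)
    # A mean of zero means no data moved at all; treat that as maximally uneven.
    cv = pstdev(stream_mbps) / avg if avg > 0 else float("inf")
    return {
        "total_mbps": total,
        "cv": cv,
        "even": cv <= max_cv,
        # Assumption: the summed rate is what gets compared with the requirement.
        "meets_vmotion_minimum": total >= VMOTION_MIN_MBPS,
    }


# Hypothetical figures for five parallel streams, not taken from the article.
print(assess_streams([5.0, 80.0, 1.0, 40.0, 10.0]))
```

A low coefficient of variation means the streams are close to one another; a high value corresponds to the uneven behavior this article attributes to the underlay.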

Resolution

  • The throughput issue should be investigated by the organization's network team.
  • The <MTR> (My traceroute) command can be run to and from the UPSA appliances to show the network path currently in use.
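
When handing results to the network team, it can help to summarise which hops on the path show packet loss. The sketch below parses the text produced by mtr in report mode (for example <mtr --report --report-cycles 10 <target_IP>>) and lists hops whose loss exceeds a limit. The function name and the limit are hypothetical, and the sample report uses made-up values and documentation-range addresses, not output from a real HCX environment.

```python
import re

# One hop row of mtr report output: hop number, host, Loss%, Snt, Last, Avg,
# Best, Wrst, StDev. The "%" is optional because mtr can print "100.0"
# without it for hops that never answer.
HOP_RE = re.compile(
    r"^\s*(?P<hop>\d+)\.\|--\s+(?P<host>\S+)\s+(?P<loss>[\d.]+)%?\s+(?P<sent>\d+)"
    r"\s+(?P<last>[\d.]+)\s+(?P<avg>[\d.]+)\s+(?P<best>[\d.]+)"
    r"\s+(?P<worst>[\d.]+)\s+(?P<stdev>[\d.]+)"
)


def lossy_hops(report_text, max_loss_pct=0.0):
    """Return (hop number, host, loss %) for hops above max_loss_pct."""
    hits = []
    for line in report_text.splitlines():
        m = HOP_RE.match(line)
        if m and float(m["loss"]) > max_loss_pct:
            hits.append((int(m["hop"]), m["host"], float(m["loss"])))
    return hits


# Illustrative report with invented values.
SAMPLE_REPORT = """\
Start: 2024-01-01T00:00:00+0000
HOST: upsa-source                 Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- 192.0.2.1                  0.0%    10    0.4   0.5   0.3   0.9   0.2
  2.|-- 198.51.100.7              20.0%    10   12.1  15.3   9.8  40.2   8.7
  3.|-- 203.0.113.20               0.0%    10   11.0  11.4  10.2  13.1   0.8
"""

for hop, host, loss in lossy_hops(SAMPLE_REPORT):
    print(f"hop {hop} ({host}): {loss}% loss")
```

Running the report in both directions, as the resolution suggests, shows whether the forward and return paths differ.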

Additional Information