HCX cold migration failing with error "Error is A general system error occurred: Failed to send VCC_COMPLETE to destination. Total progress % is 'null'."

Article ID: 419883

Products

VMware HCX

Issue/Introduction

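  • An HCX cold migration starts but eventually fails with the following error:

    Error is A general system error occurred: Failed to send VCC_COMPLETE to destination. Total progress % is 'null'.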

  • The following ERROR message is observed in the HCX Manager log /common/logs/admin/app.log:

    <Timestamp> [VmotionService_SvcThread-11828, Ent: HybridityAdmin, , TxId:########-####-####-####-############] ERROR c.v.h.s.v.j.MonitorSourceSideProgressWorkflow- [migId=########-####-####-####-############]] Source side relocate 'task-24291174' failed for the virtual machine. Error is A general system error occurred: Failed to send VCC_COMPLETE to destination. Total progress % is 'null'.


  • The following messages are observed in the vCenter log /var/log/vmware/vpxd/vpxd.log:

    <Timestamp> info vpxd[20897] [Originator@6876 sub=vmomi.soapStub[5453] opID=TaskLoop-host-50] SOAP request returned HTTP failure; <<cs p:00007fe1302199c0, PIPE:/var/run/envoy-hgw/hgw-pipe>, /hgw/host-50/vpxa>, method: waitForUpdatesEx; code: 504(Gateway Timeout); fault: (null)
    <Timestamp> error vpxd[20897] [Originator@6876 sub=Vmomi opID=TaskLoop-host-50] Got vmacore exception when invoking VMOMI method; <</hgw/host-50>, /vpxa>, vmodl.query.PropertyCollector.waitForUpdatesEx, N7Vmacore4Http13HttpExceptionE(HTTP error response: Gateway Timeout)
    --> [context]###################
    ########
    ########
    [/context]
    <Timestamp> info vpxd[20897] [Originator@6876 sub=TaskInfo opID=TaskLoop-host-50] WaitForUpdates failed; e: N5Vmomi5Fault17HostCommunication9ExceptionE(Fault cause: vmodl.fault.HostCommunication
    --> )
    --> [context]################=[/context]
    <Timestamp> warning vpxd[20897] [Originator@6876 sub=VpxProfiler opID=TaskLoop-host-50] InvokeWithOpId [TotalTime] took 316526 ms
    <Timestamp> info vpxd[20910] [Originator@6876 sub=vpxTaskInfo opID=54737-TxId:########-####-####-####-############-d7-01-TaskLoop-host-50] Task vim.Task:task-8655 disconnect with fault [N5Vmomi5Fault17HostCommunicationE]
    <Timestamp> error vpxd[20905] [Originator@6876 sub=VmProv opID=54737-TxId:########-####-####-####-############-d7-01] Failed to track task vim.Task:task-8655 on host vim.HostSystem:host-50: Fault cause: vmodl.fault.HostCommunication
    -->
    --> backtrace:
    --> [backtrace begin] product: VMware VirtualCenter, version: 9.0.0, build: build-24755230, tag: vpxd, cpu: x86_64, os: linux, buildType: release
    --> backtrace[00] libvmacore.so[0x004837CB]
    --> backtrace[01] libvmacore.so[0x003730AC]: Vmacore::System::Stacktrace::CaptureWork(unsigned int)
    --> backtrace[02] libvmacore.so[0x0038550F]: Vmacore::System::SystemFactory::CreateQuickBacktrace(Vmacore::Ref<Vmacore::System::Backtrace>&)
    --> backtrace[03] libvmomi.so[0x00179985]
    --> backtrace[04] libvmomi.so[0x00270E30]: Vmomi::Fault::HostCommunication::ThrowInternal()


  • The following message is also observed in the vCenter log /var/log/vmware/vpxd/vpxd.log:

    <Timestamp> warning vpxd[4049275] [Originator@6876 sub=VpxProfiler opID=TaskLoop-host-2752627] InvokeWithOpId [TotalTime] took 31417 ms
    <Timestamp> error vpxd[2983820] [Originator@6876 sub=VmProv opID=########-####-####-####-############-f4-01] Get exception while executing action vpx.vmprov.CreateDestinationVm:
    --> (vmodl.fault.SystemError) {
    --> reason = "Failed to send VCC_COMPLETE to destination",
    --> msg = "fault.SystemError.summary",
    --> }
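
To confirm that a failed migration matches this article, the log signatures above can be searched for directly. The following is a minimal sketch, assuming shell access to the relevant appliance. The function names are hypothetical and only standard grep options are used.

```shell
#!/bin/sh
# Hypothetical helpers for checking a log file against the signatures above.

# Print the number of lines in a log file that contain a fixed string.
# The string is matched literally (no regular expressions).
# Usage: count_matches <log-file> <fixed-string>
count_matches() {
    grep -cF -- "$2" "$1"
}

# Return 0 if the log file contains the failure text from this article.
# Usage: has_vcc_complete_error <log-file>
has_vcc_complete_error() {
    grep -qF -- "Failed to send VCC_COMPLETE to destination" "$1"
}
```

For example, run `has_vcc_complete_error /common/logs/admin/app.log && echo "signature found"` on the HCX Manager, or `count_matches /var/log/vmware/vpxd/vpxd.log "Failed to send VCC_COMPLETE to destination"` on vCenter.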

Cause


Due to a known issue, the property collector connection from vpxd (on vCenter) to vpxa on the Mobility Agent (MA) sometimes fails with a timeout. The NFC copy completes successfully, but vCenter fails to receive the task update when it fetches updates from the host.
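
The timeout described here appears in vpxd.log as a 504 (Gateway Timeout) on waitForUpdatesEx, tagged with an opID of the form TaskLoop-host-NN. As a rough sketch, assuming the log lines follow the format shown in the excerpts in this article, the occurrences can be counted per host to see which host is affected. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count waitForUpdatesEx gateway timeouts per host.
# Assumes vpxd.log lines look like the excerpt in this article, i.e. they
# carry "opID=TaskLoop-host-<number>" and the fixed 504 text below.
# Usage: vpxa_gateway_timeouts_by_host <vpxd-log-file>
vpxa_gateway_timeouts_by_host() {
    # 1. keep only the 504 lines, 2. extract the host ID from the opID,
    # 3. count occurrences per host, 4. sort for stable output.
    grep -F 'method: waitForUpdatesEx; code: 504(Gateway Timeout)' "$1" |
        sed -n 's/.*opID=TaskLoop-\(host-[0-9][0-9]*\).*/\1/p' |
        awk '{ count[$0]++ } END { for (h in count) print h, count[h] }' |
        sort
}
```

Running `vpxa_gateway_timeouts_by_host /var/log/vmware/vpxd/vpxd.log` on vCenter prints one line per host with the host ID followed by the number of matching timeouts.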

Resolution

This is a known issue impacting VMware HCX.
 
Workaround:
 
Disable timeouts for the /vpxa API in the envoy reverse proxy service on both the source and target IX appliances:
  • Log in to the HCX Manager appliance over SSH and open an SSH session to the IX appliance using the following steps:
    • # ccli
      # list
      # go 0 (assuming that 0 is the IX appliance)
      # ssh 

  • Back up the envoy.yaml file with the following command:
    • # cp /etc/vmware/envoy.yaml /etc/vmware/envoy.yaml.bak

  • Open the envoy.yaml file in the vi editor:
    • # vi /etc/vmware/envoy.yaml

  • Add "idle_timeout: 0s" to the route for "/vpxa", directly below the existing "timeout: 0s" line, as in the example below:
    • Before:
                        routes:
                          - match:
                              path_separated_prefix: "/vpxa"
                            route:
                              cluster: "vpxa-cluster"
                              timeout: 0s
                              
    • After:
                        routes:
                          - match:
                              path_separated_prefix: "/vpxa"
                            route:
                              cluster: "vpxa-cluster"
                              timeout: 0s
                              idle_timeout: 0s 

    • NOTE: Indent with spaces, not TAB characters. Incorrect indentation causes the envoy service to fail on restart.

  • Restart the envoy reverse proxy service for the changes to take effect:
    • # systemctl restart envoy
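
Because a malformed envoy.yaml prevents the envoy service from starting, it can help to check the edit before restarting and to restore the backup if the service does not come back. The following is a minimal sketch and not part of the documented workaround. The function names and the SYSTEMCTL override are hypothetical. The check only looks for TAB characters and for an "idle_timeout: 0s" line inside the /vpxa route; it does not verify exact indentation. The service state is read immediately after the restart, so a delayed failure would not be caught.

```shell
#!/bin/sh
# Hypothetical helpers; not part of the documented workaround.

# Return 0 if the file contains no TAB characters and the /vpxa route
# contains an "idle_timeout: 0s" line.
# Usage: check_vpxa_idle_timeout <envoy.yaml>
check_vpxa_idle_timeout() {
    file=$1
    tab=$(printf '\t')
    if grep -q "$tab" "$file"; then
        echo "TAB character found in $file" >&2
        return 1
    fi
    # Track whether we are inside the /vpxa route: it starts at its
    # path_separated_prefix line and ends at the next "- match:" entry.
    if ! awk '
        /path_separated_prefix:[ ]*"\/vpxa"/       { in_vpxa = 1; next }
        in_vpxa && /^[ ]*- match:/                 { in_vpxa = 0 }
        in_vpxa && /^[ ]*idle_timeout:[ ]*0s[ ]*$/ { found = 1 }
        END { exit(found ? 0 : 1) }
    ' "$file"; then
        echo "idle_timeout: 0s not found under the /vpxa route in $file" >&2
        return 1
    fi
}

# Restart envoy; if it is not active afterwards, restore <file>.bak,
# restart again, and return 1.
# SYSTEMCTL may be overridden (assumption, for dry runs); default: systemctl.
# Usage: restart_envoy_or_rollback [envoy.yaml]
restart_envoy_or_rollback() {
    conf=${1:-/etc/vmware/envoy.yaml}
    ctl=${SYSTEMCTL:-systemctl}
    "$ctl" restart envoy
    if "$ctl" is-active --quiet envoy; then
        echo "envoy is active"
        return 0
    fi
    echo "envoy is not active; restoring $conf.bak" >&2
    cp "$conf.bak" "$conf" && "$ctl" restart envoy
    return 1
}
```

As an alternative to restarting by hand, the two functions can be chained: `check_vpxa_idle_timeout /etc/vmware/envoy.yaml && restart_envoy_or_rollback /etc/vmware/envoy.yaml`. The rollback relies on the /etc/vmware/envoy.yaml.bak copy created in the backup step.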

If you have issues with the workaround, please open a Broadcom GS support case to get assistance.