IX appliance re-deployment fails with the message "Adding host to DVS <DVS_name> failed"

Article ID: 439799

Updated On:

Products

VMware HCX

Issue/Introduction

  • In the HCX Manager UI, the re-deployment task for the IX (Interconnect) appliance fails with the following message:
    Appliance operation failed for applianceId <UUID> with error Interconnect Service Workflow interconnectConfigureMA failed. Error: Adding Mobility Agent Host failed. Adding host to DVS <DVS_name> failed. Error : Cannot complete a vSphere Distributed Switch operation for one or more host members.

  • The following entries are recorded in /common/logs/admin/app.log within the HCX Manager log bundle. The error message states that adding the host to the DVS failed:
    <Timestamp> UTC [InterconnectService_SvcThread-59, J:<JID>, , TxId: <TxId>] WARN  c.v.v.h.s.i.InterconnectRedeploy- Error of unknown type : java.lang.RuntimeException
    <Timestamp> UTC [InterconnectService_SvcThread-54, J:<JID>, , TxId: <TxId>] INFO  c.v.v.h.s.i.InitiateApplianceOperation- initiateApplianceOperation Running VERIFY_APPLIANCE_OPERATION in state: <UUID> for applianceId <NAME> applianceName HCX-WAN-IX applianceType {}
    <Timestamp> UTC [InterconnectService_SvcThread-54, J:<JID>, , TxId: <TxId>] ERROR c.v.v.h.s.i.InitiateApplianceOperation- InterconnectRedeploy failed, errorCode:null. stacktrace:null, errorMessage:Interconnect Service Workflow interconnectConfigureMA failed. Error: Adding Mobility Agent Host failed. Adding host to DVS <DVS_name> failed. Error : Cannot complete a vSphere Distributed Switch operation for one or more host members.
    <Timestamp> UTC [InterconnectService_SvcThread-54, J:<JID>, , TxId: <TxId>] ERROR c.v.v.h.s.i.InitiateApplianceOperation- Failure detected while verifying completion of InterconnectServiceJobs::InterconnectRedeploy. Reason: Interconnect Service Workflow InterconnectRedeploy failed. Error: Interconnect Service Workflow interconnectConfigureMA failed. Error: Adding Mobility Agent Host failed. Adding host to DVS <DVS_name> failed. Error : Cannot complete a vSphere Distributed Switch operation for one or more host members.
    java.lang.RuntimeException: Interconnect Service Workflow InterconnectRedeploy failed. Error: Interconnect Service Workflow interconnectConfigureMA failed. Error: Adding Mobility Agent Host failed. Adding host to DVS <DVS_name> failed. Error : Cannot complete a vSphere Distributed Switch operation for one or more host members.
            at com.vmware.vchs.hybridity.service.interconnect.AbstractInterconnectJob.getSubflowJobDataArray(AbstractInterconnectJob.java:758)
            at com.vmware.vchs.hybridity.service.interconnect.AbstractInterconnectJob.checkComplete(AbstractInterconnectJob.java:2308)
            at com.vmware.vchs.hybridity.service.interconnect.InitiateApplianceOperation.handleState(InitiateApplianceOperation.java:143)
            at com.vmware.vchs.hybridity.service.interconnect.AbstractInterconnectJob.run(AbstractInterconnectJob.java:217)
            at com.vmware.vchs.hybridity.messaging.LoggingJobWrapper.run(LoggingJobWrapper.java:41)
            at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
            at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
            at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
            at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
            at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
            at java.base/java.lang.Thread.run(Thread.java:829)
    <Timestamp> UTC [InterconnectService_SvcThread-54, J:<JID>, , TxId: <TxId>] ERROR c.v.v.h.s.i.InitiateApplianceOperation- InterconnectRedeploy failed, errorCode:null. stacktrace:null, errorMessage:Interconnect Service Workflow interconnectConfigureMA failed. Error: Adding Mobility Agent Host failed. Adding host to DVS <DVS_name> failed. Error : Cannot complete a vSphere Distributed Switch operation for one or more host members.
    <Timestamp> UTC [InterconnectService_SvcThread-54, J:<JID>, , TxId: <TxId>] ERROR c.v.v.h.s.i.InitiateApplianceOperation- InterconnectRedeploy workflow failed with error Interconnect Service Workflow interconnectConfigureMA failed. Error: Adding Mobility Agent Host failed. Adding host to DVS <DVS_name> failed. Error : Cannot complete a vSphere Distributed Switch operation for one or more host members.
    <Timestamp> UTC [FailureDetectionService_EventListener, , , TxId: <TxId>] INFO  c.v.v.h.f.FailedJobEventsListener- Received a Failed Job, jobType: InterconnectServiceJobs workflow type: InterconnectRedeploy
    <Timestamp> UTC [FailureDetectionService_EventListener, , , TxId: <TxId>] INFO  c.v.v.h.f.FailedJobEventsListener- jobAndWorkflowTypesMap :44
    <Timestamp> UTC [InterconnectService_SvcThread-58, J:<JID>, , TxId: <TxId>] INFO  c.v.v.h.s.i.InitiateApplianceOperation- initiateApplianceOperation Running FAILED in state: <UUID> for applianceId <Name> applianceName HCX-WAN-IX applianceType {}
    
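To confirm which DVS a log bundle is complaining about, the ERROR lines above can be scanned for the "Adding host to DVS ... failed." text. The following is a minimal sketch, not an HCX tool: the function name `find_dvs_failures` and the sample DVS name `dvs-example` are hypothetical, and the pattern is taken only from the message quoted in this article.

```python
import re

# Pattern derived from the error text quoted in this article. The DVS name is
# whatever sits between "Adding host to DVS " and the following " failed.".
DVS_FAILURE = re.compile(r"Adding host to DVS (?P<dvs>.+?) failed\.")


def find_dvs_failures(lines):
    """Return the distinct DVS names found in ERROR lines, in first-seen order."""
    seen = []
    for line in lines:
        # Only ERROR-level entries are of interest here.
        if " ERROR " not in line:
            continue
        match = DVS_FAILURE.search(line)
        if match and match.group("dvs") not in seen:
            seen.append(match.group("dvs"))
    return seen


if __name__ == "__main__":
    # Hypothetical sample shaped like the app.log entries shown above.
    sample = [
        "<Timestamp> UTC [InterconnectService_SvcThread-54, J:<JID>, , TxId: <TxId>] "
        "ERROR c.v.v.h.s.i.InitiateApplianceOperation- InterconnectRedeploy failed, "
        "errorMessage:Adding Mobility Agent Host failed. "
        "Adding host to DVS dvs-example failed. Error : Cannot complete a vSphere "
        "Distributed Switch operation for one or more host members.",
    ]
    print(find_dvs_failures(sample))
```

With an extracted log bundle, the same function could be given the open file object for /common/logs/admin/app.log, since it accepts any iterable of lines.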

Environment

VMware HCX

Cause

There are two possible causes.

A single Compute Profile has multiple clusters selected across multiple datacenters.

  • When the IX appliance is redeployed, the workflow uses the network details in the Compute/Network Profile to attach the IX appliance vNICs to the corresponding port groups on the specified DVS.
  • HCX then sends an API call to vCenter to add the new IX appliance to the inventory as a Mobility Agent Host.
  • vCenter fails to assign the DVS to the IX appliance because the DVS selected for the redeployment resides in Datacenter-A, while the cluster/ESXi host selected for the redeployment resides in Datacenter-B.
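
The mismatch described above can be illustrated with a small check over inventory data. This is only a sketch: the `Cluster` and `Dvs` records and the function `clusters_outside_dvs_datacenter` are hypothetical stand-ins for information read from vCenter, not part of any HCX or vSphere API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    name: str
    datacenter: str  # name of the vCenter Datacenter object that owns the cluster


@dataclass(frozen=True)
class Dvs:
    name: str
    datacenter: str  # name of the vCenter Datacenter object that owns the DVS


def clusters_outside_dvs_datacenter(clusters, dvs):
    """Return the clusters that do not live in the same datacenter as the DVS."""
    return [c for c in clusters if c.datacenter != dvs.datacenter]


if __name__ == "__main__":
    # Hypothetical Compute Profile that spans two datacenters.
    profile_clusters = [
        Cluster("Cluster-1", "Datacenter-A"),
        Cluster("Cluster-2", "Datacenter-B"),
    ]
    selected_dvs = Dvs("DVS-1", "Datacenter-A")
    for cluster in clusters_outside_dvs_datacenter(profile_clusters, selected_dvs):
        print(f"{cluster.name} is in {cluster.datacenter}, "
              f"but {selected_dvs.name} is in {selected_dvs.datacenter}")
```

Any cluster the check returns is a placement for which adding the host to the selected DVS would be expected to fail in the way this article describes.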

The DVS configuration was changed after the IX appliance was deployed.

  • The VDS that existed when the IX appliance was originally deployed may have since been renamed, or its port group configuration may have been changed.
  • If the VDS name or configuration was altered, HCX cannot auto-correct itself because its Network Profiles still reference the old configuration.
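
A stale reference of this kind can be spotted by comparing what each Network Profile records against the live inventory. The sketch below assumes both sides have already been exported into plain Python values. The function `stale_profile_backings` and all profile, DVS, and port group names are hypothetical.

```python
def stale_profile_backings(profile_backings, live_portgroups):
    """Return the profiles whose (DVS, port group) pair no longer exists in vCenter.

    profile_backings: dict mapping a Network Profile name to the
                      (dvs_name, portgroup_name) pair it was created with.
    live_portgroups:  set of (dvs_name, portgroup_name) pairs currently in vCenter.
    """
    return {
        profile: backing
        for profile, backing in profile_backings.items()
        if backing not in live_portgroups
    }


if __name__ == "__main__":
    # Hypothetical data: the DVS was renamed from "DVS-old" to "DVS-new"
    # after the uplink profile was created.
    profiles = {
        "uplink-profile": ("DVS-old", "pg-uplink"),
        "mgmt-profile": ("DVS-mgmt", "pg-mgmt"),
    }
    live = {("DVS-new", "pg-uplink"), ("DVS-mgmt", "pg-mgmt")}
    print(stale_profile_backings(profiles, live))
```

Profiles reported by the check are the ones that would need their DVS port group re-selected in the HCX UI.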

 

Resolution

Caution: Editing Compute Profiles, editing Network Profiles, and triggering a Service Mesh Resync are high-impact operations in VMware HCX.
Because the Service Mesh handles active data replication, real-time migrations, and live network extensions, perform these configuration tasks only during an approved maintenance window.

  • If the vCenter contains multiple datacenters or clusters that do not share exactly the same VDS configuration, using a single Compute Profile that spans all of the clusters/datacenters is not recommended. Refer to Compute Profile Considerations and Concepts.
  • Create a separate Compute Profile for each cluster and ensure that the correct VDS is included in the corresponding Compute/Network Profile.
  • If the VDS name or configuration was altered, update the profiles as described below, because HCX cannot auto-correct itself while its Network Profiles still hold the old configuration.

    Update the Network Profile

    • Log in to the HCX UI and go to Infrastructure > Interconnect > Network Profiles.
    • Edit the Network Profile that is mapped to the affected DVS.
    • Use the dropdown to re-select the correct new or renamed DVS port group from the live vCenter inventory. Validate the selection by matching it against the port group shown in the vCenter inventory.
    • Save the changes.

    Sync the Compute Profile

    • Go to Compute Profiles, select your profile, and click Edit.
    • Continue through the wizard without changing settings to re-validate the updated Network Profiles.
    • Ensure that the correct Network Profile and the correct DVS are selected.
    • Click Finish.

    Re-sync the Service Mesh

    • Go back to your Service Mesh.
    • Click Resync to ensure that the Service Mesh is updated with the changes made to the Network/Compute Profiles.

  • Retry the IX appliance re-deployment.

Additional Information

Precautions to be taken before editing a Network Profile

  • When scaling up your Service Mesh (e.g., adding more Network Extension appliances to handle more VLANs), ensure the IP Pool has enough free, unassigned IP addresses. If HCX runs out of IPs during a resync/redeploy, the task will fail midway, leaving your mesh in a degraded state.
  • When modifying the MTU size in a Network Profile (especially the Uplink profile), ensure the backing physical switches and WAN paths support it. A mismatch causes heavy packet fragmentation and makes migration streams or L2 extensions drop intermittently.
  • Altering the IP range of an active Network Profile causes the subsequent resync to force re-deployment of the IX and NE appliances so that they receive new IPs.
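
The IP Pool headroom check in the first precaution can be sketched with Python's standard `ipaddress` module. This is an illustration under stated assumptions only: the function names, the example range, and the number of addresses required are hypothetical, and how many addresses a given change actually consumes has to be determined from the Service Mesh design.

```python
import ipaddress


def free_addresses(pool_start, pool_end, in_use):
    """Count addresses in the inclusive range [pool_start, pool_end] not in use."""
    start = ipaddress.ip_address(pool_start)
    end = ipaddress.ip_address(pool_end)
    if end < start:
        raise ValueError("pool_end must not be lower than pool_start")
    # De-duplicate the in-use list and ignore addresses outside the pool range.
    used = {ipaddress.ip_address(addr) for addr in in_use}
    used_in_pool = sum(1 for addr in used if start <= addr <= end)
    return int(end) - int(start) + 1 - used_in_pool


def pool_can_cover(pool_start, pool_end, in_use, addresses_needed):
    """Return True if the pool still has at least addresses_needed free addresses."""
    return free_addresses(pool_start, pool_end, in_use) >= addresses_needed


if __name__ == "__main__":
    # Hypothetical pool of five addresses from a documentation range,
    # two of which are already assigned.
    in_use = ["192.0.2.10", "192.0.2.11"]
    print(free_addresses("192.0.2.10", "192.0.2.14", in_use))
    print(pool_can_cover("192.0.2.10", "192.0.2.14", in_use, addresses_needed=4))
```

Running a check like this before a scale-up or resync gives an early warning that the pool would be exhausted partway through the task.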

Precautions to be taken while editing a Compute Profile

  • As the DVS error in this article shows, never add clusters from different vCenter datacenters to the same Compute Profile. Keep the scope limited to a single Datacenter object.
  • Always ensure the target ESXi cluster has enough spare compute capacity. If the hosts are overcommitted and cannot guarantee resources to the new IX/NE appliances, the deployment will fail.
  • Check the backing datastore selected in the deployment resources and ensure it has enough free space. A redeployment or resync often clones a new appliance before deleting the old one (to minimize downtime), so both copies can occupy the datastore at the same time.

Precautions to be taken before Service Mesh Re-sync/Re-deploy

After the Compute/Network Profile is modified, HCX flags the Service Mesh as "Out of Sync." Clicking Resync pushes those changes down to the active appliances.

  • A standard Resync tries to apply changes inline and is usually non-disruptive. However, a Redeploy or an aggressive configuration change (like changing a DVS or Management IP) will completely delete and recreate the HCX Fleet appliances.
  • Always schedule a Service Mesh Resync/Redeploy during an approved maintenance window.
  • Ensure there are no active, running, or scheduled migrations before initiating a Resync.

Refer to the documentation below:

Update and Synchronize the Service Mesh

Create a Network Profile

Create a Compute Profile