Cluster Commissioning Fails at "Migrate ESXi Host Management vmknic(s)" Resulting in a Stuck Cluster

Article ID: 439232

Products

VMware Cloud Foundation
VMware SDDC Manager / VCF Installer

Issue/Introduction

  • When attempting to commission hosts or create a new cluster in VMware Cloud Foundation (VCF) SDDC Manager, the workflow fails specifically at the following task:

"Migrate ESXi Host Management vmknic(s) to vSphere Distributed Switch"

  • Following this failure, the cluster enters a stuck, failed, or inconsistent state. Standard management operations (such as expansion or workload deployment) are blocked.

  • The "Configuration Status" shows "ERROR" in the SDDC Manager UI.

  • The following errors appear in /var/log/vmware/vcf/domainmanager/domainmanager.log:

    YYYY-MM-DDTHH:MM:SS.123Z INFO  [domainmanager, 61a2b3c4d5e6f7g8, 1234] [c.v.v.c.f.p.n.a.MigrateHostManagementVmkAction] Initiating migration of vmk0 for host <ESXi_FQDN> to vDS <vDS_Name>
    YYYY-MM-DDTHH:MM:SS.456Z INFO  [domainmanager, 61a2b3c4d5e6f7g8, 1234] [c.v.v.c.f.p.n.a.MigrateHostManagementVmkAction] Waiting for host <ESXi_FQDN> to reconnect to vCenter Server...
    YYYY-MM-DDTHH:MM:SS.789Z ERROR [domainmanager, 61a2b3c4d5e6f7g8, 1234] [c.v.v.c.f.p.n.a.MigrateHostManagementVmkAction] Timeout waiting for host connectivity. Expected state: CONNECTED, Current state: NOT_RESPONDING.
    YYYY-MM-DDTHH:MM:SS.890Z ERROR [domainmanager, 61a2b3c4d5e6f7g8, 1234] [c.v.v.c.f.p.n.a.MigrateHostManagementVmkAction] Host <ESXi_FQDN> failed to reconnect after migrating vmknic to vDS. Network connectivity was lost.
    YYYY-MM-DDTHH:MM:SS.901Z ERROR [domainmanager, 61a2b3c4d5e6f7g8, 1234] [c.v.e.s.o.model.error.ErrorFactory] [XXXXXX] MIGRATE_VMKNIC_FAILED Failed to migrate ESXi Host Management vmknic(s) to vSphere Distributed Switch.
    com.vmware.evo.sddc.orchestrator.exceptions.OrchTaskException: Failed to migrate ESXi Host Management vmknic(s) to vSphere Distributed Switch.
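
To confirm that a log matches this signature, the excerpt above can be searched for the MIGRATE_VMKNIC_FAILED marker and for the hosts that failed to reconnect. The following is a minimal sketch, assuming the log lines have the same wording as the excerpt; the function name summarize_vmknic_failures is illustrative and not part of any VMware tooling.

```python
import re

# Error code emitted by ErrorFactory in the excerpt above.
FAILURE_MARKER = "MIGRATE_VMKNIC_FAILED"
# Captures the host name from the "failed to reconnect" line in the excerpt.
HOST_PATTERN = re.compile(
    r"Host (\S+) failed to reconnect after migrating vmknic to vDS"
)


def summarize_vmknic_failures(lines):
    """Return (hosts, marker_seen) for an iterable of domainmanager.log lines.

    hosts lists each host that failed to reconnect, in order of first
    appearance; marker_seen is True if the failure code appears at all.
    """
    hosts = []
    marker_seen = False
    for line in lines:
        if FAILURE_MARKER in line:
            marker_seen = True
        match = HOST_PATTERN.search(line)
        if match and match.group(1) not in hosts:
            hosts.append(match.group(1))
    return hosts, marker_seen
```

The function accepts any iterable of strings, so an open file handle for /var/log/vmware/vcf/domainmanager/domainmanager.log can be passed to it directly.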

Environment

VMware Cloud Foundation 9.x

Cause

This issue is a two-part failure:

  1. Initial Workflow Failure: The failure is caused by incorrect VLAN assignments, MTU mismatches, or missing network configurations on the destination physical uplink. When the host attempts to migrate its management interface to the vSphere Distributed Switch (vDS), it loses connectivity to the vCenter Server, causing the task to time out and fail.

  2. Resulting Stuck State: SDDC Manager relies on strict desired-state configurations. When the workflow is interrupted mid-execution, the backend database retains partial metadata, and the associated ESXi hosts are left with incomplete network configurations. The system locks the "half-configured" cluster to prevent database corruption.
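
The first cause can be illustrated as a simple consistency check between the physical uplink and the management network on the vDS. The following is a minimal sketch under stated assumptions: the UplinkConfig and ManagementNetwork types and the find_migration_risks function are hypothetical, the values must be gathered manually from the switch and the vDS, and the check assumes the management VLAN is tagged.

```python
from dataclasses import dataclass


@dataclass
class UplinkConfig:
    """Hypothetical view of the physical switch port behind a host uplink."""
    allowed_vlans: set
    mtu: int


@dataclass
class ManagementNetwork:
    """Hypothetical view of the management port group on the target vDS."""
    vlan_id: int
    mtu: int


def find_migration_risks(uplink, mgmt):
    """Return descriptions of mismatches that could cut off vmk0 after migration."""
    problems = []
    # A tagged management VLAN must be permitted on the physical trunk.
    if mgmt.vlan_id not in uplink.allowed_vlans:
        problems.append(f"VLAN {mgmt.vlan_id} is not allowed on the uplink")
    # Frames larger than the physical MTU are dropped on the wire.
    if mgmt.mtu > uplink.mtu:
        problems.append(f"vDS MTU {mgmt.mtu} exceeds the uplink MTU {uplink.mtu}")
    return problems
```

An empty list means that neither of these two mismatches was found; it does not rule out other missing network configurations.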

Resolution

To recover the hosts and rebuild the cluster, use the automated teardown workflow to remove the partial configurations.

  1.  Log in to the management vCenter Server and take an offline snapshot of the SDDC Manager VM. Ensure that the "Snapshot the virtual machine's memory" option is unchecked.

  2.  Log in to the SDDC Manager UI. Navigate to Inventory > Workload Domains > [Target Domain] > Clusters. Select the stuck cluster, click Actions, and choose Delete Cluster.

  3.  Monitor the task pane while backend scripts scrub the database and clean up the ESXi hosts. Once the hosts return to an Unassigned state, resolve the underlying physical network issue (e.g., correct the VLAN pruning on the Top of Rack switches), and then recreate the cluster.
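
The wait in step 3 can also be scripted as a polling loop. The following is a minimal sketch, not an SDDC Manager API client: get_host_states is a caller-supplied function (hypothetical) that returns a mapping of host name to state, and the "UNASSIGNED" string is a placeholder for whatever state value the environment actually reports.

```python
import time

# Placeholder state label; the value reported by the environment may differ.
UNASSIGNED = "UNASSIGNED"


def wait_for_hosts_unassigned(get_host_states, hosts, timeout_s=1800,
                              interval_s=30, sleep=time.sleep):
    """Poll until every host reports UNASSIGNED or the timeout expires.

    get_host_states: caller-supplied callable returning {host_name: state}.
    Returns True if all hosts became unassigned, False on timeout.
    """
    waited = 0
    while True:
        states = get_host_states()
        pending = [host for host in hosts if states.get(host) != UNASSIGNED]
        if not pending:
            return True
        if waited >= timeout_s:
            return False
        sleep(interval_s)
        waited += interval_s
```

Injecting the state lookup keeps the loop independent of how the states are obtained. Recreate the cluster only after the function returns True and the physical network issue has been corrected.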

Additional Information

If the UI deletion workflow fails entirely, or if the cluster was previously removed but stale database entries remain, manual cleanup via SSH and database edits will be required.

For instructions on manual removal, refer to the following Broadcom Knowledge Base article:

Steps to manually remove a cluster from SDDC Manager after a failed Cluster Deletion attempt.