Cluster Expansion Fails at "Migrate ESXi Host Management vmknic(s) to vSphere Distributed Switch"

Article ID: 431652

Products

VMware Cloud Foundation

Issue/Introduction

When expanding a cluster or adding a host to an existing Workload Domain within VMware Cloud Foundation (VCF), the SDDC Manager workflow fails during the network migration phase.

Specifically, the task Migrate ESXi Host Management vmknic(s) to vSphere Distributed Switch stops with a communication error. This occurs because the host loses connectivity to vCenter during the migration of the management interface (vmk0) from a Virtual Standard Switch (VSS) to a vSphere Distributed Switch (VDS), triggering an automatic network rollback.

  • Log Entries (/var/log/vmware/vcf/domainmanager/domainmanager.log):

    ERROR [vcf_dm] Unable to migrate vmknic vmk0 -- retrying 1
    com.vmware.vim.binding.vmodl.fault.HostCommunication: An error occurred while communicating with the remote host.
    
  • Log Entries (/var/log/hostd.log on the ESXi host):

    Event 1323 : Lost uplink redundancy on DVPorts... Physical NIC vmnic1 is down.
    Event 1894 : Network configuration on the host <hostname> is rolled back...
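To confirm that a failure matches this article, the two log excerpts above can be checked together. The following Python sketch is a hypothetical helper, not a VMware tool. It matches only the message fragments quoted in this article; real log lines carry timestamps and other prefixes, and the wording may differ between releases. It reports a match when both a vmknic migration retry and a network rollback event are present.

```python
import re

# Patterns are built from the log excerpts quoted in this article only.
# The "..." in the original event text is elided, so only stable fragments are matched.
DM_MIGRATE_FAIL = re.compile(r"Unable to migrate vmknic (\S+) -- retrying (\d+)")
DM_HOST_COMM = re.compile(r"vmodl\.fault\.HostCommunication")
HOSTD_UPLINK_DOWN = re.compile(r"Event 1323 : .*Physical NIC (\S+) is down")
HOSTD_ROLLBACK = re.compile(
    r"Event 1894 : Network configuration on the host (\S+) is rolled back"
)


def scan_logs(domainmanager_lines, hostd_lines):
    """Summarise the symptoms of a rolled-back vmk0 migration (hypothetical helper)."""
    summary = {
        "vmknic_retries": [],  # (vmknic, retry number) pairs from domainmanager.log
        "host_comm_errors": 0,  # HostCommunication faults seen by SDDC Manager
        "down_uplinks": set(),  # vmnics reported down in hostd.log
        "rolled_back_hosts": set(),  # hosts whose network config was rolled back
    }
    for line in domainmanager_lines:
        m = DM_MIGRATE_FAIL.search(line)
        if m:
            summary["vmknic_retries"].append((m.group(1), int(m.group(2))))
        if DM_HOST_COMM.search(line):
            summary["host_comm_errors"] += 1
    for line in hostd_lines:
        m = HOSTD_UPLINK_DOWN.search(line)
        if m:
            summary["down_uplinks"].add(m.group(1))
        m = HOSTD_ROLLBACK.search(line)
        if m:
            summary["rolled_back_hosts"].add(m.group(1))
    # Both halves of the symptom must be present for the issue to match.
    summary["matches_this_issue"] = bool(summary["vmknic_retries"]) and bool(
        summary["rolled_back_hosts"]
    )
    return summary
```

To use it, copy domainmanager.log from the SDDC Manager appliance and hostd.log from the affected ESXi host, then pass the open file objects to scan_logs().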

Environment

VMware Cloud Foundation

Cause

This issue is typically caused by a mismatch or misconfiguration in the Cisco UCS Service Profile.

When the migration starts, the ESXi host attempts to use the assigned physical uplinks (vmnics) for the VDS. If the UCS Network Control Policy, vNIC Templates, or VLAN permissions are incorrectly configured (e.g., the Management VLAN is not allowed on the trunk or the MTU is mismatched), the host loses its heartbeat to vCenter. vSphere then initiates a Network Rollback to the original VSS configuration to prevent the host from becoming permanently isolated, which causes the VCF workflow to fail.
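The misconfigurations described above amount to a comparison between what the host needs on each uplink and what the UCS vNIC actually carries. In this Python sketch, VnicConfig and HostNetworkRequirement are hypothetical structures that you would fill in by hand from UCSM and from the host's port group settings. They do not correspond to any Cisco or VMware API, and the VLAN IDs you supply are your own.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class VnicConfig:
    """vNIC settings transcribed by hand from the UCS Service Profile (hypothetical model)."""
    name: str
    allowed_vlans: set[int]
    native_vlan: int | None  # None if no native VLAN is set on the vNIC
    mtu: int


@dataclass
class HostNetworkRequirement:
    """What the ESXi host expects on each VDS uplink (hypothetical model)."""
    required_vlans: set[int]  # e.g. vMotion and NSX overlay VLANs
    mtu: int
    mgmt_vlan: int
    mgmt_untagged: bool = False  # True if vmk0's port group uses VLAN 0 (untagged)


def find_problems(vnic: VnicConfig, req: HostNetworkRequirement) -> list[str]:
    """List the mismatches that could make the host lose vCenter connectivity."""
    problems = []
    # The management VLAN must always be carried, in addition to the others.
    needed = req.required_vlans | {req.mgmt_vlan}
    missing = sorted(needed - vnic.allowed_vlans)
    if missing:
        problems.append(f"{vnic.name}: VLANs {missing} not allowed on the trunk")
    # Untagged management traffic only lands on the right VLAN if it is native.
    if req.mgmt_untagged and vnic.native_vlan != req.mgmt_vlan:
        problems.append(
            f"{vnic.name}: native VLAN {vnic.native_vlan} does not match "
            f"management VLAN {req.mgmt_vlan}"
        )
    if vnic.mtu < req.mtu:
        problems.append(f"{vnic.name}: MTU {vnic.mtu} is below the required {req.mtu}")
    return problems
```

An empty list means that, for the values entered, the vNIC carries every VLAN the host needs at a sufficient MTU. It does not replace checking the upstream switch configuration.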

Resolution

To resolve this issue, ensure that the physical network configuration on the Cisco UCS side aligns with the VCF requirements:

  1. Verify UCS Service Profile: Log in to Cisco UCS Manager (UCSM) and locate the Service Profile associated with the failing ESXi host.

  2. Audit vNIC Templates: Confirm that the VLANs required for Management, vMotion, and Geneve/NSX-T are explicitly allowed on the vNICs.

    • Ensure the Native VLAN setting matches the management network of the host.

  3. Check Network Control Policy: Ensure the policy enables CDP and/or LLDP, and confirm that the vNIC link state is "Up".

  4. Confirm Uplink Connectivity: On the ESXi host, run the following command to confirm that the physical links are active:

    esxcli network nic list

    • Verify that vmnic0 and vmnic1 (or whichever NICs are assigned to the VDS) report a Link Status of "Up" at the expected speed and duplex.

  5. Restart the Workflow: Once the UCS network configuration is corrected and link stability is confirmed, return to the SDDC Manager UI and click Retry on the failed "Expand Cluster" task.
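Step 4 can be scripted by parsing the default table output of esxcli network nic list, for example after saving it with esxcli network nic list > nics.txt. This Python sketch assumes the column headers Name, Link Status, Speed (in Mb/s) and Duplex as printed by recent ESXi releases. The uplink names and the 10000 Mb/s minimum are placeholders to adjust for your design. Column boundaries are derived from the dashed separator line, so values that contain spaces, such as Description, stay intact.

```python
from __future__ import annotations

import re


def parse_esxcli_table(text: str) -> list[dict[str, str]]:
    """Parse esxcli's default fixed-width table output into one dict per row.

    Column boundaries come from the runs of dashes on the separator line,
    so values containing spaces (e.g. Description) stay in one field.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header, dashes, rows = lines[0], lines[1], lines[2:]
    starts = [m.start() for m in re.finditer(r"-+", dashes)]
    # Each column runs from its own start to the next column's start;
    # the last column runs to the end of the line.
    bounds = list(zip(starts, starts[1:] + [None]))

    def cut(line: str) -> list[str]:
        return [line[a:b].strip() for a, b in bounds]

    names = cut(header)
    return [dict(zip(names, cut(row))) for row in rows]


def check_uplinks(
    text: str,
    uplinks: tuple[str, ...] = ("vmnic0", "vmnic1"),  # placeholder VDS uplinks
    min_speed_mbps: int = 10000,  # placeholder expected link speed
) -> list[str]:
    """Return human-readable problems found on the expected VDS uplinks."""
    nics = {row["Name"]: row for row in parse_esxcli_table(text)}
    problems = []
    for name in uplinks:
        row = nics.get(name)
        if row is None:
            problems.append(f"{name}: not present in esxcli output")
            continue
        if row["Link Status"] != "Up":
            problems.append(f"{name}: link status is {row['Link Status']}")
        if int(row["Speed"]) < min_speed_mbps:
            problems.append(f"{name}: speed {row['Speed']} Mb/s is below {min_speed_mbps}")
        if row["Duplex"] != "Full":
            problems.append(f"{name}: duplex is {row['Duplex']}")
    return problems
```

Retry the workflow only after check_uplinks() returns an empty list for the uplinks assigned to the VDS.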

Additional Information

If using Cisco UCS VIC (Virtual Interface Card), ensure that the number of virtual interfaces defined in the profile does not exceed the hardware limit and that the "Placement" of vNICs matches the expected VCF uplink order (typically vmnic0 and vmnic1).
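The placement check can be treated as an ordering problem. This Python sketch assumes, as the note above implies, that ESXi enumerates vmnics in the vNIC placement (PCI) order defined in the Service Profile. The vNIC names, order values, and max_vnics limit are hypothetical inputs that must come from your Service Profile and from Cisco's documentation for your VIC model.

```python
from __future__ import annotations


def expected_vmnic_map(vnics: list[tuple[str, int]], max_vnics: int) -> dict[str, str]:
    """Map each vNIC name to the vmnic it is expected to become.

    vnics holds (name, placement order) pairs; the ordering assumption is
    hypothetical and should be confirmed against the host's actual vmnics.
    """
    if len(vnics) > max_vnics:
        raise ValueError(f"{len(vnics)} vNICs defined but the adapter limit is {max_vnics}")
    orders = [order for _, order in vnics]
    if len(set(orders)) != len(orders):
        raise ValueError("duplicate placement order values")
    ordered = sorted(vnics, key=lambda v: v[1])
    return {name: f"vmnic{i}" for i, (name, _) in enumerate(ordered)}


def vds_uplinks_ok(
    vnics: list[tuple[str, int]],
    max_vnics: int,
    vds_vnic_names: list[str],
    expected: tuple[str, ...] = ("vmnic0", "vmnic1"),
) -> bool:
    """True if the vNICs intended for the VDS land on the expected vmnics."""
    mapping = expected_vmnic_map(vnics, max_vnics)
    return tuple(mapping.get(n) for n in vds_vnic_names) == tuple(expected)
```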