New worker node VM fails to initialize its network interface during Node Pool scale-out despite a valid IPAM allocation in TCA

Article ID: 426712

Products

VMware Telco Cloud Automation

Issue/Introduction

  • When scaling out a Node Pool (e.g., increasing from 15 to 16 nodes) in VMware Telco Cloud Automation (TCA), the task remains in the processing state and does not complete.
  • The new Worker Node is stuck in the "Provisioning" state and appears to have no IP address assigned, even though free IPs exist in the TCA IPAM pool.
  • On the Management Cluster, running kubectl describe against the corresponding Machine and VSphereVM resources shows the machine in a "Waiting for IP allocation" state or with similar IP-pending conditions.
  • Logging in to the stalled VM through the Web Console shows that the network configuration files are correct, but the interface link is down:
    ip addr shows eth0 in state DOWN.
    dmesg logs indicate a failure to bring the link up.
  • Manual attempts to force the link up (ip link set dev eth0 up) fail or have no effect.
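
The checks above can be sketched as a few small shell helpers. This is a minimal sketch, not an official diagnostic script: the function names are hypothetical, the Management Cluster helper assumes a kubectl context that can read Cluster API Machine objects, and the guest helpers assume the interface is named eth0 as in the symptoms.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical; the commands they wrap are the
# ones named in the symptoms above.

# Management Cluster: list Machines whose phase is not yet Running.
pending_machines() {
  kubectl get machines -A --no-headers \
    -o custom-columns=NS:.metadata.namespace,NAME:.metadata.name,PHASE:.status.phase |
    awk '$3 != "Running" { print $1 "/" $2 " " $3 }'
}

# Guest OS (Web Console): print the link state of one interface, e.g. DOWN.
link_state() {
  local iface="${1:-eth0}"
  ip -o link show dev "$iface" |
    awk '{ for (i = 1; i < NF; i++) if ($i == "state") { print $(i + 1); exit } }'
}

# Guest OS: show recent kernel messages that mention the interface.
link_messages() {
  local iface="${1:-eth0}"
  dmesg | grep -i -- "$iface" | tail -n 20
}
```

A possible sequence: run pending_machines against the Management Cluster, run kubectl describe machine on any entry it prints, then run link_state eth0 and link_messages eth0 inside the stalled guest. A result of DOWN together with a correct network configuration matches the symptom described in this article.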

Environment

TCA 3.3

TKG 2.5.2

Cause

The root cause is isolated to the specific ESXi host on which the Virtual Machine was placed. That host is in a state in which it fails to initialize virtual network adapter (vNIC) connectivity for newly placed VMs, so the Guest OS interface stays DOWN for as long as the VM runs on that host. The Guest OS therefore cannot apply the IP address provided by IPAM, and the node never reports "Ready" back to the Management Cluster.

Resolution

Because the fault lies with the underlying ESXi host's handling of the VM's virtual network hardware, not with TCA or IPAM, the resolution is to move the workload to another host or to remediate the affected host.

Workaround:

  1. Identify the VM: Locate the stalled Worker Node VM in the vCenter UI.
  2. Migrate (vMotion): Right-click the VM and select Migrate.
  3. Change Compute Resource: Select Change compute resource only and move the VM to a different ESXi host within the same cluster.
  4. Verify Connectivity: Once the migration completes:
    1. The eth0 interface inside the Guest OS should automatically transition to UP.
    2. VMware Tools will report the IP address to vCenter.
  5. The TCA/TKG reconciliation loop will detect the IP and mark the node as Provisioned.
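
After the migration, the reconciliation result can be checked from the Management Cluster. The following is a minimal sketch under stated assumptions: the function names, retry count, and delay are hypothetical, and it assumes the node is represented by a Cluster API Machine object whose status.phase and status.addresses fields are populated once the IP is reported.

```shell
#!/usr/bin/env bash
# Sketch only: poll a Machine until it reaches Provisioned or Running.
# Arguments: namespace, machine name, optional retries (30), optional delay in seconds (10).
wait_for_machine() {
  local ns="$1" name="$2" tries="${3:-30}" delay="${4:-10}"
  local phase="" i=0
  while [ "$i" -lt "$tries" ]; do
    phase="$(kubectl get machine "$name" -n "$ns" -o jsonpath='{.status.phase}')"
    case "$phase" in
      Provisioned|Running) echo "$phase"; return 0 ;;
    esac
    i=$((i + 1))
    sleep "$delay"
  done
  echo "timed out; last phase: ${phase:-unknown}" >&2
  return 1
}

# Print the addresses that the Machine currently reports.
machine_addresses() {
  local ns="$1" name="$2"
  kubectl get machine "$name" -n "$ns" -o jsonpath='{.status.addresses[*].address}'
}
```

Called as wait_for_machine <namespace> <machine-name>, the helper returns as soon as the phase changes. If it times out, the interface state inside the guest is worth rechecking before assuming the migration resolved the issue.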

Permanent Fix:

  1. Investigate Host: Place the problematic ESXi host (the source of the issue) into Maintenance Mode.
  2. Remediate: Review vmkernel.log for vSwitch or driver errors.
  3. Reboot: Rebooting the ESXi host usually clears the stale state affecting vNIC initialization.
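
The host-side review can be sketched as follows for the ESXi Shell. This is an illustrative sketch, not an official procedure: the function names and the grep pattern are assumptions, and the pattern may need adjusting to the NIC driver and switch type in use. Capturing the output before the reboot preserves evidence of the stale state.

```shell
#!/bin/sh
# Sketch for the ESXi Shell on the affected host. Function names and the grep
# pattern are illustrative assumptions, not an official diagnostic procedure.

# Show recent vmkernel.log lines that mention uplinks, vSwitches, or portsets.
vnic_log_hits() {
  log="${1:-/var/log/vmkernel.log}"
  grep -iE 'vmnic|vswitch|portset|vmxnet' "$log" | tail -n 50
}

# Capture physical NIC, standard vSwitch, and VM port state before the reboot.
host_network_snapshot() {
  esxcli network nic list
  esxcli network vswitch standard list
  esxcli network vm list
}

# Maintenance Mode can also be entered from the shell once VMs are evacuated:
#   esxcli system maintenanceMode set --enable true
```

Redirecting both helpers to a file on a datastore (for example, vnic_log_hits > <path>) keeps the results available after the host restarts.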