When enabling Workload Management on a vSphere Cluster, the Supervisor Cluster deployment fails to reach a "Healthy" status. The progress remains indefinitely stuck in the "Provisioning" or "Configuring" state within the vCenter Server UI.
In this condition, the Supervisor Control Plane VMs are deployed but never complete their bootstrap.
Diagnostic Verification:
To confirm, SSH to an affected Control Plane VM and run:
crictl ps -a --name kube-apiserver
crictl logs <container-id>
cat /var/log/cloud-init-output.log
Typical log indicators include:
1. Cloud-init Network Device Info
This snippet from the cloud-init output confirms that the node was provisioned without the expected second network interface (eth1) and that the existing interface (eth0) failed to initialize properly (a node-level cross-check follows the second snippet).
Cloud-init v. 25.1 running 'init'
ci-info: ++++++++++++++++++++++++++++++Net device info++++++++++++++++++++++++++++++
ci-info: +--------+-------+-----------+-----------+-------+-------------------+
ci-info: | Device | Up | Address | Mask | Scope | Hw-Address |
ci-info: +--------+-------+-----------+-----------+-------+-------------------+
ci-info: | eth0 | False | . | . | . | 00:50:56:xx:xx:xx |
ci-info: | lo | True | 127.0.0.1 | 255.0.0.0 | host | . |
ci-info: +--------+-------+-----------+-----------+-------+-------------------+
2. API Server Container Logs
This snippet from the kube-apiserver logs shows the failure to locate the required eth1 interface and the subsequent refused connection to etcd at 127.0.0.1:2379.
stderr F I0225 10:07:07.698075 1 options.go:221] external host was not specified, using 10.10.xx.xxx
stderr F W0225 10:07:07.701754 1 filtered_certs.go:38] Unable to locate interface specified in filter: eth1
stderr F W0225 10:07:07.831592 1 logging.go:59] [core] [Channel #2 SubChannel #3] grpc: addrConn.createTransport failed to connect to {Addr: "127.0.0.1:2379", ServerName: "127.0.0.1:2379", }. Err: connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:2379: connect: connection refused"
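Both indicators can be cross-checked directly on the node (a minimal sketch, assuming SSH access to the affected Control Plane VM; container names and output formats may vary by release):
# 1. Confirm the second vNIC is absent at the kernel level; an affected
#    node lists only lo and eth0, with eth0 down.
ip -brief link show
# 2. Confirm the state of etcd behind the refused 127.0.0.1:2379 connection;
#    substitute the container ID reported by the first command.
crictl ps -a --name etcd
crictl logs <etcd-container-id>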
Additionally, verifying the VM hardware settings in vCenter will show that only one vNIC is attached instead of two.
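The same check can be scripted against vCenter; the sketch below uses the open-source govc CLI (an assumption, not part of the product; the VM name is a placeholder and govc must already be configured with GOVC_URL credentials):
# List the virtual devices attached to the Control Plane VM; an affected
# VM reports a single ethernet device instead of two.
govc device.ls -vm SupervisorControlPlaneVM-1 | grep -i ethernet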
Environment:
VMware vSphere 7.x / 8.x
vSphere with Tanzu (Supervisor Cluster) and VMware vSphere Kubernetes Service (VKS)
VCF 4.x / 5.x
Cause:
The failure is caused by an automated provisioning anomaly where the Control Plane VM is deployed with a single vNIC. A standard, successful Supervisor deployment requires two (2) vNICs:
1. eth0: Management Network (Communication with vCenter/ESXi).
2. eth1: Primary Workload Network (Communication for K8s API and Services).
Because the automation engine (ESX Agent Manager) fails to attach the second interface, the kube-apiserver service cannot bind to its designated Workload Network IP address. This results in the API server container crashing or entering an "Exited" state during the bootstrap process.
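For contrast, a healthy Control Plane VM shows both interfaces up and addressed (the output below is an illustrative sketch; addresses are placeholders):
# Healthy node: eth0 carries the Management IP, eth1 the Workload IP
ip -brief addr show
#   lo     UNKNOWN  127.0.0.1/8
#   eth0   UP       10.10.xx.xxx/24     (Management Network)
#   eth1   UP       192.168.xx.xxx/24   (Workload Network)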
Resolution:
Manual hardware modification of Supervisor Control Plane VMs is not supported and will not resolve the internal software configuration mismatch. A clean redeployment is required.
Step 1: Decommission the Failed Cluster
Disable Workload Management on the affected cluster to remove the partially deployed Supervisor and ensure all misconfigured components are cleanly removed.
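A scripted equivalent via the vCenter REST API is sketched below (hedged: the session flow and endpoint follow the public vSphere Automation API; the vCenter FQDN, credentials, and cluster MoID domain-cXX are placeholders):
# Obtain an API session token (-k skips certificate validation; avoid in production)
SESSION=$(curl -sk -X POST -u 'administrator@vsphere.local:<password>' \
  https://<vcenter-fqdn>/api/session | tr -d '"')
# Disable Workload Management on the cluster, removing the Supervisor
curl -sk -X POST \
  -H "vmware-api-session-id: ${SESSION}" \
  "https://<vcenter-fqdn>/api/vcenter/namespace-management/clusters/domain-cXX?action=disable"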
Step 2: Verify Network Prerequisites
Before re-attempting deployment, ensure the underlying infrastructure can support the dual-NIC requirement:
IP Availability: Confirm both the Management and Workload IP pools have at least 5 available IP addresses per cluster (3 for Control Plane, 1 for VIP, 1 for upgrade headroom).
Port Group Configuration: Ensure the Distributed Port Groups used for the Workload Network have correct VLAN tagging and are accessible by all hosts in the cluster.
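As a quick sanity check on IP availability, a ping sweep from a host on the Workload segment can expose conflicts (a sketch; the addresses are placeholders, and a silent host does not guarantee a free IP, but any reply proves the address is taken):
# Any reply indicates the candidate address is already in use
for ip in 192.168.10.2 192.168.10.3 192.168.10.4 192.168.10.5 192.168.10.6; do
  ping -c 1 -W 1 "$ip" >/dev/null 2>&1 && echo "CONFLICT: $ip is in use"
done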
Step 3: Recreate the Supervisor Cluster
1. Initiate a fresh deployment from vCenter.
2. The automation engine will provision all three Control Plane VMs with the required dual-vNIC configuration. Monitor the deployment until the Supervisor reports a "Healthy" status.
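Deployment progress can also be polled with the same REST session used in Step 1 (a sketch; field names follow the public vSphere Automation API, in which a successful Supervisor eventually reports config_status RUNNING):
# List Supervisor-enabled clusters with their current configuration status
curl -sk -H "vmware-api-session-id: ${SESSION}" \
  "https://<vcenter-fqdn>/api/vcenter/namespace-management/clusters"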