Tanzu Kubernetes Cluster (TKC) creation is stuck and no control plane VMs are deployed when an old cluster name is re-used.
This usually occurs if the old cluster was manually removed/cleaned up.
The following symptoms are observed:
The kubeadmcontrolplane (kcp) object gets created, but it never completes the creation of the virtualmachine object.
VMware vSphere with Tanzu 7.0
VMware vSphere with Tanzu < 8.0 U3
The control plane VM is not created because of an upstream bug in the kcp controller. kcp versions prior to v1.5.0, v1.4.5, and v1.3.10 are affected; those releases contain the upstream fix.
One condition which leads to this situation is when a TKC and its associated resources are deleted manually.
Cached information in the kcp controller does not get cleaned up automatically due to the upstream issue, and this causes the new TKC creation to fail.
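The symptom described above can be checked from the Supervisor Cluster by comparing the kcp and virtualmachine objects in the namespace. The following is a minimal sketch, not an official diagnostic: the function name check_tkc_stuck and its two arguments (namespace and cluster name) are hypothetical, and it assumes that the object names contain the cluster name and that no virtualmachine object of any kind exists yet for the new cluster.

```shell
#!/bin/sh
# Hypothetical helper: reports whether a kcp object exists for the cluster
# while no virtualmachine object does (the symptom described in this article).
# Usage: check_tkc_stuck <namespace> <cluster-name>
check_tkc_stuck() {
    namespace="$1"
    cluster="$2"

    # Count kcp objects whose output line contains the cluster name.
    # "|| true" keeps the assignment from failing when grep finds no match.
    kcp_count=$(kubectl get kubeadmcontrolplane -n "$namespace" --no-headers 2>/dev/null | grep -c -- "$cluster" || true)

    # Count virtualmachine objects whose output line contains the cluster name.
    vm_count=$(kubectl get virtualmachine -n "$namespace" --no-headers 2>/dev/null | grep -c -- "$cluster" || true)

    echo "kcp objects: $kcp_count, virtualmachine objects: $vm_count"

    # Assumption: a kcp object with zero matching VMs indicates the stuck state.
    if [ "$kcp_count" -gt 0 ] && [ "$vm_count" -eq 0 ]; then
        echo "kcp exists but no virtualmachine object was found for $cluster"
        return 0
    fi
    return 1
}
```

For example, check_tkc_stuck <namespace> <cluster-name> returns 0 when the counts match the symptom. A result of 1 does not rule out the issue, since the match is a plain substring search on the cluster name.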
Fix:
This issue has been fixed in the upstream Cluster API, and the same fix is included in vSphere 8.0 U3 and later.
Workaround:
From the Supervisor Cluster, restart the capi-kubeadm-control-plane-controller-manager deployment using the command below. This cleans up the stale cache and unblocks the reconciliation of the kcp object.
# kubectl rollout restart deploy capi-kubeadm-control-plane-controller-manager -n vmware-system-capw
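To list the resources in the namespace that still reference the cluster name, run the command below. The output is written to resources.lst.
# for resource in $(kubectl api-resources --namespaced=true --verbs=list -o name); do echo; echo; echo "$resource"; kubectl get "$resource" -n <namespace> | grep <cluster-name>; done 2>/dev/null > resources.lst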
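The restart in the workaround can be followed by a check that the controller has come back up and that the kcp object is progressing. The following is a minimal sketch under stated assumptions: the function name restart_kcp_controller is hypothetical, the 120-second timeout is an arbitrary choice, and the namespace argument is the vSphere Namespace that holds the TKC.

```shell
#!/bin/sh
# Hypothetical helper: restarts the kcp controller, waits for the rollout,
# then lists the kcp and virtualmachine objects in the given namespace.
# Usage: restart_kcp_controller <namespace>
restart_kcp_controller() {
    namespace="$1"
    deployment="capi-kubeadm-control-plane-controller-manager"
    controller_ns="vmware-system-capw"

    # Restart the deployment (same command as in the workaround).
    kubectl rollout restart deployment "$deployment" -n "$controller_ns" || return 1

    # Wait until the restarted pods are ready; the timeout is an assumed value.
    kubectl rollout status deployment "$deployment" -n "$controller_ns" --timeout=120s || return 1

    # Show the kcp and virtualmachine objects so progress can be observed.
    kubectl get kubeadmcontrolplane,virtualmachine -n "$namespace"
}
```

If the workaround takes effect, a virtualmachine object for the control plane is expected to appear in the last listing after some time; rerunning the final kubectl get command shows the current state.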