Workload cluster creation is stalled during creation of the first control plane node.
Only the static pods and kube-proxy are running on the node.
NAME                               STATUS     ROLES           AGE    VERSION             INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION       CONTAINER-RUNTIME
test-cluster-control-plane-r7xd2   NotReady   control-plane   106m   v1.28.11+vmware.2   ##.##.##.##   <none>        Ubuntu 22.04.4 LTS   5.15.0-113-generic   containerd://1.6.33+vmware.2
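The node state above can be confirmed with standard kubectl commands against the workload cluster; with no CNI installed, the node's Ready condition typically carries a NetworkReady=false / "cni plugin not initialized" message:

# Run against the workload cluster
kubectl get nodes -o wide
# Inspect the NotReady condition on the control plane node
kubectl describe node test-cluster-control-plane-r7xd2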
NAMESPACE     NAME                                                       READY   STATUS    RESTARTS   AGE
kube-system   coredns-5cd8994b77-cqwz4                                   0/1     Pending   0          106m
kube-system   coredns-5cd8994b77-fw9h2                                   0/1     Pending   0          106m
kube-system   etcd-test-cluster-control-plane-r7xd2                      1/1     Running   0          106m
kube-system   kube-apiserver-test-cluster-control-plane-r7xd2            1/1     Running   0          106m
kube-system   kube-controller-manager-test-cluster-control-plane-r7xd2   1/1     Running   0          106m
kube-system   kube-proxy-6d7nd                                           1/1     Running   0          106m
kube-system   kube-scheduler-test-cluster-control-plane-r7xd2            1/1     Running   0          106m
tkg-system    tanzu-capabilities-controller-manager-57fd49cc8f-hr9rt     0/1     Pending   0          106m
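For reference, the pod listing above can be reproduced with:

# Run against the workload cluster
kubectl get pods -A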
No DNS, network connectivity, or image pull issues were observed from the control plane node.
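The checks below are one way to rule out DNS, connectivity, and image pull problems from the control plane node; the registry FQDN and image path are examples only and should be replaced with the registry actually used by the environment:

# Run on the control plane node (example registry shown; substitute your own)
nslookup projects.registry.vmware.com
curl -v --max-time 10 https://projects.registry.vmware.com/v2/
# Confirm the container runtime can pull an image (example image path)
crictl pull projects.registry.vmware.com/tkg/pause:3.9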
VMware Tanzu Kubernetes Grid (TKGm)
The CoreDNS and tanzu-capabilities pods are stuck in the Pending state because the CNI is not running on the node, and the Antrea CNI is installed by kapp-controller.
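One way to confirm the missing CNI on the workload cluster is to look for Antrea workloads, which are absent here because the package was never installed (the grep pattern is illustrative):

# Run against the workload cluster
kubectl get pods -A | grep -i antrea
kubectl get daemonsets,deployments -n kube-system | grep -i antrea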
The kapp-controller and tanzu-addons-controller-manager on the management cluster are responsible for installing packages on the workload cluster, including the workload cluster's own kapp-controller.
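On the management cluster these installations typically surface as kapp-controller App resources in the workload cluster's namespace, and their reconcile status shows where the chain is stuck. The cluster name (test-cluster) and namespace (default) below are illustrative:

# Run against the management cluster; cluster name and namespace are examples
kubectl get apps.kappctrl.k14s.io -n default
kubectl get apps.kappctrl.k14s.io -A | grep test-cluster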
The kapp-controller on the management cluster is logging the following message:
{"level":"info","ts":1738251158.4814796,"logger":"kc.controller.apiserver","msg":"waiting for API service to become ready. Check the status by running `kubectl get apiservices v1alpha1.data.packaging.carvel.dev -o yaml`"}
The v1alpha1.data.packaging.carvel.dev APIService is in a FailedDiscoveryCheck state.
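The APIService state can be verified with the command referenced in the log message:

# Run against the management cluster; the status condition shows FailedDiscoveryCheck
kubectl get apiservices v1alpha1.data.packaging.carvel.dev -o yaml
# Compact view of all aggregated APIServices and their availability
kubectl get apiservices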
Restarting the kapp-controller pod on the management cluster resolves the issue.
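A minimal sketch of the restart, assuming kapp-controller runs as a Deployment named kapp-controller in the tkg-system namespace of the management cluster (adjust the namespace and name to match the environment):

# Run against the management cluster; namespace and deployment name are assumptions
kubectl -n tkg-system rollout restart deployment kapp-controller
kubectl -n tkg-system rollout status deployment kapp-controller
# Confirm the aggregated API recovers and the workload cluster packages reconcile
kubectl get apiservices v1alpha1.data.packaging.carvel.dev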