You’ll notice that once you initiate the cluster creation, it gets stuck on the first control plane node. When you SSH into that node (ssh capv@<node-ip>), you’ll see that the following components are in a running state: kube-proxy, kube-vip, kube-controller-manager, etcd, kube-apiserver, and kube-scheduler:
CONTAINER       IMAGE           CREATED         STATE     NAME                      ATTEMPT
c92f7b6a3bd01   2f7e1c45a1b8f   8 minutes ago   Running   kube-proxy                0
e581c43a7fa9d   a6b4c83219ee7   9 minutes ago   Running   kube-vip                  0
9ae04d63c2d67   fdc31eab2481c   9 minutes ago   Running   kube-controller-manager   0
1347c8e83d1f2   b9fe2019d61ab   9 minutes ago   Running   etcd                      0
f20ad46eb8c78   67cd2198b44d3   9 minutes ago   Running   kube-apiserver            0
db519adfa2b7e   4f2a97ed6cc90   9 minutes ago   Running   kube-scheduler            0
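To reproduce this container list or dig into a specific component, you can use crictl on the node; a quick sketch, assuming crictl is available on the node image (as it is on standard TKGm nodes), with <container-id> taken from the first column above:
sudo crictl ps
sudo crictl logs <container-id>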
When examining the kubelet on the control plane node, you’ll see that the service is running (systemctl status kubelet.service), but the logs continuously show that the CNI is not initialised:
journalctl -xeu kubelet
Jan 01 12:00:00 workload-cluster kubelet[1677]: E0101 12:00:00.000000 1677 kubelet.go:2855] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized"
Jan 01 12:00:00 workload-cluster kubelet[1677]: E0101 12:00:00.000000 1677 kubelet.go:2855] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized"
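This error generally means that no CNI configuration has been written to the node yet; a quick check, assuming the standard CNI paths, is to look at the config and binary directories (on a stuck node, /etc/cni/net.d is typically empty or missing):
ls /etc/cni/net.d/
ls /opt/cni/bin/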
TKGm 2.4.0+
A common cause of this issue is that the Cluster API components (CAPI and CAPV) are unable to communicate with the new workload cluster. This can be confirmed by checking the CAPI logs:
E0101 12:00:00.000000 1 controller.go:329] "Reconciler error" err="failed to create cluster accessor: error creating http client and mapper for remote cluster \"default/workload-cluster\": error creating client for remote cluster \"default/workload-cluster\": error getting rest mapping: failed to get API group resources: unable to retrieve the complete list of server APIs: v1: Get \"https://198.51.100.1:6443/api/v1?timeout=10s\": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)" controller="machine" controllerGroup="cluster.x-k8s.io" controllerKind="Machine" Machine="default/workload-cluster-controlplane-v78ki" namespace="default" name="workload-cluster-controlplane-v78ki" reconcileID=""
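These controller logs can be pulled from the management cluster context with kubectl; a minimal sketch, assuming the default provider namespaces and deployment names used by Cluster API and the vSphere provider:
kubectl logs -n capi-system deployment/capi-controller-manager --tail=100
kubectl logs -n capv-system deployment/capv-controller-manager --tail=100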
CAPI is responsible for provisioning and managing the lifecycle of Kubernetes clusters. If CAPI on the management cluster cannot reach the API server of the new workload cluster, it cannot complete its setup tasks, including initialising the CNI. In the above case, a firewall rule was blocking communication to the workload cluster on port 6443.
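You can test reachability of the workload cluster’s API endpoint from the management cluster network; a minimal sketch, assuming 198.51.100.1 is the control plane endpoint shown in the error above:
curl -vk --connect-timeout 10 https://198.51.100.1:6443/version
nc -zv -w 10 198.51.100.1 6443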
The blocked port prevented the workload cluster from registering with the management cluster; once it was allowed through the firewall, the cluster deployed without issue. If you have a similar problem, review the documents below to ensure the networking configuration is correct:
If applicable, carefully review the proxy settings and any other networking configuration specified in the cluster configuration file used during creation (see the example proxy settings below):
Verify the overall network setup in the environment to ensure proper connectivity between the management cluster and the workload cluster:
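For reference, here is a minimal sketch of the proxy-related settings in a TKGm cluster configuration file; the values are placeholders, and TKG_NO_PROXY in particular should include the workload cluster endpoint and node/pod/service networks so that traffic to the cluster is not routed through the proxy:
TKG_HTTP_PROXY_ENABLED: "true"
TKG_HTTP_PROXY: http://proxy.example.com:3128
TKG_HTTPS_PROXY: http://proxy.example.com:3128
TKG_NO_PROXY: 10.0.0.0/8,192.168.0.0/16,.svc,.svc.cluster.local,198.51.100.1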