kube-vip pods are not running on the Control Plane nodes. kube-vip runs as a static pod defined by /etc/kubernetes/manifests/kube-vip.yaml inside all Control Plane nodes.

kubectl commands fail with "Unable to connect to the server" errors as below:

$ kubectl get node
Unable to connect to the server: dial tcp <VIP_ADDRESS>:6443: i/o timeout

or

$ kubectl get node
Unable to connect to the server: dial tcp <VIP_ADDRESS>:6443: connect: no route to host

The kube-vip image cannot be pulled from the projects.registry.vmware.com location. Image pull failures will report the following in kubelet logs and when describing the pod:

err: "failed to \"StartContainer\" for \"kube-vip\" with ImagePullBackOff: \"Back-off pulling image \\\"projects.registry.vmware.com/tkg/kube-vip:v0.5.12_vmware.1\\\"\""

Tanzu Kubernetes Grid Multi-cloud (TKGM)
There can be multiple reasons why kube-vip pods are not running on the Control Plane nodes.
This condition is most commonly encountered in air-gapped environments and occurs because /etc/kubernetes/manifests/kube-vip.yaml is configured to pull the kube-vip image from the public projects.registry.vmware.com registry instead of the private registry configured during cluster creation.
For the most commonly encountered kube-vip pod failure condition noted above, editing the image registry in the /etc/kubernetes/manifests/kube-vip.yaml manifest so that it points to the correct custom image repository should bring up the kube-vip pods.
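As a rough sketch of that edit (the tag and the <CUSTOM_REGISTRY_URL> repository path below are placeholders borrowed from the tagging example later in this article; the real values depend on the environment), the registry prefix of the image: line in the static pod manifest is what changes:

# grep image: /etc/kubernetes/manifests/kube-vip.yaml
    image: projects.registry.vmware.com/tkg/kube-vip:v0.5.12_vmware.1

After the edit, the same line should reference the private registry, for example:

    image: <CUSTOM_REGISTRY_URL>.fqdn.com/repository/tkg/kube-vip:v0.5.12_vmware.1

Because kube-vip is a static pod, kubelet detects the manifest change and recreates the pod automatically.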
If modifying /etc/kubernetes/manifests/kube-vip.yaml does not correct the ImagePullBackOff errors, it is possible the image will need to be tagged with the default image repository name to allow the image pull to resolve. Use the following commands to tag the existing custom-registry image with the default image repository name:
$ ctr -n=k8s.io image ls | grep vip
<CUSTOM_REGISTRY_URL>.fqdn.com/repository/tkg/kube-vip:v0.5.12_vmware.1
$ ctr -n=k8s.io image tag <CUSTOM_REGISTRY_URL>.fqdn.com/repository/tkg/kube-vip:v0.5.12_vmware.1 projects.registry.vmware.com/tkg/kube-vip:v0.5.12_vmware.1
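After tagging, the image should also be listed under the default repository name. A quick check (the exact ctr output columns are omitted here and will vary):

$ ctr -n=k8s.io image ls | grep projects.registry.vmware.com/tkg/kube-vip
projects.registry.vmware.com/tkg/kube-vip:v0.5.12_vmware.1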
If access to kubectl commands is required urgently, please reference the Additional Information below for steps to recover accessibility to kubectl commands while investigating kube-vip functionality.
In order to be able to run kubectl commands even if the cluster VIP is not assigned, you can update the kubeconfig file's clusters.cluster.server field with one of the existing eth0 vNIC IP addresses assigned to a Control Plane node.
For example, in the case below the Control Plane node has two IPv4 addresses assigned to eth0, <>.24 and <>.9.
root@workload-<cluster_name>-control-plane-lmn8b [ ~ ]# ip a s
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc prio state UP group default qlen 1000
    inet <>.24/26 metric 1024 brd <>.63 scope global dynamic eth0
    inet <>.9/32 scope global eth0
<>.24 corresponds to the vSphere VM's vNIC and <>.9 is the cluster VIP address.
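If it is not obvious which address is the cluster VIP, it can usually be confirmed from the kube-vip static pod manifest itself. A hedged check (the environment variable names carrying the VIP differ between kube-vip versions, so the grep is intentionally broad):

root@workload-<cluster_name>-control-plane-lmn8b [ ~ ]# grep -i address /etc/kubernetes/manifests/kube-vip.yaml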
If the kube-vip pods go down, the <>.9 VIP address disappears from eth0:
root@workload-<cluster_name>-control-plane-lmn8b [ ~ ]# ip a s
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc prio state UP group default qlen 1000
    inet <>.24/26 metric 1024 brd <>.63 scope global dynamic eth0
At this stage, running kubectl commands will return:
$ kubectl get node
Unable to connect to the server: dial tcp <>.9:6443: i/o timeout
or
$ kubectl get node
Unable to connect to the server: dial tcp <>.9:6443: connect: no route to host
First, check that the etcd cluster status is healthy.
Inside a Control Plane node, execute:
$ sudo -i
# alias etcdctl="/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/*/fs/usr/local/bin/etcdctl --cert /etc/kubernetes/pki/etcd/peer.crt --key /etc/kubernetes/pki/etcd/peer.key --cacert /etc/kubernetes/pki/etcd/ca.crt"
# etcdctl member list -w table
# etcdctl endpoint health status --cluster=true -w table
Example from a healthy cluster with three Control Plane nodes:
# etcdctl member list -w table
+------------------+---------+--------------------------------------+------------------------------------------+------------------------------------------+------------+
|        ID        | STATUS  |                 NAME                 |                PEER ADDRS                |               CLIENT ADDRS               | IS LEARNER |
+------------------+---------+--------------------------------------+------------------------------------------+------------------------------------------+------------+
| 17f206fd866fdab2 | started | d5e989cf-2242-44b2-bca1-d922d1627543 | https://master-0.etcd.cfcr.internal:2380 | https://master-0.etcd.cfcr.internal:2379 | false      |
| 1958063b94f7906b | started | e9753a70-ba7f-43f6-b3e1-b0030290a977 | https://master-1.etcd.cfcr.internal:2380 | https://master-1.etcd.cfcr.internal:2379 | false      |
| 96d74f332197fd97 | started | 6e664768-fbf6-424a-b808-b4b7bb3c7a12 | https://master-2.etcd.cfcr.internal:2380 | https://master-2.etcd.cfcr.internal:2379 | false      |
+------------------+---------+--------------------------------------+------------------------------------------+------------------------------------------+------------+
# etcdctl endpoint health status --cluster=true -w table
+----------------------------+--------+-------------+-------+
|          ENDPOINT          | HEALTH |    TOOK     | ERROR |
+----------------------------+--------+-------------+-------+
| https://10.xxx.xx.xxx:2379 | true   | 15.725849ms |       |
| https://10.xxx.xx.xxx:2379 | true   | 17.235013ms |       |
| https://10.xxx.xx.xxx:2379 | true   | 18.253567ms |       |
+----------------------------+--------+-------------+-------+
If using kubectl from an external client/jumpbox, edit $HOME/.kube/config and replace <>.9 (or its corresponding FQDN) with the existing vNIC IP, <>.24.
Example:
$ cp -p $HOME/.kube/config ./config_bkp
$ vim $HOME/.kube/config
- cluster:
    certificate-authority-data: <>
    server: https://<>.24:6443
$ kubectl get node
NAME                                          STATUS     ROLES           AGE   VERSION
workload-<cluster_name>-control-plane-lmn8b   NotReady   control-plane   18d   v1.28.7+vmware.1
workload-<cluster_name>-md-0-c2wkk-cm5j9      NotReady   <none>          18d   v1.28.7+vmware.1
Note: the nodes' status is NotReady because kubelet is not able to communicate with the kube-apiserver, as it's trying to use the cluster VIP.
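To confirm this, check which API server endpoint kubelet is configured to use on the node. A minimal check, assuming the standard kubeadm layout where kubelet's kubeconfig lives at /etc/kubernetes/kubelet.conf; the server line should still show the cluster VIP:

root@workload-<cluster_name>-control-plane-lmn8b [ ~ ]# grep server: /etc/kubernetes/kubelet.conf
    server: https://<>.9:6443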
If using kubectl from an SSH session inside a Control Plane node, edit /etc/kubernetes/admin.conf and replace <>.9 (or its corresponding FQDN) with the existing vNIC IP, <>.24.
Example:
$ cp -p /etc/kubernetes/admin.conf ./admin.conf_bkp
$ vim /etc/kubernetes/admin.conf
- cluster:
    certificate-authority-data: <>
    server: https://<>.24:6443
$ kubectl get node --kubeconfig /etc/kubernetes/admin.conf
NAME                              STATUS     ROLES           AGE   VERSION
workload-<>-control-plane-lmn8b   NotReady   control-plane   18d   v1.28.7+vmware.1
workload-<>-md-0-c2wkk-cm5j9      NotReady   <none>          18d   v1.28.7+vmware.1
Note: the nodes' status is NotReady because kubelet is not able to communicate with the kube-apiserver, as it's trying to use the cluster VIP.
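Once the kube-vip pods are running again and the cluster VIP is reassigned to eth0, revert the temporary kubeconfig changes using the backups taken in the examples above, so that kubectl access goes through the VIP again:

$ cp -p ./config_bkp $HOME/.kube/config
$ cp -p ./admin.conf_bkp /etc/kubernetes/admin.conf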