kube-vip runs as a static pod defined in the /etc/kubernetes/manifests/kube-vip.yaml manifest inside all Control Plane nodes. When the kube-vip pods are not running, kubectl commands fail with "Unable to connect to the server" errors as below:
$ kubectl get node
Unable to connect to the server: dial tcp <VIP_ADDRESS>:6443: i/o timeout
or
$ kubectl get node
Unable to connect to the server: dial tcp <VIP_ADDRESS>:6443: connect: no route to host
In environments where the nodes cannot reach the projects.registry.vmware.com location, image pull failures will report the following in kubelet logs and when describing the pod:
err: "failed to \"StartContainer\" for \"kube-vip\" with ImagePullBackOff: \"Back-off pulling image \\\"projects.registry.vmware.com/tkg/kube-vip:v0.5.12_vmware.1\\\"\"
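To confirm this from a Control Plane node even while kubectl is unavailable, the kube-vip container state and kubelet logs can be inspected directly on the node. The commands below are a general sketch; they assume containerd as the runtime and that crictl is present on the node:
$ sudo crictl ps -a | grep kube-vip
$ sudo journalctl -u kubelet --no-pager | grep -i kube-vip | tail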
There can be multiple reasons why kube-vip pods are not running in the Control Plane nodes.
This condition is most commonly encountered in air-gapped environments and appears because the /etc/kubernetes/manifests/kube-vip.yaml manifest is configured to pull images from the common projects.registry.vmware.com registry instead of the private registry configured during cluster creation.
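To verify which registry the static pod manifest currently references, check the image line on each Control Plane node. The output below is illustrative only:
$ sudo grep "image:" /etc/kubernetes/manifests/kube-vip.yaml
    image: projects.registry.vmware.com/tkg/kube-vip:v0.5.12_vmware.1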
For the most commonly encountered kube-vip pod failure condition noted above, editing the imageRegistry line in the /etc/kubernetes/manifests/kube-vip.yaml manifest to point to the correct custom image repository should bring up the kube-vip pods.
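For illustration, after the edit the container image reference in the manifest would typically look similar to the excerpt below, using the private registry path configured during cluster creation (the exact structure of the manifest can vary between TKG versions):
  containers:
  - name: kube-vip
    image: <CUSTOM_REGISTRY_URL>.fqdn.com/repository/tkg/kube-vip:v0.5.12_vmware.1
Because kube-vip runs as a static pod, kubelet re-creates the pod automatically once the manifest file under /etc/kubernetes/manifests is saved.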
If modifying the /etc/kubernetes/manifests/kube-vip.yaml does not correct the ImagePullBackOff errors, the existing image may need to be tagged with the default image repository name so that the image pull can succeed. Use the following commands to tag the image already pulled from the custom registry with the default projects.registry.vmware.com reference:
$ ctr -n=k8s.io image ls | grep vip
<CUSTOM_REGISTRY_URL>.fqdn.com/repository/tkg/kube-vip:v0.5.12_vmware.1
$ ctr -n=k8s.io image tag <CUSTOM_REGISTRY_URL>.fqdn.com/repository/tkg/kube-vip:v0.5.12_vmware.1 projects.registry.vmware.com/tkg/kube-vip:v0.5.12_vmware.1
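After tagging, both references should resolve to the same image, and the kube-vip static pod should start on its next pull attempt. Illustrative verification (output trimmed to the image references, as above):
$ ctr -n=k8s.io image ls | grep vip
<CUSTOM_REGISTRY_URL>.fqdn.com/repository/tkg/kube-vip:v0.5.12_vmware.1
projects.registry.vmware.com/tkg/kube-vip:v0.5.12_vmware.1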
If access to kubectl commands is required urgently, please reference the Additional Information below for steps to restore kubectl access while investigating kube-vip functionality.
In order to be able to run kubectl commands even if the cluster VIP is not assigned, you can update the kubeconfig file's clusters.cluster.server field with one of the existing eth0 vNIC IP addresses assigned to a Control Plane node.
For example, in the case below the Control Plane node has two IPv4 addresses assigned to eth0, <>.24 and <>.9.
root@workload-<cluster_name>-control-plane-lmn8b [ ~ ]# ip a s
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc prio state UP group default qlen 1000
inet <>.24/26 metric 1024 brd <>.63 scope global dynamic eth0
inet <>.9/32 scope global eth0
<>.24 corresponds to the vSphere VM's vNIC and <>.9 is the cluster VIP address.
If kube-vip pods go down, the <>.9 IP address disappears:
root@workload-<cluster_name>-control-plane-lmn8b [ ~ ]# ip a s
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc prio state UP group default qlen 1000
inet <>.24/26 metric 1024 brd <>.63 scope global dynamic eth0
At this stage, running kubectl commands will return:
$ kubectl get node
Unable to connect to the server: dial tcp <>.9:6443: i/o timeout
or
$ kubectl get node
Unable to connect to the server: dial tcp <>.9:6443: connect: no route to host
First, check that the etcd cluster status is healthy.
Inside a Control Plane node, execute:
$ sudo -i
# alias etcdctl="/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/*/fs/usr/local/bin/etcdctl --cert /etc/kubernetes/pki/etcd/peer.crt --key /etc/kubernetes/pki/etcd/peer.key --cacert /etc/kubernetes/pki/etcd/ca.crt"
# etcdctl member list -w table
# etcdctl endpoint health --cluster=true -w table
Example of a healthy cluster with 3 Control Plane nodes:
# etcdctl member list -w table
+------------------+---------+--------------------------------------+------------------------------------------+------------------------------------------+------------+
|        ID        | STATUS  |                 NAME                 |                PEER ADDRS                |               CLIENT ADDRS               | IS LEARNER |
+------------------+---------+--------------------------------------+------------------------------------------+------------------------------------------+------------+
| 17f206fd866fdab2 | started | d5e989cf-2242-44b2-bca1-d922d1627543 | https://master-0.etcd.cfcr.internal:2380 | https://master-0.etcd.cfcr.internal:2379 |      false |
| 1958063b94f7906b | started | e9753a70-ba7f-43f6-b3e1-b0030290a977 | https://master-1.etcd.cfcr.internal:2380 | https://master-1.etcd.cfcr.internal:2379 |      false |
| 96d74f332197fd97 | started | 6e664768-fbf6-424a-b808-b4b7bb3c7a12 | https://master-2.etcd.cfcr.internal:2380 | https://master-2.etcd.cfcr.internal:2379 |      false |
+------------------+---------+--------------------------------------+------------------------------------------+------------------------------------------+------------+
# etcdctl endpoint health --cluster=true -w table
+-----------------------------+--------+-------------+-------+
|          ENDPOINT           | HEALTH |    TOOK     | ERROR |
+-----------------------------+--------+-------------+-------+
| https://10.xxx.xx.xxx:2379  |  true  | 15.725849ms |       |
| https://10.xxx.xx.xxx:2379  |  true  | 17.235013ms |       |
| https://10.xxx.xx.xxx:2379  |  true  | 18.253567ms |       |
+-----------------------------+--------+-------------+-------+
If using kubectl from an external client/jumpbox, edit the $HOME/.kube/config file and substitute <>.9 or its corresponding FQDN with the existing vNIC IP, <>.24.
Example:
$ cp -p $HOME/.kube/config ./config_bkp
$ vim $HOME/.kube/config
- cluster:
certificate-authority-data: <>
server: https://<>.24:6443
$ kubectl get node
NAME STATUS ROLES AGE VERSION
workload-<cluster_name>-control-plane-lmn8b NotReady control-plane 18d v1.28.7+vmware.1
workload-<cluster_name>-md-0-c2wkk-cm5j9 NotReady <none> 18d v1.28.7+vmware.1
Note: the nodes' status is NotReady because kubelet is not able to communicate with the kube-apiserver, as it's trying to use the cluster VIP.
If using kubectl inside a Control Plane node over SSH, edit /etc/kubernetes/admin.conf and substitute <>.9 or its corresponding FQDN with the existing vNIC IP, <>.24.
Example:
$ cp -p /etc/kubernetes/admin.conf ./admin.conf_bkp
$ vim /etc/kubernetes/admin.conf
- cluster:
certificate-authority-data: <>
server: https://<>.24:6443
$ kubectl get node --kubeconfig /etc/kubernetes/admin.conf
NAME STATUS ROLES AGE VERSION
workload-<>-control-plane-lmn8b NotReady control-plane 18d v1.28.7+vmware.1
workload-<>-md-0-c2wkk-cm5j9 NotReady <none> 18d v1.28.7+vmware.1
Note: the nodes' status is NotReady because kubelet is not able to communicate with the kube-apiserver, as it's trying to use the cluster VIP.
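As an alternative to editing either file manually, the server substitution can be scripted. The example below is a generic sketch using placeholder values (<VIP_ADDRESS> and <NODE_ETH0_IP>) that must be replaced with the actual cluster VIP and the Control Plane node's vNIC IP:
$ cp -p /etc/kubernetes/admin.conf ./admin.conf_bkp
$ sed -i 's|https://<VIP_ADDRESS>:6443|https://<NODE_ETH0_IP>:6443|' /etc/kubernetes/admin.conf
Once kube-vip is functional again, restore the original server entry (or the backup file) so that clients go back to using the cluster VIP.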