This KB article is designed for troubleshooting a vSphere Workload Cluster VIP that cannot be reached from within the cluster's control plane nodes.
vSphere Workload Clusters are also known as Guest Clusters.
While connected to the Supervisor cluster context, one or more of the following symptoms may be present:
failed to create etcd client: could not establish a connection to the etcd leader: [could not establish a connection to any etcd node: unable to create etcd client: context deadline exceeded, failed to connect to etcd node]
Reason: RemediationFailed @ /
kubectl get service -n <cluster namespace> | grep "control-plane"
kubectl get ep -n <cluster namespace>
kubectl get machine,vm -o wide -n <cluster namespace>
kubectl get machine -n <cluster namespace>
kubectl get pods -A | egrep "ncp|ako|lbapi"
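As an illustration only, here is a minimal sketch of the Service and endpoint checks above using a placeholder namespace (demo-ns) that is not from this article. In a typical environment the cluster's control-plane Service is of type LoadBalancer, its external IP is the cluster VIP, and its endpoints are the control plane node IPs on port 6443:
# Placeholder namespace "demo-ns"; substitute the affected cluster's namespace
kubectl get service -n demo-ns | grep "control-plane"
# The EXTERNAL-IP of the control-plane Service is typically the cluster VIP
kubectl get ep -n demo-ns
# The matching Endpoints object should list each control plane node IP on port 6443;
# an empty or stale endpoint list points at the control plane nodes or the load
# balancer integration (NCP/AKO) rather than the network path.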
While connected to the affected vSphere Workload Cluster's context, the following symptoms are present:
The connection to the server localhost:8080 was refused - did you specify the right host or port?
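As a hedged aside (general kubectl behaviour, not specific to this article): the localhost:8080 fallback usually means kubectl could not load a usable kubeconfig or context and defaulted to the insecure local port. Confirming which context and server kubectl is actually pointed at helps separate a missing context from a VIP that is genuinely unreachable:
# Show the active context and the API server address it targets
kubectl config current-context
kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}'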
While connected via SSH directly to a new node in Provisioned state, the following symptoms are present:
crictl ps -a
systemctl status containerd
systemctl status kubelet
journalctl -xeu kubelet
"command failed" err="failed to load kubelet config file, path: /var/lib/kubelet/config.yaml, error: failed to load Kubelet config file /var/lib/kubelet/config.yaml, error failed to read kubelet config file \"/var/lib/kubelet/config.yaml\", error: open /var/lib/kubelet/config.yaml: no such file or directory"
vSphere 7.0 with Tanzu
vSphere 8.0 with Tanzu
This issue can occur regardless of whether the cluster is managed by Tanzu Mission Control (TMC).
When the vSphere Workload Cluster's VIP is inaccessible, kubectl commands run from within the vSphere Workload Cluster will fail.
As a result, the Supervisor cluster will be unable to reach the affected workload cluster's nodes for management and remediation.
This includes the creation and deletion of the affected workload cluster's nodes.
This issue can occur even when the Supervisor cluster is able to reach the workload cluster's VIP.
This KB article will provide steps to troubleshoot VIP connection failures within the affected vSphere Workload Cluster.
See "How to SSH into Supervisor Control Plane VMs" from Troubleshooting vSphere with Tanzu Supervisor Control Plane VMs
kubectl get service -n <cluster namespace> | grep "control-plane"
kubectl get ep -n <cluster namespace>
curl -vk <cluster VIP>:6443
curl -vk <cluster control plane IP>:6443
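A hedged note on reading the two curl checks above: any HTTP-level response, even an error body, proves TCP connectivity to port 6443, while a timeout or "no route to host" means that address is unreachable from the Supervisor control plane VM; the VIP failing while the direct control plane IP succeeds points at the load balancer layer rather than the workload cluster itself. Adding an explicit https:// scheme and the /healthz path (an optional variation, not part of this article's steps) also shows the TLS handshake and, when anonymous access to /healthz is permitted (the default), the API server's health response:
curl -vk https://<cluster VIP>:6443/healthz
curl -vk https://<cluster control plane IP>:6443/healthz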
kubectl get nodes
crictl ps | egrep "etcd|kube-apiserver"
crictl logs <container id>
kubeadm certs check-expiration
df -h
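To go one step deeper than the crictl checks above, etcd health can be queried from inside the etcd container. This is a hedged sketch only: it assumes a standard kubeadm certificate layout under /etc/kubernetes/pki/etcd and that etcdctl is present in the etcd container image, which is typical but not confirmed by this article:
# Query etcd health over the local client endpoint using the healthcheck client cert
crictl exec $(crictl ps --name etcd -q) etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
  --key=/etc/kubernetes/pki/etcd/healthcheck-client.key \
  endpoint health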
curl -vk <affected workload cluster VIP>:6443
tcpdump src <affected workload cluster VIP> and port 6443
curl -vk <affected workload cluster VIP>:6443
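A hedged example of using the last two commands together from the affected node, in two SSH sessions, to see whether any reply from the VIP ever arrives:
# Session 1 on the affected node: capture any packets coming back from the VIP
tcpdump -i any src <affected workload cluster VIP> and port 6443
# Session 2 on the same node: attempt the connection while the capture runs
curl -vk <affected workload cluster VIP>:6443
# If the curl hangs and nothing shows in the capture, return traffic from the VIP
# never reaches the node; if packets do arrive, the path to the load balancer is
# intact and the failure is higher up (certificates, kube-apiserver, or etcd).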