One of the three Supervisor Cluster control plane nodes is in a NotReady state, causing etcd to lose quorum. Containers on the affected node are in an exited state, and a few pods are stuck in a terminating state. The cluster is unreachable on port 6443; it stopped working approximately two days before the issue was reported, with no known environmental changes. ping and tracert to the IP address (e.g., 10.x.x.x) succeed, but connections to port 6443 fail, as shown by curl output similar to:
curl --insecure https://10.x.x.x:6443/healthz
>>curl: (28) Failed to connect to 10.x.x.x port 6443 after 21051 ms: Could not connect to server
Errors observed in the journalctl -xeu kubelet logs on the affected node include:
E1008 <> bootstrap.go:266] part of the existing bootstrap client certificate in /etc/kubernetes/kubelet.conf is expired: 2025-09-25 21:20:02 +0000 UTC
E1008 <> run.go:74] "command failed" err="failed to run Kubelet: unable to load bootstrap kubeconfig: stat /etc/kubernetes/bootstrap-kubelet.conf: no such file or directory"
kubelet.service: Main process exited, code=exited, status=1/FAILURE
kubelet.service: Failed with result 'exit-code'.
Additionally, the output of kubectl get nodes may show a node in a NotReady state:
# kubectl get nodes
NAME     STATUS     ROLES                  AGE    VERSION
test-1   Ready      control-plane,master   571d   v1.25.6+vmware.wcp.2
test-2   NotReady   control-plane,master   571d   v1.25.6+vmware.wcp.2
test-3   Ready      control-plane,master   571d   v1.25.6+vmware.wcp.2
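When the node list is long, the affected node can be picked out with a small filter. The following is a minimal sketch, not part of the original procedure; the function name not_ready_nodes is hypothetical, and it assumes the default column layout shown above (NAME first, STATUS second).

```shell
#!/bin/sh
# Read `kubectl get nodes` output on stdin and print the names of nodes
# whose STATUS column is anything other than exactly "Ready".
# Hypothetical helper: assumes the default column order (NAME, STATUS, ...).
not_ready_nodes() {
    # NR > 1 skips the header row; $2 is the STATUS column.
    awk 'NR > 1 && $2 != "Ready" { print $1 }'
}

# Example usage on a node where kubectl still works:
#   kubectl get nodes | not_ready_nodes
```

With the sample output above, the filter would print test-2.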
The underlying cause of the issue is an expired kubelet client certificate, which is hard-coded into the /etc/kubernetes/kubelet.conf file on the affected control plane node. This prevents the kubelet service from starting and establishing a connection to the API server on port 6443. Other control plane nodes are correctly referencing rotated and valid certificates, indicating an inconsistency in the configuration of the affected node.
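To confirm this cause on a given node, it can help to check whether kubelet.conf embeds the client certificate or references a certificate file, and when an embedded certificate expires. The sketch below is an illustration under assumptions: the function names are hypothetical, it assumes the standard kubeconfig keys client-certificate-data (inline base64) and client-certificate (file path), and it needs awk, GNU base64, and openssl on the node.

```shell
#!/bin/sh
# Report how a kubeconfig supplies the client certificate.
# Prints "embedded" for inline base64 data, "file" for a path reference,
# and "unknown" if neither key is present. Hypothetical helper.
kubelet_cert_mode() {
    conf="$1"
    if grep -q '^[[:space:]]*client-certificate-data:' "$conf"; then
        echo embedded
    elif grep -q '^[[:space:]]*client-certificate:' "$conf"; then
        echo file
    else
        echo unknown
    fi
}

# Print the expiry (notAfter) of an embedded client certificate.
# Assumes the value is a single base64 string on the same line as the key.
embedded_cert_enddate() {
    conf="$1"
    awk '/client-certificate-data:/ { print $2; exit }' "$conf" \
        | base64 -d \
        | openssl x509 -noout -enddate
}

# Example usage on a control plane node:
#   kubelet_cert_mode /etc/kubernetes/kubelet.conf
#   embedded_cert_enddate /etc/kubernetes/kubelet.conf
```

On the affected node the first call would be expected to report an embedded certificate, while a healthy node would be expected to report a file reference.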
Copy the /etc/kubernetes/kubelet.conf file from a healthy control plane node (where the certificate is correctly referenced as /var/lib/kubelet/pki/kubelet-client-current.pem) to the affected node. This replaces the hard-coded expired certificate with the correct configuration.
Restart the kubelet service on the affected node and check its status:
systemctl restart kubelet.service
systemctl status kubelet.service
Verify that the pods, containers, and nodes have recovered:
kubectl get pods -A -o wide
crictl ps -a
kubectl get nodes -o wide
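Because the replacement overwrites a system file, keeping a copy of the existing kubelet.conf first is a reasonable precaution. The following is a minimal sketch of that idea rather than a documented procedure: the function names and the staging path in the usage comment are hypothetical, and it assumes the known-good file has already been transferred to the affected node.

```shell
#!/bin/sh
# Copy a file next to itself with a timestamp suffix and print the backup path.
backup_conf() {
    src="$1"
    dst="${src}.bak.$(date +%Y%m%d%H%M%S)"
    cp -p "$src" "$dst" && echo "$dst"
}

# Overwrite the target config with a known-good copy, backing up the target first.
# $1 = known-good copy already present on this node, $2 = file to replace.
# Returns non-zero without touching the target if the backup fails.
replace_conf() {
    good="$1"
    target="$2"
    backup_conf "$target" >/dev/null || return 1
    cp "$good" "$target"
}

# Example usage on the affected node (the staging path is hypothetical):
#   replace_conf /root/kubelet.conf.healthy /etc/kubernetes/kubelet.conf
#   systemctl restart kubelet.service
```

If the kubelet still fails to start after the replacement, the timestamped backup allows the original file to be restored for further investigation.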