In a TKG cluster, pods in the kube-system namespace (or others) are stuck in a Terminating status on a single control plane node. The API server never receives confirmation of the pods' final termination, preventing the cluster from reconciling the desired state.
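The stuck pods can be observed from any machine with working kubectl access, for example (the namespace filter here is illustrative):
kubectl get pods -n kube-system -o wide | grep Terminating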
On the node in question, the kubelet service does not start.
Kubelet logs show:
Dec 19 06:17:13 {{controlplane-node-x}} kubelet[99831]: I1219 06:17:13.481580 99831 server.go:837] "Client rotation is on, will bootstrap in background"
Dec 19 06:17:13 {{controlplane-node-x}} kubelet[99831]: E1219 06:17:13.483794 99831 bootstrap.go:266] part of the existing bootstrap client certificate in /etc/kubernetes/kubelet.conf is expired: 2025-11-18 17:52:03
Dec 19 06:17:13 {{controlplane-node-x}} kubelet[99831]: E1219 06:17:13.483895 99831 run.go:74] "command failed" err="failed to run Kubelet: unable to load bootstrap kubeconfig: stat /etc/kubernetes/bootstrap-kubelet>
Dec 19 06:17:13 {{controlplane-node-x}} systemd[1]: kubelet.service: Main process exited, code=exited, status=1/FAILURE
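These entries can be collected on the affected node from the systemd journal, for example:
# Show the most recent kubelet log entries
journalctl -u kubelet -n 50 --no-pager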
kubeadm certs check-expiration does not show any expired certificates:
kubeadm certs check-expiration
[check-expiration] Reading configuration from the cluster...
[check-expiration] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
W1219 06:54:31.131877 105049 configset.go:177] error unmarshaling configuration schema.GroupVersionKind{Group:"kubeproxy.config.k8s.io", ... (truncated)
CERTIFICATE                EXPIRES                  RESIDUAL TIME   CERTIFICATE AUTHORITY   EXTERNALLY MANAGED
admin.conf                 Dec 18, 2026 18:03 UTC   364d            ca                      no
apiserver                  Dec 18, 2026 18:03 UTC   364d            ca                      no
apiserver-etcd-client      Dec 18, 2026 18:03 UTC   364d            etcd-ca                 no
apiserver-kubelet-client   Dec 18, 2026 18:03 UTC   364d            ca                      no
controller-manager.conf    Dec 18, 2026 18:03 UTC   364d            ca                      no
etcd-healthcheck-client    Dec 18, 2026 18:03 UTC   364d            etcd-ca                 no
etcd-peer                  Dec 18, 2026 18:03 UTC   364d            etcd-ca                 no
etcd-server                Dec 18, 2026 18:03 UTC   364d            etcd-ca                 no
front-proxy-client         Dec 18, 2026 18:03 UTC   364d            front-proxy-ca          no
scheduler.conf             Dec 18, 2026 18:04 UTC   364d            ca                      no

CERTIFICATE AUTHORITY   EXPIRES                  RESIDUAL TIME   EXTERNALLY MANAGED
ca                      Jul 21, 2034 12:21 UTC   8y              no
etcd-ca                 Jul 21, 2034 12:21 UTC   8y              no
front-proxy-ca          Jul 21, 2034 12:21 UTC   8y              no
Cloud Director 10.6.x
Container Service Extension
The kubelet's client certificate had already expired. This certificate is used by the kubelet to authenticate with the Kubernetes API server, so once it expires the kubelet can no longer report pod status.
cat /etc/kubernetes/kubelet.conf
apiVersion: v1
clusters:
- cluster:
certificate-authority-data: (truncated, long base64 string)
server: https://{{node_ip}}:6443
name: default-cluster
contexts:
- context:
cluster: default-cluster
namespace: default
user: default-auth
name: default-context
current-context: default-context
kind: Config
preferences: {}
users:
- name: default-auth
user:
client-certificate: /var/lib/kubelet/pki/kubelet-client-current.pem <----expired certificate, 2025-11-18 17:52:03
client-key: /var/lib/kubelet/pki/kubelet-client-current.pem
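The expiry timestamp can be confirmed directly on the node with openssl, which reads the first certificate block in the combined pem file:
openssl x509 -in /var/lib/kubelet/pki/kubelet-client-current.pem -noout -enddate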
The most reliable recovery path is to trigger a partial bootstrap. This allows the kubelet to regenerate kubelet.conf with the correct identity and certificates automatically.
On a functional control plane node, run the following command to create a temporary join token:
kubeadm token create --print-join-command
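The output is a complete join command, similar to the following (values shown are illustrative):
kubeadm join 192.168.1.100:6443 --token abcdef.0123456789abcdef --discovery-token-ca-cert-hash sha256:<hash>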
Note: the output includes the API server IP, the token, and the CA certificate hash. Do not modify it; copy and paste it as-is in Step 3.
Log in to the broken node and stop the kubelet service to clear the path for reconfiguration:
# Stop the service
systemctl stop kubelet
# Backup the expired configuration and certificates
mkdir ~/old_certs
mv /etc/kubernetes/kubelet.conf ~/old_certs/
mv /var/lib/kubelet/pki/kubelet-client* ~/old_certs/
mv /etc/kubernetes/pki/ca.crt ~/old_certs/
mv /etc/kubernetes/pki/ca.key ~/old_certs/
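To verify the backup before proceeding, list the directory created above:
ls -l ~/old_certs/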
Execute the join command generated in Step 1 on the broken node.
Example syntax:
kubeadm join <API_SERVER_IP>:6443 --token <TOKEN> --discovery-token-ca-cert-hash sha256:<HASH>
Warning: Do not run kubeadm reset.
Note: Running kubeadm join on a node that is already provisioned will detect the existing state and only regenerate the necessary kubelet.conf and bootstrap credentials.
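Once the join completes, new client credentials should be present under the kubelet pki directory; they can be checked the same way as before:
ls -l /var/lib/kubelet/pki/
openssl x509 -in /var/lib/kubelet/pki/kubelet-client-current.pem -noout -enddate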
Service Status: Confirm the kubelet is running: systemctl status kubelet.
Pod Reconciliation: Observe the nodes and pods in the cluster. Once the kubelet checks in, the API server will acknowledge the pending terminations, and the stuck pods will disappear or restart as expected:
kubectl get nodes
kubectl get pods -n kube-system
If "kubeadm certs check-expiration" returns expired certs, then Follow KB 397680