Remediating a node's expired user cert for a k8s cluster deployed by vCloud Director Container Service Extension.

Article ID: 423083


Updated On:

Products

VMware Cloud Director

Issue/Introduction

In a TKG cluster, pods within the kube-system namespace (or others) are stuck in a Terminating status on a single control plane node. The API server does not receive updates regarding the pod's final termination, preventing the cluster from reconciling the desired state.
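To identify the stuck pods and the node they are scheduled on, the cluster can be inspected with kubectl. These are illustrative commands run from a machine with working cluster admin credentials:

```shell
# List pods stuck in Terminating across all namespaces
kubectl get pods -A | grep Terminating

# Include the -o wide flag to see which node each stuck pod is on
kubectl get pods -A -o wide | grep Terminating
```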

On the node in question, the kubelet service does not start.

Kubelet logs show:

Dec 19 06:17:13 {{controlplane-node-x}} kubelet[99831]: I1219 06:17:13.481580 99831 server.go:837] "Client rotation is on, will bootstrap in background"
Dec 19 06:17:13 {{controlplane-node-x}} kubelet[99831]: E1219 06:17:13.483794 99831 bootstrap.go:266] part of the existing bootstrap client certificate in /etc/kubernetes/kubelet.conf is expired: 2025-11-18 17:52:03
Dec 19 06:17:13 {{controlplane-node-x}} kubelet[99831]: E1219 06:17:13.483895 99831 run.go:74] "command failed" err="failed to run Kubelet: unable to load bootstrap kubeconfig: stat /etc/kubernetes/bootstrap-kubelet>
Dec 19 06:17:13 {{controlplane-node-x}} systemd[1]: kubelet.service: Main process exited, code=exited, status=1/FAILURE

Running kubeadm certs check-expiration does not show any expired certificates:

kubeadm certs check-expiration
[check-expiration] Reading configuration from the cluster...
[check-expiration] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
W1219 06:54:31.131877 105049 configset.go:177] error unmarshaling configuration schema.GroupVersionKind{Group:"kubeproxy.config.k8s.io", Version: "meout"

CERTIFICATE                EXPIRES               RESIDUAL TIME   CERTIFICATE AUTHORITY   EXTERNALLY MANAGED
admin.conf                 Dec 18, 2026 18:03 UTC   364d            ca                      no
apiserver                  Dec 18, 2026 18:03 UTC   364d            ca                      no
apiserver-etcd-client      Dec 18, 2026 18:03 UTC   364d            etcd-ca                 no
apiserver-kubelet-client   Dec 18, 2026 18:03 UTC   364d            ca                      no
controller-manager.conf    Dec 18, 2026 18:03 UTC   364d            ca                      no
etcd-healthcheck-client    Dec 18, 2026 18:03 UTC   364d            etcd-ca                 no
etcd-peer                  Dec 18, 2026 18:03 UTC   364d            etcd-ca                 no
etcd-server                Dec 18, 2026 18:03 UTC   364d            etcd-ca                 no
front-proxy-client         Dec 18, 2026 18:03 UTC   364d            front-proxy-ca          no
scheduler.conf             Dec 18, 2026 18:04 UTC   364d            ca                      no

CERTIFICATE AUTHORITY   EXPIRES               RESIDUAL TIME   EXTERNALLY MANAGED
ca                      Jul 21, 2034 12:21 UTC   8y              no
etcd-ca                 Jul 21, 2034 12:21 UTC   8y              no
front-proxy-ca          Jul 21, 2034 12:21 UTC   8y              no

Environment

Cloud Director 10.6.x

Container Service Extension 

Cause

The kubelet's client certificate had already expired. This certificate is used by the kubelet to authenticate to the Kubernetes API server.

cat /etc/kubernetes/kubelet.conf
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: (truncated, long base64 string)
    server: https://{{node_ip}}:6443
  name: default-cluster
contexts:
- context:
    cluster: default-cluster
    namespace: default
    user: default-auth
  name: default-context
current-context: default-context
kind: Config
preferences: {}
users:
- name: default-auth
  user:
    client-certificate: /var/lib/kubelet/pki/kubelet-client-current.pem   <----expired certificate, 2025-11-18 17:52:03
    client-key: /var/lib/kubelet/pki/kubelet-client-current.pem
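The expiry of the client certificate referenced in kubelet.conf above can be confirmed directly with openssl on the affected node; the output should match the expired date seen in the kubelet logs:

```shell
# Print the expiry date of the kubelet client certificate
openssl x509 -in /var/lib/kubelet/pki/kubelet-client-current.pem -noout -enddate
```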

Resolution

The most reliable recovery path is to trigger a Partial Bootstrap. This allows Kubernetes to regenerate the kubelet.conf with the correct identity and certificates automatically.

Step 1: Generate a New Bootstrap Token

On a functional control plane node, run the following command to create a temporary join token:

kubeadm token create --print-join-command

Note: the output includes the API server IP, token, and CA certificate hash. Do not modify it; copy and paste it as-is in Step 3.
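The command prints a ready-to-use join command. The IP, token, and hash below are placeholders for illustration only; always use the exact output from your own cluster:

```shell
# Example output shape (placeholder values):
# kubeadm join 192.0.2.10:6443 --token abcdef.0123456789abcdef \
#     --discovery-token-ca-cert-hash sha256:<64-character-hex-hash>
```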

Step 2: Prepare the Affected Node

Log into the broken node and stop the Kubelet service to clear the path for reconfiguration:

# Stop the service
systemctl stop kubelet

# Backup the expired configuration and certificates
mkdir ~/old_certs
mv /etc/kubernetes/kubelet.conf ~/old_certs/
mv /var/lib/kubelet/pki/kubelet-client* ~/old_certs/
mv /etc/kubernetes/pki/ca.crt ~/old_certs/
mv /etc/kubernetes/pki/ca.key ~/old_certs/

Step 3: Perform a Partial Rejoin

Execute the join command generated in Step 1 on the broken node.

Example Syntax

kubeadm join <API_SERVER_IP>:6443 --token <TOKEN> --discovery-token-ca-cert-hash sha256:<HASH>

Warning: Do not run kubeadm reset.

Note: Running kubeadm join on a node that is already provisioned detects the existing state and regenerates only the necessary kubelet.conf and bootstrap credentials.
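After the join completes, the regenerated files can be spot-checked before moving on to verification (paths as used earlier in this article):

```shell
# kubelet.conf and the client certificate should have fresh timestamps
ls -l /etc/kubernetes/kubelet.conf /var/lib/kubelet/pki/kubelet-client-current.pem
```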

Step 4: Verify Restoration

  1. Service Status: Confirm the kubelet is running: systemctl status kubelet.

  2. Pod Reconciliation: Observe the nodes and pods in the cluster. Once the kubelet checks in, the API server acknowledges the terminations, and the pods disappear or restart as expected:

    kubectl get nodes
    kubectl get pods -n kube-system
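As a final check, the new client certificate's expiry can be confirmed on the remediated node; it should now show a date well in the future rather than the expired one:

```shell
# Print the validity window of the regenerated kubelet client certificate
openssl x509 -in /var/lib/kubelet/pki/kubelet-client-current.pem -noout -dates
```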

Additional Information

If kubeadm certs check-expiration returns expired certificates, follow KB 397680 instead.