certmgr script fails to rotate etcd-server certificates on guest cluster control plane nodes

Article ID: 431879

Products

VMware vSphere Kubernetes Service

Issue/Introduction

After executing the automated certmgr certificate rotation script on a vSphere Supervisor guest cluster, the etcd-server certificates (/etc/kubernetes/pki/etcd/server.crt) are not updated. The certificates retain their original expiration dates across all control plane nodes, even though the script completes successfully. Other cluster certificates may rotate as expected, which isolates the failure to the etcd-server certificates.

To confirm the issue, list the control plane certificates for the guest cluster:

certmgr tkc certificates list my-cluster -n my-namespace

The etcd-server certificate shows as expiring or expired:

/etc/kubernetes/pki/etcd/server.crt              | 2026-03-05 20:36:12 +0000 UTC | false     |
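
As an alternative to certmgr, the expiration date can also be read directly from the certificate file on a control plane node. This is a minimal sketch, assuming bash and the openssl CLI are available on the node; the function name etcd_cert_status and the default 30-day threshold are hypothetical choices, not part of certmgr or kubeadm.

```shell
# Hypothetical helper (not part of certmgr or kubeadm): print a certificate's
# expiration date and report whether it expires within the given number of days.
etcd_cert_status() {
  local cert="${1:-/etc/kubernetes/pki/etcd/server.crt}"
  local days="${2:-30}"   # assumed warning threshold; adjust as needed

  if [ ! -r "$cert" ]; then
    echo "missing"
    return 2
  fi

  # Prints a line such as "notAfter=Mar  5 20:36:12 2026 GMT".
  openssl x509 -noout -enddate -in "$cert"

  # -checkend N exits 0 if the certificate is still valid N seconds from now,
  # and non-zero if it will have expired by then (or has already expired).
  if openssl x509 -noout -checkend "$(( days * 86400 ))" -in "$cert" >/dev/null; then
    echo "ok"
  else
    echo "expiring"
    return 1
  fi
}
```

For example, etcd_cert_status /etc/kubernetes/pki/etcd/server.crt 30 prints the notAfter line followed by ok or expiring, or prints missing if the file cannot be read.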

Cause

The automated certmgr script contains a defect that causes it to skip or fail to target the etcd-server certificates during its automated rotation loop.

Resolution

To resolve the issue, manually renew the etcd-server certificate with kubeadm and restart the etcd static pod so that it loads the new certificate. Perform these steps sequentially on one control plane node at a time.

  1. SSH into the first control plane node and escalate privileges to root.
  2. Verify the current etcd cluster health and quorum:
    1. crictl ps | grep etcd
    2. alias etcdctl='crictl exec <etcd container id from substep 1> etcdctl --cert /etc/kubernetes/pki/etcd/peer.crt --key /etc/kubernetes/pki/etcd/peer.key --cacert /etc/kubernetes/pki/etcd/ca.crt'
    3. List the members expected in the quorum, identified by IP address
      • etcdctl member list -w table
    4. Check the health of all members in the quorum
      • etcdctl --cluster=true endpoint health -w table
    5. Check the status of all members in the quorum and identify the leader
      • etcdctl --cluster=true endpoint status -w table
  3. Check the current etcd certificate expiration dates:
    • kubeadm certs check-expiration | grep etcd
  4. Manually force the renewal of the etcd-server certificate:
    • kubeadm certs renew etcd-server
  5. Recheck the expiration dates to confirm the etcd-server certificate has been extended (typically by one year):
    • kubeadm certs check-expiration | grep etcd
  6. Stop the etcd container. The kubelet automatically starts a new container for the etcd static pod, which loads the renewed certificate from the filesystem into memory:
    • crictl stop $(crictl ps --name etcd -q)
  7. Wait for the etcd container to restart, then verify that cluster health and quorum are fully restored:
    1. crictl ps | grep etcd
    2. alias etcdctl='crictl exec <new etcd container id from substep 1> etcdctl --cert /etc/kubernetes/pki/etcd/peer.crt --key /etc/kubernetes/pki/etcd/peer.key --cacert /etc/kubernetes/pki/etcd/ca.crt'
    3. List the members expected in the quorum, identified by IP address
      • etcdctl member list -w table
    4. Check the health of all members in the quorum
      • etcdctl --cluster=true endpoint health -w table
    5. Check the status of all members in the quorum and identify the leader
      • etcdctl --cluster=true endpoint status -w table
  8. Proceed to the next control plane node and repeat steps 1-7. Do not proceed to the next node until the current node reports a healthy etcd endpoint.
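
Steps 2 and 7 above can also be run as a small script on the node. Two caveats apply to the alias approach: bash does not expand aliases in non-interactive shells unless the expand_aliases option is set, and the etcd container ID changes after the restart in step 6. The following is a minimal sketch, assuming bash, crictl, and the certificate paths used above; the function names etcdctl_node and etcd_quorum_check are hypothetical.

```shell
# Hypothetical wrapper: run etcdctl inside the currently running etcd
# container, resolving the container ID on every call so it stays valid
# after the container has been restarted.
etcdctl_node() {
  local id
  id=$(crictl ps --name etcd -q | head -n 1)
  if [ -z "$id" ]; then
    echo "no running etcd container found" >&2
    return 1
  fi
  crictl exec "$id" etcdctl \
    --cert /etc/kubernetes/pki/etcd/peer.crt \
    --key /etc/kubernetes/pki/etcd/peer.key \
    --cacert /etc/kubernetes/pki/etcd/ca.crt \
    "$@"
}

# Hypothetical check mirroring substeps 3-5 of steps 2 and 7. Returns non-zero
# as soon as one of the etcdctl commands fails; "endpoint health" is expected
# to exit non-zero when any member is unhealthy.
etcd_quorum_check() {
  etcdctl_node member list -w table || return 1
  etcdctl_node endpoint health --cluster -w table || return 1
  etcdctl_node endpoint status --cluster -w table || return 1
}
```

For example, run etcd_quorum_check as root before step 3 and again after step 6, and continue only when it returns 0.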
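
Steps 3 to 5 above can be combined so that the renewal is verified against the certificate file itself. This is a minimal sketch, assuming bash, kubeadm, and openssl on the node; the function name renew_etcd_server_cert is hypothetical, and kubeadm certs check-expiration remains the check described in this article.

```shell
# Hypothetical helper for steps 3-5 (not part of kubeadm): renew the
# etcd-server certificate and confirm that the notAfter date on disk changed.
renew_etcd_server_cert() {
  local cert="${1:-/etc/kubernetes/pki/etcd/server.crt}"
  local before after

  before=$(openssl x509 -noout -enddate -in "$cert") || return 1
  kubeadm certs renew etcd-server || return 1
  after=$(openssl x509 -noout -enddate -in "$cert") || return 1

  # Strip the "notAfter=" prefix for readability.
  echo "before: ${before#notAfter=}"
  echo "after:  ${after#notAfter=}"

  # An unchanged date means the renewal did not take effect.
  if [ "$before" = "$after" ]; then
    echo "etcd-server certificate was not renewed" >&2
    return 1
  fi
}
```

The function only checks that the date changed; the running etcd process still uses the old certificate until the container is restarted in step 6.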
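
Steps 6 and 7 above depend on waiting long enough for the kubelet to start a replacement etcd container. The sketch below stops the container and then polls until a container with a new ID answers a local health check. It assumes bash and crictl on the node; the function name restart_etcd_and_wait and the default 300-second timeout and 5-second poll interval are hypothetical choices. The health check covers only the local member, so the cluster-wide checks in step 7 still apply before moving to the next node.

```shell
# Hypothetical helper for steps 6-7: stop the etcd container, then poll until
# the kubelet has started a replacement and the local endpoint reports healthy.
restart_etcd_and_wait() {
  local timeout="${1:-300}" interval="${2:-5}"
  local old new waited=0

  old=$(crictl ps --name etcd -q | head -n 1)
  [ -n "$old" ] || { echo "no running etcd container" >&2; return 1; }
  crictl stop "$old" >/dev/null || return 1

  while [ "$waited" -lt "$timeout" ]; do
    sleep "$interval"
    waited=$(( waited + interval ))
    new=$(crictl ps --name etcd -q | head -n 1)
    # Require a container ID different from the stopped one, and a healthy
    # local endpoint (no --cluster flag, so only this member is checked).
    if [ -n "$new" ] && [ "$new" != "$old" ] &&
       crictl exec "$new" etcdctl \
         --cert /etc/kubernetes/pki/etcd/peer.crt \
         --key /etc/kubernetes/pki/etcd/peer.key \
         --cacert /etc/kubernetes/pki/etcd/ca.crt \
         endpoint health >/dev/null 2>&1; then
      echo "etcd healthy in container $new"
      return 0
    fi
  done

  echo "etcd did not become healthy within ${timeout}s" >&2
  return 1
}
```

Polling for a new container ID avoids running the health check against the stopped container before the kubelet has replaced it.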