How to rotate certificates in a Tanzu Kubernetes Grid cluster

Article ID: 342295

Products

VMware Tanzu Kubernetes Grid

Issue/Introduction

Summary of rotation process:
  • Rotate Kubernetes component certs on Control Plane nodes
  • Update kubeconfig for Management cluster
  • Update kubeconfig for Workload cluster
  • Rotate kubelet certs on Control Plane nodes
  • Rotate kubelet certs on Worker nodes


Symptoms:
This article describes the process to rotate the certificates of the Kubernetes core components, including the kubelet, for both Management and Workload Tanzu Kubernetes Grid (TKG) clusters.

This procedure has been validated on TKG 1.1, 1.2, 1.3, 1.4, 1.5, and 2.x.

This procedure is NOT supported on vSphere with Tanzu. If you are using vSphere with Tanzu and need to rotate certificates, please open a case with VMware Support. Documentation for rotating those certificates is still in progress.

If the certificates have already expired, kube-apiserver returns the following error when you run kubectl commands:
# kubectl get nodes
Unable to connect to the server: x509: certificate has expired or is not yet valid
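
Even before fixing anything, you can confirm the validity dates of the certificate presented by kube-apiserver from the jumpbox. A minimal sketch, assuming ENDPOINT-IP is your cluster's control plane endpoint:

# Print the validity window of the certificate served on the API endpoint
echo | openssl s_client -connect ENDPOINT-IP:6443 2>/dev/null | openssl x509 -noout -dates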

Note: If you are using vSphere with Tanzu (vSphere IaaS Control Plane / TKGS), use the following KB to rotate guest cluster certificates instead of this one:
Reference this KB

Environment

VMware Tanzu Kubernetes Grid 1.x
VMware Tanzu Kubernetes Grid Plus 1.x
VMware Tanzu Kubernetes Grid 2.x
VMware Tanzu Kubernetes Grid Plus 2.x

Cause

Kubernetes component certificates are valid for one year and are rotated during a cluster upgrade. If the cluster has not been upgraded within that period, the certificates must be rotated manually.

To check the expiration dates of the Kubernetes component certificates:
ssh capv@CONTROL-PLANE-IP
sudo -i
kubeadm alpha certs check-expiration

Note: For TKGm 1.5.x, you can remove "alpha" from the command above.

Kubelet certificates are rotated automatically as the current certificate approaches its expiration date. However, if that rotation has not completed, they can be rotated manually.

To check the expiration date of the kubelet certificate:
ssh capv@CONTROL-PLANE-IP
sudo -i
openssl x509 -in /var/lib/kubelet/pki/kubelet-client-current.pem  -noout -dates
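
To survey several nodes at once from the jumpbox, a hedged sketch (assumes the capv user has passwordless sudo, which is the TKG default; NODE-IP-1 and NODE-IP-2 are placeholders for your node addresses):

for ip in NODE-IP-1 NODE-IP-2; do
  echo "== $ip =="
  ssh capv@$ip "sudo openssl x509 -in /var/lib/kubelet/pki/kubelet-client-current.pem -noout -enddate"
done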

Resolution

Rotate Kubernetes component certs on Control Plane nodes


1. SSH to Control Plane node and rotate cluster component certs:

ssh capv@CONTROL-PLANE-IP
sudo -i
kubeadm alpha certs check-expiration
kubeadm alpha certs renew all -v 6
kubeadm alpha certs check-expiration

Note: For TKGm 1.5.x, you can remove "alpha" from the three commands above.


2. Restart the cluster components (etcd, kube-apiserver, kube-controller-manager, kube-scheduler, and kube-vip if present). These run as static pods, so the kubelet recreates them once the processes are killed:

crictl ps
ps -fe | grep -e etcd -e kube-api -e kube-controller-manager -e kube-scheduler -e kube-vip
kill PID
crictl ps
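
As an alternative to killing PIDs one by one, a hedged sketch that stops each static pod with crictl and lets the kubelet recreate it (the component names below are the standard static-pod names; adjust if yours differ):

for comp in etcd kube-apiserver kube-controller-manager kube-scheduler kube-vip; do
  crictl pods --name $comp -q | xargs -r crictl stopp
done
crictl ps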


3. Repeat the above steps on all remaining Control Plane nodes.


4. Verify that the local kubeconfig has been updated with the new certificates:
ssh capv@CONTROL-PLANE-IP
sudo -i
grep client-certificate-data /etc/kubernetes/admin.conf | awk '{print $2}' | base64 -d | openssl x509 -noout -dates
export KUBECONFIG=/etc/kubernetes/admin.conf
kubectl get nodes


Retrieve kubeconfig for Management Cluster


1. SSH to a Control Plane node of the Management cluster and retrieve the client-certificate-data and client-key-data values from /etc/kubernetes/admin.conf:
ssh capv@CONTROL-PLANE-IP
sudo -i
vi /etc/kubernetes/admin.conf

2. Update the local kubeconfig files on the jumpbox, changing the client-certificate-data and client-key-data values for the Management cluster admin user:
vi ~/.kube/config

users:
- name: MGMT-CLUSTER-admin
  user:
    client-certificate-data: YYYYYY
    client-key-data: ZZZZZZ

vi ~/.kube-tkg/config

users:
- name: MGMT-CLUSTER-admin
  user:
    client-certificate-data: YYYYYY
    client-key-data: ZZZZZZ
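
If you prefer not to paste the values by hand, a minimal sketch that pulls the new credentials over SSH and embeds them with kubectl config set-credentials (the /tmp paths are assumptions; capv's passwordless sudo is the TKG default):

# Decode the new cert and key from admin.conf into temporary files
ssh capv@CONTROL-PLANE-IP "sudo grep client-certificate-data /etc/kubernetes/admin.conf" | awk '{print $2}' | base64 -d > /tmp/new-admin.crt
ssh capv@CONTROL-PLANE-IP "sudo grep client-key-data /etc/kubernetes/admin.conf" | awk '{print $2}' | base64 -d > /tmp/new-admin.key
# Embed them into both local kubeconfig files
kubectl config set-credentials MGMT-CLUSTER-admin --client-certificate=/tmp/new-admin.crt --client-key=/tmp/new-admin.key --embed-certs=true --kubeconfig ~/.kube/config
kubectl config set-credentials MGMT-CLUSTER-admin --client-certificate=/tmp/new-admin.crt --client-key=/tmp/new-admin.key --embed-certs=true --kubeconfig ~/.kube-tkg/config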


3. Verify the updated kubeconfig and target the Management cluster context:
kubectl get nodes

4. Obtain the Management cluster kubeconfig secret:
kubectl get secret -n tkg-system MGMT-CLUSTER-kubeconfig -o jsonpath='{.data.value}' | base64 -d > mgmt-kubeconfig-value

5. Update the client-certificate-data and client-key-data using values from /etc/kubernetes/admin.conf:
vi mgmt-kubeconfig-value
base64 mgmt-kubeconfig-value -w 0

6. Update the Management cluster kubeconfig secret, replacing "data.value" with the encoded data from the previous command:
kubectl edit secret -n tkg-system MGMT-CLUSTER-kubeconfig
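
Steps 5 and 6 can also be collapsed into a single command; a hedged one-liner, assuming GNU base64 and that mgmt-kubeconfig-value has already been edited:

kubectl patch secret -n tkg-system MGMT-CLUSTER-kubeconfig --type merge -p "{\"data\":{\"value\":\"$(base64 -w 0 mgmt-kubeconfig-value)\"}}"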

7. Retrieve the kubeconfig using the tkg CLI:
tkg get management-cluster
kubectl config use-context MGMT-CONTEXT
kubectl get nodes
Note: For TKGm 1.4.x and higher versions, replace the "tkg" CLI with the "tanzu" CLI and use "tanzu mc get".


Retrieve kubeconfig for Workload Cluster


1. Target the Management cluster and delete the kubeconfig secret for the workload cluster. Deleting and recreating this secret is an important step; if it is skipped, the CAPI pods will eventually be unable to communicate with the cluster.
kubectl config use-context MGMT-CONTEXT
kubectl get secrets -A | grep kubeconfig
kubectl delete secret CLUSTER-NAME-kubeconfig -n NAMESPACE 

Note: If cluster reconciliation is paused, the secret will not be recreated. Make sure reconciliation is not paused before deleting the secret.

To verify, run the command below from the Management cluster context and confirm that "paused" is not set to true:

kubectl get cluster <cluster_name> --namespace <namespace> -o yaml | grep -i paused
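
If the cluster is paused, a hedged sketch that clears the standard Cluster API spec.paused field (verify the field on your cluster object before applying):

kubectl patch cluster <cluster_name> --namespace <namespace> --type merge -p '{"spec":{"paused":false}}'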



2. A new secret is recreated after a few minutes, and the kubeconfig can be retrieved using the tkg CLI:
kubectl get secrets -A | grep kubeconfig
tkg get credentials CLUSTER-NAME
kubectl config use-context WORKLOAD-CONTEXT
kubectl get nodes
 
Note: For TKGm 1.4.x and higher versions, replace the "tkg" CLI with the "tanzu" CLI and use "tanzu cluster kubeconfig get <cluster-name> -n <namespace-name> --admin" to retrieve the kubeconfig of a cluster.


Rotate kubelet certs on Control Plane nodes


1. The kubelet certificate should be rotated automatically; check its expiration date before executing these steps.

To check the expiration date:
# openssl x509 -in /var/lib/kubelet/pki/kubelet-client-current.pem -noout -dates

If expired, SSH to the Control Plane node and back up the kubelet config files:
ssh capv@CONTROL-PLANE-IP
sudo -i
mkdir /home/capv/backup
mv /etc/kubernetes/kubelet.conf /home/capv/backup
mv /var/lib/kubelet/pki/kubelet-client* /home/capv/backup

2. Generate a new kubelet.conf and update the cluster name and server endpoint.
NODE can be retrieved from "kubectl get nodes" or from the existing kubelet.conf:
# check kubeadm version
kubeadm version

# if kubeadm version is v1.19.* or lower,
kubeadm alpha kubeconfig user --org system:nodes --client-name system:node:NODE > /home/capv/backup/kubelet-NODE.conf

# if kubeadm version is v1.20.* or v1.21.*,
kubeadm config --kubeconfig /etc/kubernetes/admin.conf view > kubeadm.config
kubeadm alpha kubeconfig user --org system:nodes --client-name system:node:NODE --config kubeadm.config > /home/capv/backup/kubelet-NODE.conf

# if kubeadm version is v1.22.*,
kubectl get cm -n kube-system kubeadm-config -o=jsonpath="{.data.ClusterConfiguration}" --kubeconfig /etc/kubernetes/admin.conf > kubeadm.config
kubeadm kubeconfig user --org system:nodes --client-name system:node:NODE --config kubeadm.config > /home/capv/backup/kubelet-NODE.conf


vi /home/capv/backup/kubelet-NODE.conf

apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: XXXXXX
    server: https://ENDPOINT-IP:6443
  name: CLUSTER-NAME
contexts:
- context:
    cluster: CLUSTER-NAME 

Note: kubeadm v1.20+ requires the "--config" flag for the command "kubeadm alpha kubeconfig user".

Related commit:
Github Link

If "--config" is not provided, the command fails with:
    required flag(s) "config" not set
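
Before copying the file in the next step, a quick hedged check that your edits took effect (it simply prints the fields you changed):

grep -E "server:|name:" /home/capv/backup/kubelet-NODE.conf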



3. Copy kubelet-NODE.conf to /etc/kubernetes, restart the kubelet, and wait for kubelet-client-current.pem to be recreated:
cp /home/capv/backup/kubelet-NODE.conf /etc/kubernetes/kubelet.conf
systemctl restart kubelet
systemctl status kubelet
ls -l /var/lib/kubelet/pki/

4. Force kubelet.conf to use the new kubelet-client-current.pem:
kubeadm init phase kubelet-finalize all
ls -l /var/lib/kubelet/pki/
openssl x509 -in /var/lib/kubelet/pki/kubelet-client-current.pem  -noout -dates
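
kubelet-client-current.pem is a symlink to the newest dated certificate file, so a quick hedged check is to confirm where it points after the finalize step:

readlink /var/lib/kubelet/pki/kubelet-client-current.pem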

5. Verify the node is Ready
kubectl get nodes

6. Repeat the above steps on the remaining Control Plane nodes.


Rotate kubelet certs on Worker nodes


1. The kubelet certificate should be rotated automatically; check its expiration date before executing these steps.

To check the expiration date:
# openssl x509 -in /var/lib/kubelet/pki/kubelet-client-current.pem -noout -dates

If expired, SSH to the worker node and back up the kubelet config files:
ssh capv@WORKER-IP
sudo -i
mkdir /home/capv/backup
mv /etc/kubernetes/kubelet.conf /home/capv/backup
mv /var/lib/kubelet/pki/kubelet-client* /home/capv/backup

2. SSH to a Control Plane node, generate a kubelet-NODE.conf for each worker node, and update the cluster name and server endpoint.
NODE can be retrieved from "kubectl get nodes" or from the existing kubelet.conf:
ssh capv@CONTROL-PLANE-IP

# check kubeadm version
kubeadm version

# if kubeadm version is v1.19.* or lower,
kubeadm alpha kubeconfig user --org system:nodes --client-name system:node:NODE > /home/capv/backup/kubelet-NODE.conf

# if kubeadm version is v1.20.* or v1.21.*,
kubeadm config --kubeconfig /etc/kubernetes/admin.conf view > kubeadm.config
kubeadm alpha kubeconfig user --org system:nodes --client-name system:node:NODE --config kubeadm.config > /home/capv/backup/kubelet-NODE.conf

# if kubeadm version is v1.22.*,
kubectl get cm -n kube-system kubeadm-config -o=jsonpath="{.data.ClusterConfiguration}" --kubeconfig /etc/kubernetes/admin.conf > kubeadm.config
kubeadm kubeconfig user --org system:nodes --client-name system:node:NODE --config kubeadm.config > /home/capv/backup/kubelet-NODE.conf

vi /home/capv/backup/kubelet-NODE.conf

apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: XXXXXX
    server: https://ENDPOINT-IP:6443
  name: CLUSTER-NAME
contexts:
- context:
    cluster: CLUSTER-NAME 


3. Copy kubelet-NODE.conf to the corresponding worker node:
scp capv@CONTROL-PLANE-IP:/home/capv/backup/kubelet-NODE.conf .
scp kubelet-NODE.conf capv@WORKER-IP:/home/capv/backup/kubelet-NODE.conf
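
When rotating many workers, the two copies can be scripted from the jumpbox; a hedged sketch in which the NODE-1:WORKER-IP-1 pairs are placeholders for your environment:

for pair in NODE-1:WORKER-IP-1 NODE-2:WORKER-IP-2; do
  node=${pair%%:*}; ip=${pair##*:}
  # Pull the per-node config from the Control Plane, then push it to the worker
  scp capv@CONTROL-PLANE-IP:/home/capv/backup/kubelet-$node.conf .
  scp kubelet-$node.conf capv@$ip:/home/capv/backup/
done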
 
4. SSH to the worker node, copy the kubelet config into place, and restart the kubelet:
cp /home/capv/backup/kubelet-NODE.conf /etc/kubernetes/kubelet.conf
systemctl restart kubelet
systemctl status kubelet

5. Update kubelet.conf to reference the pem file rather than embedded certificate data:
vi /etc/kubernetes/kubelet.conf
...
  user:
    client-certificate: /var/lib/kubelet/pki/kubelet-client-current.pem
    client-key: /var/lib/kubelet/pki/kubelet-client-current.pem

systemctl restart kubelet
systemctl status kubelet

ls -l /var/lib/kubelet/pki/
openssl x509 -in /var/lib/kubelet/pki/kubelet-client-<DATE>.pem -text -noout | grep -A2 Validity

6. Verify the node is ready:
kubectl get nodes

7. Repeat the same process on the remaining worker nodes.