This procedure describes how to rotate the TKGI certificates stored in the event-controller, fluent-bit, and validator secrets, along with their CA, pks-ca. It applies to TKGI version 1.11.x and above.
The event-controller, fluent-bit, validator, and pks-ca certificates are rotated automatically when you upgrade the TKGI tile to 1.16.1, 1.15.5, or 1.14.7 and then upgrade the TKGI clusters. For details, see the release notes:
https://docs.vmware.com/en/VMware-Tanzu-Kubernetes-Grid-Integrated-Edition/1.14/tkgi/GUID-release-notes.html#1-14-0-secrets-not-rotated
1- Back up the secrets that you will rotate (event-controller, fluent-bit, validator, and pks-ca) using the following commands:
kubectl get secrets -n pks-system event-controller -o yaml > event-controller.yaml
cp event-controller.yaml event-controller-backup.yaml
kubectl get secrets -n pks-system fluent-bit -o yaml > fluent-bit.yaml
cp fluent-bit.yaml fluent-bit-backup.yaml
kubectl get secrets -n pks-system validator -o yaml > validator.yaml
cp validator.yaml validator-backup.yaml
kubectl get secrets -n pks-system pks-ca -o yaml > pks-ca.yaml
cp pks-ca.yaml pks-ca-backup.yaml
2- To rotate the event-controller, fluent-bit, and validator certificates, delete the event-controller, fluent-bit, and validator secrets. To rotate pks-ca as well, also delete the pks-ca secret.
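The deletion in step 2 could be sketched as follows. This is a minimal sketch, not an official command sequence: the helper name delete_secret_if_backed_up is hypothetical, and it assumes the backup files from step 1 (<name>-backup.yaml) are in the current directory. It refuses to delete a secret whose backup file is missing or empty.

```shell
# Namespace used throughout this article; override by exporting NAMESPACE.
NAMESPACE="${NAMESPACE:-pks-system}"

# Hypothetical helper: delete a secret only if its step 1 backup exists
# and is non-empty in the current directory.
delete_secret_if_backed_up() {
  name="$1"
  backup="${name}-backup.yaml"
  if [ ! -s "$backup" ]; then
    echo "no non-empty backup file $backup; not deleting $name" >&2
    return 1
  fi
  kubectl delete secret "$name" -n "$NAMESPACE"
}

# Usage (append pks-ca to the list only if you also want to rotate the CA):
#   for s in event-controller fluent-bit validator; do
#     delete_secret_if_backed_up "$s" || break
#   done
```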
3- Apply the cert-generator job to generate the new CA and certificates.
To re-run the job, save the existing job definition as a backup YAML file, delete the job, and then apply the backup YAML.
Note: Before the job can be applied successfully, edit the cert-generator backup YAML and remove the two lines spec.selector.matchLabels.controller-uid and spec.template.metadata.labels.controller-uid. Then apply it using the following kubectl command:
kubectl apply -f cert-generator-backup.yaml
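Step 3 as a whole might look like the sketch below. It rests on assumptions the article does not confirm: that the job is named cert-generator and lives in the pks-system namespace, and that filtering out every line containing "controller-uid:" is equivalent to the manual edit described in the note. The function names and the file name cert-generator-original.yaml are hypothetical.

```shell
# Print a Job manifest with every controller-uid label line removed.
strip_controller_uid() {
  grep -v 'controller-uid:' "$1"
}

# Hypothetical wrapper for step 3: export the job, strip the labels,
# delete the job, then re-apply the cleaned manifest.
# The namespace defaults to pks-system (an assumption).
recreate_cert_generator() {
  ns="${1:-pks-system}"
  kubectl get job cert-generator -n "$ns" -o yaml > cert-generator-original.yaml || return 1
  strip_controller_uid cert-generator-original.yaml > cert-generator-backup.yaml
  kubectl delete job cert-generator -n "$ns" || return 1
  kubectl apply -f cert-generator-backup.yaml
}

# Usage:
#   recreate_cert_generator            # uses pks-system
#   recreate_cert_generator my-ns      # or another namespace
```

Keeping the unmodified export in a separate file means the original job definition is still available if the cleaned manifest is rejected.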
4- Restart event-controller, fluent-bit, and validator, then confirm that their certificates show a new expiration date by running the following commands:
$ kubectl get secrets event-controller -o json -n pks-system | jq -r '.data."tls.crt"' | base64 -d | openssl x509 -text | grep 'Before\|After'
Not Before: Mar 25 09:45:00 2020 GMT
Not After : Mar 25 09:45:00 2023 GMT
$ kubectl get secrets pks-ca -o json -n pks-system | jq -r '.data."tls.crt"' | base64 -d | openssl x509 -text | grep 'Before\|After'
Not Before: Mar 25 09:45:00 2020 GMT
Not After : Mar 24 09:45:00 2025 GMT
$ kubectl get secrets fluent-bit -o json -n pks-system | jq -r '.data."tls.crt"' | base64 -d | openssl x509 -text | grep 'Before\|After'
Not Before: Mar 25 09:45:00 2020 GMT
Not After : Mar 25 09:45:00 2023 GMT
$ kubectl get secrets validator -o json -n pks-system | jq -r '.data."tls.crt"' | base64 -d | openssl x509 -text | grep 'Before\|After'
Not Before: Mar 25 09:45:00 2020 GMT
Not After : Mar 25 09:45:00 2023 GMT
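The four checks above can also be run as a loop. The sketch below uses the same kubectl, jq, and base64 pipeline, but asks openssl for the validity dates directly with -noout -dates instead of filtering the full -text output with grep. The function names are hypothetical, and the namespace defaults to pks-system as in the commands above.

```shell
# Print the notBefore/notAfter dates of a PEM certificate read from stdin.
cert_dates() {
  openssl x509 -noout -dates
}

# Hypothetical helper: decode tls.crt from a secret and print its dates.
secret_cert_dates() {
  kubectl get secret "$1" -n "${2:-pks-system}" -o json \
    | jq -r '.data."tls.crt"' | base64 -d | cert_dates
}

# Usage:
#   for s in pks-ca event-controller fluent-bit validator; do
#     echo "== $s"; secret_cert_dates "$s"
#   done
```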
If the validator certificate has expired, any logsink, clusterLogsink, metricsink, or clusterMetricsink applied after the expiration date fails with the error message: x509: certificate has expired or is not yet valid
For example:
ubuntu@opsmanager-2-10:~$ k apply -f logsink.yaml
Error from server (InternalError): error when creating "logsink.yaml": Internal error occurred: failed calling webhook "log.validator.pksapi.io": failed to call webhook: Post "https://validator.pks-system.svc:443/logsink?timeout=10s": x509: certificate has expired or is not yet valid: current time 2023-04-04T06:08:45Z is after 2023-04-04T04:58:00Z
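To catch this condition before a sink fails to apply, the certificate's remaining lifetime can be tested with the openssl -checkend option, which exits non-zero when the certificate expires within the given number of seconds. The helper name cert_valid_for is hypothetical, and this is a sketch rather than a documented TKGI check. Note that -checkend looks only at the expiration date, not at the "not yet valid" case.

```shell
# Return 0 if the PEM certificate on stdin is still unexpired N seconds
# from now (default 0, meaning "not expired right now"); non-zero otherwise.
cert_valid_for() {
  openssl x509 -noout -checkend "${1:-0}" >/dev/null
}

# Usage: warn if the validator certificate expires within 30 days (2592000 s):
#   kubectl get secret validator -n pks-system -o json \
#     | jq -r '.data."tls.crt"' | base64 -d \
#     | cert_valid_for 2592000 || echo "validator certificate expires within 30 days"
```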