A KubeClientCertificateExpiration warning alert appears on the Prometheus dashboard with the message "A client certificate used to authenticate to kubernetes apiserver is expiring in less than 7.0 days on cluster".
The alert comes from kubernetes-system-apiserver.
TKG clusters with Prometheus Operator.
The alert fires when a client certificate used to authenticate to the Kubernetes API server is close to expiring.
Unfortunately, the alert does not say which client holds the expiring certificate. The Resolution section below offers suggestions for identifying it.
As described in https://github.com/prometheus-operator/kube-prometheus/issues/881, identifying the expiring client is not an easy task, and in some cases the certificate may expire before the client is identified.
Once the client's certificate has expired, kube-apiserver pod logs will show errors as below, pointing to the client:
"Unable to authenticate the request" err="[x509: certificate has expired or is not yet valid"
Suggestions to identify the expiring client include:
Check the expiration of the kubeadm-managed certificates on each ControlPlane node:
# kubectl get nodes \
  -o jsonpath='{.items[*].status.addresses[?(@.type=="ExternalIP")].address}' \
  -l node-role.kubernetes.io/control-plane= > nodes
# for i in $(cat nodes); do
    printf "\n######\n"
    ssh -o "StrictHostKeyChecking=no" -q "capv@$i" hostname
    ssh -o "StrictHostKeyChecking=no" -q "capv@$i" sudo kubeadm certs check-expiration
  done
Check the validity dates of the kubelet client certificate on every node:
# kubectl get nodes \
  -o jsonpath='{.items[*].status.addresses[?(@.type=="ExternalIP")].address}' > all_nodes
# for i in $(cat all_nodes); do
    printf "\n######\n"
    ssh -o "StrictHostKeyChecking=no" -q "capv@$i" hostname
    ssh -o "StrictHostKeyChecking=no" -q "capv@$i" sudo openssl x509 -in /var/lib/kubelet/pki/kubelet-client-current.pem -noout -dates
  done
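The alert's 7-day window can also be tested directly with the -checkend option of openssl x509, which checks whether a certificate expires within a given number of seconds. The following is a minimal sketch, assuming it runs on the node itself with read access to the certificate file; the function name expires_within_days is illustrative and not part of any tool.

```shell
#!/usr/bin/env bash
# expires_within_days FILE DAYS
# Returns 0 (true) if the certificate in FILE expires within DAYS days,
# 1 otherwise. A missing or unreadable file is also reported as expiring,
# because openssl exits non-zero in that case too.
expires_within_days() {
  local cert="$1" days="$2"
  # -checkend exits 0 when the certificate is still valid after N seconds,
  # so the result is inverted here.
  if openssl x509 -in "$cert" -noout -checkend "$(( days * 86400 ))" >/dev/null 2>&1; then
    return 1
  fi
  return 0
}

# Usage on a node (path taken from the step above):
# if expires_within_days /var/lib/kubelet/pki/kubelet-client-current.pem 7; then
#   echo "kubelet client certificate expires within 7 days"
# fi
```

Run inside the same ssh loop, this reduces each node's output to a yes/no answer instead of raw dates.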
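Temporarily raise the kube-apiserver log verbosity. Log in to a ControlPlane node and edit the static pod manifest:
# ssh capv@<CP-IP>
# sudo -i
# vi /etc/kubernetes/manifests/kube-apiserver.yaml
Add "- --v=10" to the kube-apiserver options:
spec:
  containers:
  - command:
    - kube-apiserver
    - --v=10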
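Once the kube-apiserver pod has restarted with the new option, collect its logs:
# kubectl logs -n kube-system <kube-apiserver-pod-name> > <kube-apiserver-pod-name>.log
Apply the "- --v=10" option on all ControlPlane nodes, and remove it from each of them once the logs have been collected.
Review /var/log/kubernetes/audit.log from each of the ControlPlane nodes to list the users that sent requests:
# cat /var/log/kubernetes/audit.log | jq .user.username | sort | uniq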
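The log and audit review can be sketched as two small filters. This is an illustration only: it assumes the audit log holds one JSON event per line (as the jq command above implies) and that events carry the userAgent field of the audit.k8s.io/v1 Event type. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# summarize_audit_users: read audit events (one JSON object per line) on
# stdin and print a request count per username and user agent, most
# frequent first.
summarize_audit_users() {
  jq -r '[.user.username // "unknown", .userAgent // "unknown"] | @tsv' \
    | sort | uniq -c | sort -rn
}

# find_expired_cert_errors: read kube-apiserver log lines on stdin and keep
# only the authentication failures caused by an expired certificate.
find_expired_cert_errors() {
  grep 'Unable to authenticate the request' | grep 'certificate has expired'
}

# Usage:
# summarize_audit_users < /var/log/kubernetes/audit.log
# kubectl logs -n kube-system <kube-apiserver-pod-name> | find_expired_cert_errors
```

Grouping by user agent alongside the username helps tell apart several clients that authenticate under the same identity.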
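Inspect certificates stored in secrets. List the secrets and service accounts, dump a candidate secret, and decode its certificate to read the validity dates:
# kubectl get secret,sa -A
# kubectl get secret <secret-name> -n <namespace-name> -oyaml
# echo "<base64-encoded-certificate-from-above-output>" | base64 -d | openssl x509 -noout -dates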
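Checking secrets one by one is slow, so the decode step can be wrapped in a helper and looped over secrets. This is a hedged sketch rather than a complete scan: it only covers secrets of type kubernetes.io/tls that store their certificate under the tls.crt key, and the function names are illustrative. Certificates kept in other secret types or keys (for example embedded kubeconfigs) still need the manual check above.

```shell
#!/usr/bin/env bash
# cert_dates_from_b64: read a base64-encoded PEM certificate on stdin and
# print its notBefore/notAfter dates.
cert_dates_from_b64() {
  base64 -d | openssl x509 -noout -dates
}

# list_tls_secret_dates: print the validity dates of every kubernetes.io/tls
# secret in the cluster. Assumes kubectl is already pointed at the cluster.
list_tls_secret_dates() {
  kubectl get secret -A --field-selector type=kubernetes.io/tls \
    -o jsonpath='{range .items[*]}{.metadata.namespace}{" "}{.metadata.name}{" "}{.data.tls\.crt}{"\n"}{end}' |
  while read -r ns name crt; do
    [ -n "$crt" ] || continue            # skip secrets without a tls.crt key
    printf '%s/%s: ' "$ns" "$name"
    printf '%s' "$crt" | cert_dates_from_b64 | tr '\n' ' '
    echo
  done
}

# Usage:
# list_tls_secret_dates
```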