KubeClientCertificateExpiration firing alert on Prometheus dashboard

Article ID: 385732

Updated On:

Products

VMware Tanzu Kubernetes Grid

Issue/Introduction

A KubeClientCertificateExpiration warning alert appears on the Prometheus dashboard with the message "A client certificate used to authenticate to kubernetes apiserver is expiring in less than 7.0 days on cluster".

The alert comes from the kubernetes-system-apiserver rule group.

Environment

TKG clusters with Prometheus Operator.

Cause

The alert fires when a client authenticates to the Kubernetes Apiserver with a certificate that is close to expiring.
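To our knowledge, the upstream kube-prometheus rules derive this alert from the kube-apiserver histogram metric apiserver_client_certificate_expiration_seconds. As a rough sketch, assuming kubectl is allowed to read the raw /metrics endpoint, the buckets of that histogram can be inspected directly. The helper name show_client_cert_expiry_buckets is hypothetical.

```shell
# Hypothetical helper: print the histogram buckets that record how long
# client certificates seen by the API server had left before expiring.
# Assumes kubectl can read the raw /metrics endpoint of the API server.
show_client_cert_expiry_buckets() {
  kubectl get --raw /metrics |
    grep '^apiserver_client_certificate_expiration_seconds_bucket'
}
```

Buckets are cumulative, so a non-zero count in a bucket whose le value is 604800 (7 days in seconds) or lower suggests that at least one request used a certificate inside the alert window. The output reflects only the Apiserver instance that answered the request.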

Unfortunately, the alert itself does not identify the client whose certificate is expiring. The Resolution section below offers suggestions for identifying it.

As described in https://github.com/prometheus-operator/kube-prometheus/issues/881, identifying the expiring client is not easy, and in some cases the certificate may expire before the client is identified.

Once the client's certificate has expired, the kube-apiserver pod logs show errors like the one below, which point to the client:

"Unable to authenticate the request" err="[x509: certificate has expired or is not yet valid"

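To check whether a certificate has already expired, the kube-apiserver logs can be searched for the error quoted above. The following is a minimal sketch that assumes the logs were first saved to a file; the helper name count_expired_cert_errors is hypothetical.

```shell
# Hypothetical helper: count authentication failures caused by an expired
# (or not yet valid) certificate in a saved kube-apiserver log file.
count_expired_cert_errors() {
  # grep -c prints the number of matching lines (0 when there are none).
  grep -c 'Unable to authenticate the request.*x509: certificate has expired or is not yet valid' "$1"
}
```

For example, save the logs with "kubectl logs -n kube-system <kube-apiserver-pod-name> > apiserver.log" and then run "count_expired_cert_errors apiserver.log". A result above 0 means the full matching lines are worth reading for details about the client.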
Resolution

Suggestions to identify the expiring client include:

  1. Check the expiration dates of the Kubernetes components' certificates on all ControlPlane nodes.

    References:
    -) KB How to rotate certificates in a Tanzu Kubernetes Grid cluster
    -) Official Docs Renew Cluster Certificates (Standalone MC)

    kubectl get nodes \
    -o jsonpath='{.items[*].status.addresses[?(@.type=="ExternalIP")].address}' \
    -l node-role.kubernetes.io/control-plane= > nodes

    for i in $(cat nodes); do
      printf "\n######\n"
      ssh -o "StrictHostKeyChecking=no" -q "capv@$i" hostname
      ssh -o "StrictHostKeyChecking=no" -q "capv@$i" sudo kubeadm certs check-expiration
    done
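The output of kubeadm certs check-expiration becomes long when several ControlPlane nodes are checked. The sketch below narrows it down to entries inside the 7-day alert window. It assumes the tabular layout printed by recent kubeadm releases, in which the sixth whitespace-separated field of each row is "UTC" and the seventh is the RESIDUAL TIME value (for example 364d, 5h or <invalid>). The helper name flag_expiring_kubeadm_certs is hypothetical, and the column positions should be verified against your kubeadm version.

```shell
# Hypothetical helper: read "kubeadm certs check-expiration" output on stdin
# and print the name and residual time of entries that are already invalid
# or have less than 7 days left.
# Assumption: field 6 is "UTC" and field 7 is the RESIDUAL TIME column.
flag_expiring_kubeadm_certs() {
  awk '$6 == "UTC" &&
       ($7 == "<invalid>" || $7 ~ /^[0-9]+[smh]$/ || $7 ~ /^[0-6]d$/) {
         print $1, $7
       }'
}
```

It can be appended to the check inside the loop above, for example: ssh -o "StrictHostKeyChecking=no" -q capv@$i sudo kubeadm certs check-expiration | flag_expiring_kubeadm_certs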

  2. Check the expiration date of the kubelet client certificate on all nodes (Workers and ControlPlanes).

    References:
    -) KB How to rotate certificates in a Tanzu Kubernetes Grid cluster

    kubectl get nodes \
    -o jsonpath='{.items[*].status.addresses[?(@.type=="ExternalIP")].address}' > all_nodes

    for i in $(cat all_nodes); do
      printf "\n######\n"
      ssh -o "StrictHostKeyChecking=no" -q "capv@$i" hostname
      ssh -o "StrictHostKeyChecking=no" -q "capv@$i" sudo openssl x509 -in /var/lib/kubelet/pki/kubelet-client-current.pem -noout -dates
    done
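The dates printed by openssl x509 -noout -dates still have to be compared with the current date by hand. The sketch below converts the notAfter line into a number of remaining days. It assumes GNU date (for the -d option) on the machine where the loop runs; the helper name days_until_expiry is hypothetical.

```shell
# Hypothetical helper: read the output of "openssl x509 -noout -dates" on
# stdin and print the number of whole days left until the notAfter date.
# Assumes GNU date, which can parse the date format printed by openssl.
days_until_expiry() {
  # Keep only the value that follows "notAfter=".
  not_after=$(sed -n 's/^notAfter=//p')
  [ -n "$not_after" ] || return 1
  expiry_epoch=$(date -u -d "$not_after" +%s) || return 1
  now_epoch=$(date -u +%s)
  echo $(( (expiry_epoch - now_epoch) / 86400 ))
}
```

Piping the openssl command from the loop above into days_until_expiry prints one number per node. A value below 7 means the kubelet client certificate on that node is inside the alert window.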

  3. Increase kube-apiserver logging verbosity and check logs.

    References:
    -) https://github.com/prometheus-operator/kube-prometheus/issues/881#issuecomment-452356415

    1. Log into a ControlPlane node:
      # ssh capv@<CP-IP>
      # sudo -i
    2. Edit the kube-apiserver manifest adding verbosity:
      # vi /etc/kubernetes/manifests/kube-apiserver.yaml

      Add: "- --v=10" to the kube-apiserver options.

      For example:

      spec:
        containers:
        - command:
          - kube-apiserver
          - --v=10
    3. Wait until the kube-apiserver pod is recreated.
    4. If there's more than one ControlPlane node, repeat the above steps on each of the remaining ControlPlane nodes.
    5. Wait a short while for new requests to be logged, then collect logs from all the kube-apiserver pods:
      # kubectl logs -n kube-system <kube-apiserver-pod-name> > <kube-apiserver-pod-name>.log
    6. Revert the change by removing the "- --v=10" option on all ControlPlane nodes.
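Collecting the logs pod by pod is tedious on clusters with several ControlPlane nodes. The sketch below saves the logs of every kube-apiserver pod in one pass. It assumes the pods carry the component=kube-apiserver label, which kubeadm-generated static pod manifests normally set; the helper name collect_apiserver_logs is hypothetical.

```shell
# Hypothetical helper: save the logs of every kube-apiserver pod into
# <output-dir>/<pod-name>.log (the output directory defaults to ".").
# Assumption: the pods are labelled component=kube-apiserver.
collect_apiserver_logs() {
  outdir="${1:-.}"
  for pod in $(kubectl get pods -n kube-system -l component=kube-apiserver -o name); do
    # "-o name" prints "pod/<pod-name>"; strip the "pod/" prefix.
    name="${pod#pod/}"
    kubectl logs -n kube-system "$name" > "$outdir/$name.log"
  done
}
```

Run it as "collect_apiserver_logs ./apiserver-logs" after creating the target directory, and before reverting the verbosity change.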

  4. Check kube-apiserver audit logs.

    Collect and analyze /var/log/kubernetes/audit.log from each of the ControlPlane nodes.
    While this file is unlikely to tell you which client's certificate is expiring, it lets you list all the clients talking to your Kubernetes Apiserver. You can then check each client's certificate individually.

    # cat /var/log/kubernetes/audit.log | jq .user.username | sort | uniq
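User names alone may not be enough to locate a client. As a sketch, the audit events can also be grouped by user agent and source IP, assuming the audit log holds one JSON event per line and that the events include the userAgent and sourceIPs fields of the audit.k8s.io/v1 Event type. The helper name list_audit_clients is hypothetical.

```shell
# Hypothetical helper: summarise the clients found in an audit log as
# "<request count> <username> <user agent> <source IPs>", busiest first.
# Assumes one JSON audit event per line and that jq is installed.
list_audit_clients() {
  jq -r '[.user.username, .userAgent, ((.sourceIPs // []) | join(","))] | @tsv' "$1" |
    sort | uniq -c | sort -rn
}
```

For example: list_audit_clients /var/log/kubernetes/audit.log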

  5. Check Secrets and ServiceAccounts.

    Some clients interact with the Kubernetes Apiserver through ServiceAccounts. Listing the Secrets and ServiceAccounts in your cluster may give you a hint about the expiring client. You can then examine each of the Secrets/certificates configured for the ServiceAccounts individually.

    # kubectl get secret,sa -A

    The YAML output of a Secret shows its certificates in base64-encoded form. You can decode and read a certificate with the commands below:

    # kubectl get secret <secret-name> -n <namespace-name> -oyaml
    # echo "<base64-encoded-certificate-from-above-output>" | base64 -d | openssl x509 -noout -dates
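The two commands above can be combined so that the encoded value does not have to be copied by hand. This is a minimal sketch that assumes the certificate is stored under a known key of the Secret; it defaults to tls.crt, the key used by Secrets of type kubernetes.io/tls, and other Secrets may use different keys. The helper name secret_cert_dates is hypothetical.

```shell
# Hypothetical helper: print the validity dates of the certificate stored
# under a given key of a Secret.
# Usage: secret_cert_dates <namespace-name> <secret-name> [key]
# Assumption: the key defaults to tls.crt; pass another key if needed.
secret_cert_dates() {
  namespace="$1"
  secret="$2"
  key="${3:-tls.crt}"
  # Dots in the key name must be escaped inside a jsonpath expression.
  escaped_key=$(printf '%s' "$key" | sed 's/\./\\./g')
  kubectl get secret "$secret" -n "$namespace" \
    -o "jsonpath={.data.${escaped_key}}" |
    base64 -d |
    openssl x509 -noout -dates
}
```

If the value holds a certificate chain, openssl x509 reads only the first certificate in it.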