Failed calling CAPI webhook: x509: certificate has expired or is not yet valid

Article ID: 313000

Updated On: 11-08-2023

Products

Tanzu Kubernetes Grid

Issue/Introduction

This KB provides steps to renew expired TLS certificates for the CAPI webhook services on a TKG management cluster.

Symptoms:
  • Connectivity to the CAPI webhook service, https://capi-webhook-service.capi-webhook-system.svc, fails with:
Internal error occurred: failed calling webhook "default.machineset.cluster.x-k8s.io": Post https://capi-webhook-service.capi-webhook-system.svc:443/mutate-cluster-x-k8s-io-v1alpha3-machineset?timeout=30s: x509: certificate has expired or is not yet valid
  • Certificates are expired and in a "not ready" state (READY is False) on the TKG management cluster.
kubectl get certificates -A
NAMESPACE             NAME                                      READY   SECRET                                            AGE
capi-webhook-system   capi-kubeadm-bootstrap-serving-cert       False   capi-kubeadm-bootstrap-webhook-service-cert       442d
capi-webhook-system   capi-kubeadm-control-plane-serving-cert   False   capi-kubeadm-control-plane-webhook-service-cert   442d
capi-webhook-system   capi-serving-cert                         False   capi-webhook-service-cert                         442d
capi-webhook-system   capv-serving-cert                         False   capv-webhook-service-cert                         442d

  • Describing the certificate shows that it has expired:
Status:
  Conditions:
    Last Transition Time:  2022-09-05T05:40:18Z
    Message:               Certificate has expired on 05 Sep 22 05:27 UTC
    Reason:                Expired
    Status:                False
    Type:                  Ready
  Not After:               2022-09-05T05:27:25Z
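
To see the expiry of all four webhook certificates at once, rather than describing each one, a custom-columns query along the following lines may help. This is a minimal sketch: the function name list_webhook_cert_expiry is hypothetical, and it assumes the Certificate resources expose status.notAfter, as the "Not After" field in the describe output above suggests.

```shell
# Hypothetical helper: list each Certificate in capi-webhook-system with its expiry.
# Assumes kubectl is pointed at the TKG management cluster context.
list_webhook_cert_expiry() {
  kubectl get certificates -n capi-webhook-system \
    -o custom-columns='NAME:.metadata.name,SECRET:.spec.secretName,NOT_AFTER:.status.notAfter'
}

# Example invocation:
# list_webhook_cert_expiry
```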


Cause

The TLS certificates have expired. This can be confirmed in the capi-controller-manager and capv-controller-manager pod logs:
 
2022/09/26 00:13:22 http: TLS handshake error from 100.107.217.64:30884: remote error: tls: bad certificate


The cert-manager-xx pod also logs the following:
E0820 07:40:08.896576       1 controller.go:131] cert-manager/controller/certificates "msg"="re-queuing item  due to error processing" "error"="Internal error occurred: failed calling webhook \"webhook.cert-manager.io\": the server is currently unable to handle the request" "key"="capi-webhook-system/capv-serving-cert" 


NOTE : This log excerpt is an example. Date, time, and environmental variables may vary depending on your environment.
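
One way to look for the TLS handshake errors described above is to scan the controller-manager pods across all namespaces. The sketch below is illustrative only: the function name find_tls_handshake_errors is hypothetical, the pod-name pattern is an assumption based on the component names in this article, and the namespaces are discovered at run time rather than assumed.

```shell
# Hypothetical helper: scan CAPI/CAPV controller-manager pods for TLS handshake errors.
# Namespaces are looked up rather than hard-coded, since they may differ by release.
find_tls_handshake_errors() {
  kubectl get pods -A --no-headers \
    -o custom-columns='NS:.metadata.namespace,NAME:.metadata.name' |
  grep -E 'cap[iv]-controller-manager' |
  while read -r ns pod; do
    echo "== $ns/$pod"
    # Read the last 200 lines from every container in the pod; stdin is
    # detached so kubectl cannot consume the pod list feeding the loop.
    kubectl logs "$pod" -n "$ns" --all-containers --tail=200 2>/dev/null < /dev/null |
      grep 'TLS handshake error'
  done
}

# Example invocation:
# find_tls_handshake_errors
```

Pods with no matching log lines print only their "==" header line.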

Resolution

  • Back up the secrets below, then delete them so that cert-manager recreates them.
kubectl delete secret capi-webhook-service-cert -n capi-webhook-system
kubectl delete secret capv-webhook-service-cert -n capi-webhook-system
kubectl delete secret capi-kubeadm-control-plane-webhook-service-cert -n capi-webhook-system
kubectl delete secret capi-kubeadm-bootstrap-webhook-service-cert -n capi-webhook-system
  • Check the secrets and certificates again. The secrets should have been recreated and the certificates should report READY as True.
kubectl get secrets -A | grep capi-webhook-service-cert
capi-webhook-system                 capi-webhook-service-cert                         kubernetes.io/tls                     3      8h

kubectl get secrets -A | grep capv-webhook-service-cert
capi-webhook-system                 capv-webhook-service-cert                         kubernetes.io/tls                     3      8h

kubectl get secrets -A | grep capi-kubeadm-control-plane-webhook-service-cert
capi-webhook-system                 capi-kubeadm-control-plane-webhook-service-cert   kubernetes.io/tls                     3      8h

kubectl get secrets -A | grep capi-kubeadm-bootstrap-webhook-service-cert
capi-webhook-system                 capi-kubeadm-bootstrap-webhook-service-cert       kubernetes.io/tls                     3      8h

kubectl get certificates -A
NAMESPACE             NAME                                      READY   SECRET                                            AGE
capi-webhook-system   capi-kubeadm-bootstrap-serving-cert       True    capi-kubeadm-bootstrap-webhook-service-cert       13h
capi-webhook-system   capi-kubeadm-control-plane-serving-cert   True    capi-kubeadm-control-plane-webhook-service-cert   13h
capi-webhook-system   capi-serving-cert                         True    capi-webhook-service-cert                         13h
capi-webhook-system   capv-serving-cert                         True    capv-webhook-service-cert                         13h
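
The backup called for in the first step is not spelled out in the commands above. A minimal sketch of one way to do it, to be run before the delete commands, is shown here. The secret names and namespace are taken from this article; the function name backup_webhook_secrets and the default backup directory are hypothetical.

```shell
# Hypothetical helper: save each webhook secret as YAML before deleting it.
# Assumes kubectl is pointed at the TKG management cluster context.
backup_webhook_secrets() {
  backup_dir="${1:-./capi-webhook-secret-backup}"
  mkdir -p "$backup_dir" || return 1
  for secret in \
    capi-webhook-service-cert \
    capv-webhook-service-cert \
    capi-kubeadm-control-plane-webhook-service-cert \
    capi-kubeadm-bootstrap-webhook-service-cert
  do
    # Stop at the first failure so a missing backup is noticed before any delete.
    kubectl get secret "$secret" -n capi-webhook-system -o yaml \
      > "$backup_dir/$secret.yaml" || return 1
  done
}

# Example invocation:
# backup_webhook_secrets ./capi-webhook-secret-backup
```

The saved YAML files contain private key material, so they should be stored with the same care as the secrets themselves.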
 
NOTE: If certificates are stuck in a "Pending" state during rotation, describing them shows:
Status:
  Conditions:
    Last Transition Time:  2022-09-05T05:40:18Z
    Message:               Certificate pending issuance
    Reason:                Pending
    Status:                False
    Type:                  Ready
Events:                    <none>

You can restart the pods below in the cert-manager namespace after backing up their logs.

kubectl get pods -n cert-manager
NAME                                      READY   STATUS    RESTARTS   AGE
cert-manager-555b67f478-g8vd4             1/1     Running   0          12m
cert-manager-cainjector-768bd8f8f-wdtm8   1/1     Running   0          12m
cert-manager-webhook-5c6594fccc-f8bw7     1/1     Running   0          20m
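
The log backup and restart could be sketched as follows. This is an illustration under assumptions, not a prescribed procedure: the deployment names are inferred from the pod names listed above, the function name restart_cert_manager and the log directory are hypothetical, and kubectl rollout restart requires a kubectl client of version 1.15 or later.

```shell
# Hypothetical helper: save logs, then restart the three cert-manager deployments.
# Deployment names are inferred from the pod names listed above (assumption).
restart_cert_manager() {
  log_dir="${1:-./cert-manager-logs}"
  mkdir -p "$log_dir" || return 1
  for deploy in cert-manager cert-manager-cainjector cert-manager-webhook; do
    # Log backup is best effort; a failure here does not block the restart.
    kubectl logs "deployment/$deploy" -n cert-manager > "$log_dir/$deploy.log" \
      || echo "warning: could not save logs for $deploy" >&2
    kubectl rollout restart "deployment/$deploy" -n cert-manager || return 1
    # Wait until the replacement pod is ready before moving on.
    kubectl rollout status "deployment/$deploy" -n cert-manager --timeout=120s || return 1
  done
}

# Example invocation:
# restart_cert_manager ./cert-manager-logs
```

With older kubectl clients, deleting the pods and letting their deployments recreate them has the same effect.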
 


Additional Information

Impact/Risks:
Cluster operations such as scaling up nodes may be affected. You will see errors like the following:
 
Events:
  Type     Reason          Age                        From                          Message
  ----     ------          ----                       ----                          -------
  Warning  ReconcileError  24m (x2097 over 4d)        machinedeployment-controller  Internal error occurred: failed calling webhook "default.machineset.cluster.x-k8s.io": Post https://capi-webhook-service.capi-webhook-system.svc:443/mutate-cluster-x-k8s-io-v1alpha3-machineset?timeout=30s: x509: certificate has expired or is not yet valid

  Warning  FailedScale     <invalid> (x2106 over 4d)  machinedeployment-controller  Failed to scale MachineSet "pltb-4g-evnfm-md-0-8696b74b8d": Internal error occurred: failed calling webhook "default.machineset.cluster.x-k8s.io": Post https://capi-webhook-service.capi-webhook-system.svc:443/mutate-cluster-x-k8s-io-v1alpha3-machineset?timeout=30s: x509: certificate has expired or is not yet valid


NOTE : This log excerpt is an example. Date, time, and environmental variables may vary depending on your environment.