System services are not working properly and return an error message stating that the corresponding service or webhook has an expired certificate.
The certificate is managed by cert-manager, which manages certificates for system pods in Supervisor and workload clusters. Cert-manager does not manage Kubernetes certificates.
While connected to the vCenter Server Appliance (VCSA):
cat /var/log/vmware/wcp/wcpsvc.log
Failed calling webhook, failing open <system service>.vmware.com: failed calling webhook "<webhook service>": failed to call webhook: Post "https://<webhook service>:<port>/<webhook service>?timeout=10s"
Post: "https://<service address>:<port>/convert?timeout=30s": tls: failed to verify certificate: x509: certificate signed by unknown authority
x509: certificate has expired or is not yet valid: current time YYYY-MM-DDTHH:MM:SSZ is after YYYY-MM-DDTHH:MM:SSZ"
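Because wcpsvc.log is usually large, it can help to filter it down to the certificate-related messages shown above. The following is a minimal sketch; `filter_cert_errors` is a hypothetical helper name, not part of any VMware tooling, and the patterns only cover the messages quoted in this article.

```shell
#!/bin/sh
# Hypothetical helper: keep only certificate-related lines from a log
# stream read on stdin. The patterns match the messages quoted above.
filter_cert_errors() {
  grep -E 'failed calling webhook|x509: certificate (has expired|signed by unknown authority)'
}

# Example on the VCSA (uncomment to run):
#   filter_cert_errors < /var/log/vmware/wcp/wcpsvc.log
```

If the filter prints nothing, the log does not contain any of these specific messages, and the issue may have a different cause.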
While connected to the Supervisor cluster context, the following symptoms are observed:
kubectl get pods -A
kubectl logs -n <affected pod namespace> <affected pod name>
http: TLS handshake error from <service IP address>:<port>: remote error: tls: bad certificate
kubectl get pods -A | grep kube-apiserver
kubectl logs -n kube-system <kube-apiserver pod name>
x509: certificate has expired or is not yet valid: current time YYYY-MM-DDTHH:MM:SSZ is after YYYY-MM-DDTHH:MM:SSZ"
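To see when a certificate actually expired, the two timestamps can be pulled out of the x509 message. This is a sketch under the assumption that the log line follows the "current time ... is after ..." format shown above; `cert_expiry_times` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read log lines on stdin and, for each x509 expiry
# message, print the certificate's expiry time and the time of the check.
cert_expiry_times() {
  sed -n 's/.*current time \([0-9TZ:.+-]*\) is after \([0-9TZ:.+-]*\).*/expired_at=\2 observed_at=\1/p'
}

# Example in the Supervisor cluster context (uncomment and fill in the pod name):
#   kubectl logs -n kube-system <kube-apiserver pod name> | cert_expiry_times
```

Lines that do not contain an expiry message are dropped, so the output lists only the affected timestamps.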
vSphere with Tanzu 7.0
vSphere with Tanzu 8.0
This issue can occur regardless of whether the affected cluster is managed by Tanzu Mission Control (TMC).
Cert-manager is responsible for automatic rotation of certificates for many VMware system and kube-system pods, as well as packages.
If certificates are expired in system pods, then services reliant on those certificates will fail with certificate expiry errors.
Because some system pods depend on other system pods, the expiry of one system pod's service certificate can cause multiple system pods to fail with certificate errors.
Cert-manager must be investigated and repaired to restore certificate management for system pods.
These certificates do not appear in certificate checks performed with the certmgr tool, which covers only Kubernetes certificates.
Cert-manager is expected to automatically renew the certificates for system and VMware pods running on Supervisor and workload clusters before they expire.
However, there are circumstances (which will vary by scenario) where cert-manager fails to renew the certificates.
Ideally, the reason cert-manager failed to renew the certificates before expiry should be investigated.
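One possible starting point for that investigation is to list the cert-manager Certificate resources that are not reporting Ready. This is a sketch that assumes the cert-manager Certificate custom resource is present in the cluster and that `kubectl get ... -A` prints its default columns (NAMESPACE, NAME, READY, SECRET, AGE); `list_unready_certificates` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print "<namespace>/<name>" for every cert-manager
# Certificate whose READY column is not "True".
# Assumption: default column order is NAMESPACE NAME READY SECRET AGE.
list_unready_certificates() {
  kubectl get certificates.cert-manager.io -A --no-headers |
    awk '$3 != "True" { print $1 "/" $2 }'
}

# Example in the affected cluster context (uncomment to run):
#   list_unready_certificates
```

Any Certificate listed this way can then be inspected with kubectl describe to see why renewal did not complete.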
As a workaround, the cert-manager pod can be restarted to force it to renew the certificates:
kubectl get pods -A | grep cert-manager
kubectl logs -n <cert-manager-namespace> <cert-manager pod name>
kubectl rollout restart deploy -n <cert-manager-namespace>
kubectl get pods -n <cert-manager-namespace>
kubectl get deploy -n <affected pod namespace>
kubectl rollout restart deploy -n <affected pod namespace> <affected deployment name>
kubectl get pods -n <affected pod namespace>
kubectl logs -n <affected pod namespace> <affected pod name>
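The restart steps above can be sketched as a single shell function. The function and argument names are illustrative only, and the sketch does not check that cert-manager has finished renewing the certificates before the affected deployment is restarted, so the logs should still be reviewed afterwards.

```shell
#!/bin/sh
# Sketch of the restart sequence. Nothing runs until the function is called.
restart_cert_manager_then() {
  [ "$#" -eq 3 ] || {
    echo "usage: restart_cert_manager_then CM_NAMESPACE APP_NAMESPACE APP_DEPLOYMENT" >&2
    return 2
  }
  cm_ns=$1        # cert-manager namespace
  app_ns=$2       # namespace of the affected system service
  app_deploy=$3   # deployment that owns the affected pod

  # Restart every deployment in the cert-manager namespace and list the pods.
  kubectl rollout restart deploy -n "$cm_ns" || return 1
  kubectl get pods -n "$cm_ns" || return 1

  # Restart the affected deployment and wait for its rollout to complete.
  kubectl rollout restart deploy -n "$app_ns" "$app_deploy" || return 1
  kubectl rollout status deploy -n "$app_ns" "$app_deploy" --timeout=300s
}

# Example (uncomment and fill in the placeholders):
#   restart_cert_manager_then <cert-manager-namespace> <affected pod namespace> <affected deployment name>
```

Stopping at the first failed command keeps the affected deployment from being restarted if the cert-manager restart itself did not succeed.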