After upgrading your guest cluster, you find that a number of pods in the TMC namespace are in CrashLoopBackOff.
The envoy pods are in Running status, but only one of their two containers is ready (1/2):
$ kubectl get pods -n <tmc local namespace> | grep contour-envoy
contour-envoy-6lx25 1/2 Running 0 6m4s
contour-envoy-8l4pk 1/2 Running 0 6m4s
contour-envoy-k5p2d 1/2 Running 0 6m4s
Describing an envoy pod shows that the readiness probe is failing:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Unhealthy 101s (x102 over 6m32s) kubelet Readiness probe failed: Get "http://x.x.x.x:8002/ready": dial tcp x.x.x.x:8002: connect: connection refused
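To pick out the affected pods quickly, the READY column can be filtered for pods that are Running with fewer ready containers than total containers. The following is a minimal sketch: the function name not_ready_pods is hypothetical, and it assumes the default column layout of kubectl get pods --no-headers (NAME, READY, STATUS, ...).

```shell
# Print the names of pods that are Running but not fully ready (e.g. READY 1/2).
# Reads the output of `kubectl get pods --no-headers` on stdin.
# Assumption: column 2 is READY in "ready/total" form and column 3 is STATUS.
not_ready_pods() {
  awk '{
    split($2, r, "/")                       # r[1] = ready, r[2] = total
    if ($3 == "Running" && r[1] + 0 < r[2] + 0) print $1
  }'
}

# Example (namespace placeholder as used in this article):
#   kubectl get pods -n <tmc local namespace> --no-headers | not_ready_pods
```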
Following the checks in the KB article "Contour Envoy Pods Failure with SSLV3_ALERT_BAD_CERTIFICATE or CERTIFICATE_VERIFY_FAILED error", you can see that the CA certificates stored in the envoy (envoycert) and contour (contourcert) secrets have expired:
$ kubectl get secrets -n <tmc local namespace> envoycert -o jsonpath='{.data.ca\.crt}' | base64 -d | openssl x509 -noout -dates
notBefore=Jan 6 17:00:01 2025 GMT
notAfter=Jan 7 17:00:01 2026 GMT
$ kubectl get secrets -n <tmc local namespace> contourcert -o jsonpath='{.data.ca\.crt}' | base64 -d | openssl x509 -noout -dates
notBefore=Jan 6 17:00:01 2025 GMT
notAfter=Jan 7 17:00:01 2026 GMT
However, the resolution in the KB article "Contour Envoy Pods Failure with SSLV3_ALERT_BAD_CERTIFICATE or CERTIFICATE_VERIFY_FAILED error" does not resolve the issue.
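Both secrets can also be checked in one pass by letting openssl decide whether the CA is still valid instead of reading the dates by eye. The sketch below is illustrative only: the function name check_ca_expiry is hypothetical, and it assumes kubectl, base64 and openssl are available on the PATH. It relies on openssl x509 -checkend 0, which exits non-zero when the certificate has already expired.

```shell
# Report whether the ca.crt stored in each named secret is still valid.
# Usage: check_ca_expiry <namespace> <secret> [<secret> ...]
# Assumption: each secret holds a base64-encoded PEM certificate under data."ca.crt".
check_ca_expiry() {
  ns="$1"; shift
  for secret in "$@"; do
    # The exit status of the pipeline is that of openssl (the last command).
    if kubectl get secret -n "$ns" "$secret" -o jsonpath='{.data.ca\.crt}' \
        | base64 -d | openssl x509 -noout -checkend 0 >/dev/null 2>&1; then
      echo "$secret: ca.crt valid"
    else
      echo "$secret: ca.crt expired or unreadable"
    fi
  done
}

# Example (namespace placeholder as used in this article):
#   check_ca_expiry <tmc local namespace> envoycert contourcert
```

A secret that cannot be read at all is reported the same way as an expired one, so a failure here is a prompt to rerun the individual commands above rather than a final diagnosis.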
vCenter 8.0U3
VKS Guest Cluster with TMC installed and using contour for ingress.
The CA that signed the envoy/contour certificate has expired and was not properly renewed by the cert-manager system pod.
While the contour and envoy certificate objects may be renewed properly, the pods also use the corresponding secret object's CA.
If the secret's CA is not properly renewed, contour and/or envoy pods will not work properly and services using this ingress controller will fail.
In this environment, the envoy/contour certificates and CA are not managed by cert-manager and expired after one year.
After confirming that ca.crt has expired, perform the following steps:
1. Find the certgen job for contour
kubectl get jobs -A | grep contour-certgen
<tmc local namespace> contour-contour-certgen Complete 1/1 2s 372d
2. Capture the YAML for the job and make a backup copy
kubectl get jobs -n <tmc local namespace> contour-contour-certgen -o yaml >contour-contour-certgen-job.yaml
kubectl get jobs -n <tmc local namespace> contour-contour-certgen -o yaml >contour-contour-certgen-job-bak.yaml
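3. Edit the file contour-contour-certgen-job.yaml and clear out the status section and any events, timestamps, UIDs, and previous configs (such as metadata.uid, metadata.resourceVersion, metadata.creationTimestamp, the kubectl.kubernetes.io/last-applied-configuration annotation, and the auto-generated controller-uid selector and labels).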
vi contour-contour-certgen-job.yaml
4. Recreate the contour-certgen job
a. After saving the file delete and recreate the contour-certgen job
kubectl delete jobs -n <tmc local namespace> contour-contour-certgen
kubectl apply -f contour-contour-certgen-job.yaml
b. Check that the job has run
kubectl get jobs -n <tmc local namespace> contour-contour-certgen
kubectl describe jobs -n <tmc local namespace> contour-contour-certgen
5. Check that the certificates in the envoycert and contourcert secrets have rotated successfully
kubectl get secrets -n <tmc local namespace> envoycert -o jsonpath='{.data.ca\.crt}' | base64 -d | openssl x509 -noout -dates
kubectl get secrets -n <tmc local namespace> contourcert -o jsonpath='{.data.ca\.crt}' | base64 -d | openssl x509 -noout -dates
6. Delete each of the envoy pods in <tmc local namespace>
kubectl delete pods -n <tmc local namespace> <envoy pod name>
7. Check that envoy pods are back up and running and that the other TMC pods are back up and running
kubectl get pods -A | grep -vE "Running|Complete"
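The manual edit in step 3 and the per-pod deletion in step 6 can be sketched as two small helpers. This is an illustration under stated assumptions, not a supported procedure: the function names are hypothetical, jq is assumed to be installed, and the list of fields removed is an assumption about what the API server normally populates on a Job. Compare the cleaned manifest with the backup before applying it.

```shell
# Step 3 helper: strip server-populated fields from a Job manifest.
# Reads JSON on stdin (kubectl get ... -o json) and writes cleaned JSON to stdout.
# Assumption: these are the fields that block re-creation; adjust after review.
clean_job_manifest() {
  jq 'del(
        .status,
        .metadata.uid,
        .metadata.resourceVersion,
        .metadata.creationTimestamp,
        .metadata.generation,
        .metadata.managedFields,
        .metadata.annotations["kubectl.kubernetes.io/last-applied-configuration"],
        .metadata.labels["controller-uid"],
        .metadata.labels["batch.kubernetes.io/controller-uid"],
        .spec.selector,
        .spec.template.metadata.creationTimestamp,
        .spec.template.metadata.labels["controller-uid"],
        .spec.template.metadata.labels["batch.kubernetes.io/controller-uid"]
      )'
}

# Step 6 helper: delete every pod in the namespace whose name starts with
# "contour-envoy" (the prefix shown in the pod listing earlier in this article).
restart_envoy_pods() {
  ns="$1"
  kubectl get pods -n "$ns" -o name | grep '^pod/contour-envoy' | while read -r pod; do
    kubectl delete -n "$ns" "$pod"
  done
}

# Example (namespace placeholder as used in this article):
#   kubectl get jobs -n <tmc local namespace> contour-contour-certgen -o json \
#     | clean_job_manifest > contour-contour-certgen-job.json
#   restart_envoy_pods <tmc local namespace>
```

kubectl apply -f accepts JSON as well as YAML, so the cleaned file can be used in step 4 in place of the hand-edited YAML.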
Contour documentation: Rotate using the contour-certgen job