Contour Envoy Pods Fail with SSLV3_ALERT_BAD_CERTIFICATE or CERTIFICATE_VERIFY_FAILED

Products

VMware Tanzu Kubernetes Grid
vSphere with Tanzu
Tanzu Kubernetes Runtime

Issue/Introduction

The Contour pods may be healthy and Running, but the Envoy pods show 1/2 in the READY column while in the Running state.

Services that use the Contour/Envoy ingress controller do not work as expected.

This KB article assumes that the Contour and Envoy pods, DaemonSet, and Deployment are in the namespace "tanzu-system-ingress". The namespace may differ in your environment.
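To spot the 1/2 symptom across many pods, the following minimal sketch filters `kubectl get pods --no-headers` output for pods whose ready container count is below the total. The function name `list_unready_pods` is hypothetical, and the sketch assumes the default column order (NAME, READY, STATUS, ...).

```bash
# Hypothetical helper: print pods whose READY column is not fully ready (e.g. envoy at 1/2).
# Reads `kubectl get pods --no-headers` output on stdin.
list_unready_pods() {
  awk '{
    split($2, r, "/")              # READY looks like "<ready>/<total>"
    if (r[1] != r[2]) print $1, $2, $3
  }'
}

# Usage (namespace is an assumption; adjust to your environment):
# kubectl get pods -n tanzu-system-ingress --no-headers | list_unready_pods
```

Completed Job pods (0/1) would also be listed by this simple filter.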

While connected to the affected cluster where contour and envoy are running:

The Contour PackageInstall (PKGI) is in a Reconcile Failed state with the following error message:

```
kubectl describe pkgi contour -n <contour pkgi namespace>
```

```
Timed out waiting for 5 minutes for resources [daemonset/envoy (apps/v1) namespace: tanzu-system-ingress]
```

The Envoy pod logs show one of the following error messages repeatedly:

```
kubectl -n tanzu-system-ingress logs daemonset/envoy -c envoy
```

```
[1][warning][config] [bazel-out/k8-opt/bin/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:101] StreamListeners gRPC config stream closed: 14, upstream connect error or disconnect/reset before headers. reset reason: connection failure, transport failure reason: TLS error: 268436498:SSL routines:OPENSSL_internal:SSLV3_ALERT_BAD_CERTIFICATE
```

```
[1][warning][config] [./source/common/config/grpc_stream.h:201] StreamRuntime gRPC config stream to contour closed since 3210431s ago: 14, upstream connect error or disconnect/reset before headers. reset reason: connection failure, transport failure reason: TLS error: 268435581:SSL routines:OPENSSL_internal:CERTIFICATE_VERIFY_FAILED
```
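On a busy cluster the Envoy log can be long. The sketch below counts how often each of the two signatures appears, so you can tell which failure you are seeing. In general TLS terms, SSLV3_ALERT_BAD_CERTIFICATE means the peer (here Contour) rejected the certificate Envoy presented, while CERTIFICATE_VERIFY_FAILED means Envoy could not verify Contour's certificate. The function name `classify_envoy_tls_errors` is hypothetical.

```bash
# Hypothetical helper: count the two TLS failure signatures in Envoy log lines read from stdin.
# Prints one line per signature that was found, followed by its count.
classify_envoy_tls_errors() {
  awk '
    /SSLV3_ALERT_BAD_CERTIFICATE/ { bad++ }
    /CERTIFICATE_VERIFY_FAILED/   { verify++ }
    END {
      if (bad)    print "SSLV3_ALERT_BAD_CERTIFICATE", bad
      if (verify) print "CERTIFICATE_VERIFY_FAILED", verify
    }'
}

# Usage:
# kubectl -n tanzu-system-ingress logs daemonset/envoy -c envoy | classify_envoy_tls_errors
```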

The envoy and contour Certificate resources themselves report as valid. To check them, run:
```
kubectl get certificates -n tanzu-system-ingress -o wide
```

However, the CA certificate (ca.crt) stored in the envoycert and contourcert secrets shows as expired (its notAfter date is in the past):
```
kubectl get secret envoycert -n tanzu-system-ingress -o jsonpath='{.data.ca\.crt}' | base64 -d | openssl x509 -noout -dates
kubectl get secret contourcert -n tanzu-system-ingress -o jsonpath='{.data.ca\.crt}' | base64 -d | openssl x509 -noout -dates
```
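Instead of reading the dates by eye, `openssl x509 -checkend 0` exits with status 0 if a certificate is still valid and non-zero if it has expired. This sketch applies it to ca.crt in each named secret. The function name and the `NS` variable are hypothetical; it assumes kubectl targets the affected cluster and openssl is installed.

```bash
# Hypothetical helper: report whether ca.crt in each named secret has expired.
# NS defaults to tanzu-system-ingress (an assumption; adjust for your environment).
check_secret_ca_expiry() {
  local ns="${NS:-tanzu-system-ingress}" secret pem
  for secret in "$@"; do
    pem=$(kubectl get secret "$secret" -n "$ns" -o jsonpath='{.data.ca\.crt}' | base64 -d)
    if [ -z "$pem" ]; then
      echo "$secret ca.crt MISSING"       # secret or ca.crt key not found
      continue
    fi
    # -checkend 0: exit 0 if still valid now, non-zero if expired (or unparseable).
    if printf '%s\n' "$pem" | openssl x509 -noout -checkend 0 >/dev/null; then
      echo "$secret ca.crt VALID"
    else
      echo "$secret ca.crt EXPIRED"
    fi
  done
}

# Usage:
# check_secret_ca_expiry envoycert contourcert
```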

Environment

vSphere Supervisor or VMware Tanzu Kubernetes Grid 2.x

Contour Package/Add-on from Tanzu Mission Control or VKS Standard Package for Contour

Cause

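The CA that signed the envoy and contour certificates expired and was not properly renewed by cert-manager.

Contour and Envoy authenticate each other with mutual TLS on the gRPC configuration (xDS) connection, and each side validates the peer's certificate against the CA certificate (ca.crt) stored in its secret. So even when the contour and envoy Certificate objects are renewed properly, the pods still depend on the ca.crt in the corresponding secret.

If that CA is not renewed, the TLS handshake between Envoy and Contour fails, Envoy cannot receive its configuration, and services using this ingress controller stop working.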

Resolution

Connect to the affected cluster's context or control plane node.
Verify that the most recent CA in the contour-ca-key-pair secret is valid:
```
kubectl get secrets -n tanzu-system-ingress contour-ca-key-pair -o jsonpath='{.data.ca\.crt}' | base64 -d | openssl x509 -noout -dates
```
If the contour-ca-key-pair CA has expired, cert-manager may have failed to renew its certificate. Restart all cert-manager deployments with the following command:
```
kubectl rollout restart deploy -n <cert-manager namespace>
```
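`kubectl rollout restart` returns as soon as the restart is requested. Before continuing, it can help to wait for the new cert-manager pods to finish rolling out. This sketch does that; the function name `restart_and_wait_deployments` and the 180-second timeout are assumptions.

```bash
# Hypothetical helper: restart every Deployment in a namespace, then wait for each rollout.
# $1 is the namespace (e.g. your cert-manager namespace). The timeout is an arbitrary choice.
restart_and_wait_deployments() {
  local ns="$1" deploy
  kubectl rollout restart deploy -n "$ns" || return 1
  # `-o name` yields entries such as deployment.apps/cert-manager
  for deploy in $(kubectl get deploy -n "$ns" -o name); do
    kubectl rollout status "$deploy" -n "$ns" --timeout=180s || return 1
  done
}

# Usage:
# restart_and_wait_deployments <cert-manager namespace>
```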

Take a backup of the envoy and contour secrets:

```
kubectl get secret -n tanzu-system-ingress contourcert -o yaml > contourcert-backup.yaml
kubectl get secret -n tanzu-system-ingress envoycert -o yaml > envoycert-backup.yaml
```

Delete the envoy and contour secrets. cert-manager then recreates them automatically, with a new ca.crt matching the dates of the CA in contour-ca-key-pair:
```
kubectl delete secret -n tanzu-system-ingress contourcert
kubectl delete secret -n tanzu-system-ingress envoycert
```
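Because the delete step is destructive, one option is to delete each secret only after its backup file has been written and is non-empty. This is a sketch of that guard; the function name and the `NS` variable are hypothetical, and the `<name>-backup.yaml` file naming follows the backup step above.

```bash
# Hypothetical helper: back up each secret to <name>-backup.yaml, and delete it
# only if the backup command succeeded and produced a non-empty file.
# NS defaults to tanzu-system-ingress (an assumption).
backup_then_delete() {
  local ns="${NS:-tanzu-system-ingress}" secret
  for secret in "$@"; do
    if ! kubectl get secret -n "$ns" "$secret" -o yaml > "${secret}-backup.yaml" \
        || [ ! -s "${secret}-backup.yaml" ]; then
      echo "backup of $secret failed; not deleting" >&2
      return 1
    fi
    kubectl delete secret -n "$ns" "$secret" || return 1
  done
}

# Usage:
# backup_then_delete contourcert envoycert
```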
Confirm that cert-manager automatically recreated both secrets:
```
kubectl get secret -n tanzu-system-ingress
```
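cert-manager may take a few seconds to recreate the secrets. Rather than re-running `kubectl get secret` by hand, this sketch polls until each named secret exists or a retry limit is reached. The function name and the `NS`, `ATTEMPTS`, and `INTERVAL` variables (and their defaults) are hypothetical.

```bash
# Hypothetical helper: poll until each named secret exists, or give up after ATTEMPTS tries.
# INTERVAL is the wait in seconds between tries; the defaults are illustrative only.
wait_for_secrets() {
  local ns="${NS:-tanzu-system-ingress}" attempts="${ATTEMPTS:-30}" interval="${INTERVAL:-5}"
  local secret i
  for secret in "$@"; do
    i=0
    until kubectl get secret -n "$ns" "$secret" >/dev/null 2>&1; do
      i=$((i + 1))
      if [ "$i" -ge "$attempts" ]; then
        echo "secret $secret not recreated after $attempts attempts" >&2
        return 1
      fi
      sleep "$interval"
    done
    echo "secret $secret present"
  done
}

# Usage:
# wait_for_secrets contourcert envoycert
```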

Check that the certificates were properly renewed. The ca.crt serial and dates in both secrets should match the CA in contour-ca-key-pair, and tls.crt should show new validity dates:
```
kubectl get secret -n tanzu-system-ingress contourcert -o jsonpath='{.data.ca\.crt}' | base64 -d | openssl x509 -serial -dates -noout
kubectl get secret -n tanzu-system-ingress contourcert -o jsonpath='{.data.tls\.crt}' | base64 -d | openssl x509 -serial -dates -noout
kubectl get secret -n tanzu-system-ingress envoycert -o jsonpath='{.data.ca\.crt}' | base64 -d | openssl x509 -serial -dates -noout
kubectl get secret -n tanzu-system-ingress envoycert -o jsonpath='{.data.tls\.crt}' | base64 -d | openssl x509 -serial -dates -noout
```
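Assuming, as the delete step implies, that ca.crt in contourcert and envoycert should be the same certificate as the CA in contour-ca-key-pair, this sketch compares their SHA-256 fingerprints instead of reading serials by eye. The function names `ca_fingerprint` and `check_ca_matches` and the `NS` variable are hypothetical.

```bash
# Hypothetical helper: SHA-256 fingerprint of ca.crt in secret $2, namespace $1.
ca_fingerprint() {
  kubectl get secret -n "$1" "$2" -o jsonpath='{.data.ca\.crt}' | base64 -d \
    | openssl x509 -noout -fingerprint -sha256
}

# Hypothetical helper: compare each named secret's ca.crt with contour-ca-key-pair's.
# Returns non-zero if any secret does not match.
check_ca_matches() {
  local ns="${NS:-tanzu-system-ingress}" ref secret fp rc=0
  ref=$(ca_fingerprint "$ns" contour-ca-key-pair)
  [ -n "$ref" ] || { echo "cannot read contour-ca-key-pair CA" >&2; return 1; }
  for secret in "$@"; do
    fp=$(ca_fingerprint "$ns" "$secret")
    if [ "$fp" = "$ref" ]; then
      echo "$secret MATCH"
    else
      echo "$secret MISMATCH"
      rc=1
    fi
  done
  return "$rc"
}

# Usage:
# check_ca_matches contourcert envoycert
```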

Confirm the status of all envoy and contour pods. The envoy pods should now show 2/2 in the READY column:
```
kubectl get pods -n tanzu-system-ingress
```
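To wait for readiness instead of repeatedly running `kubectl get pods`, `kubectl wait` can block until every pod reports the Ready condition. This sketch prints the pod list on success and on timeout. The function name, the `NS` variable, and the 180-second timeout are assumptions; with `--all`, any completed Job pods in the namespace would also be waited on, so narrow it with a label selector if needed.

```bash
# Hypothetical helper: block until all pods in the namespace are Ready, then list them.
# On timeout, print the pod list to stderr and return non-zero.
wait_ingress_pods_ready() {
  local ns="${NS:-tanzu-system-ingress}"
  if ! kubectl wait --for=condition=Ready pod --all -n "$ns" --timeout=180s; then
    echo "pods in $ns not Ready within timeout:" >&2
    kubectl get pods -n "$ns" >&2
    return 1
  fi
  kubectl get pods -n "$ns"
}

# Usage:
# wait_ingress_pods_ready
```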
A restart is usually not necessary. If the envoy and contour pods do need to be restarted, use the following commands:
```
kubectl rollout restart deploy -n tanzu-system-ingress
kubectl rollout restart daemonset -n tanzu-system-ingress
```