The Contour pods may be in a healthy Running state, but the Envoy pods show as 1/2 ready in a Running state.
Services that use the Contour Envoy ingress controller do not work as expected.
This KB article assumes that the Contour and Envoy pods, DaemonSet, and Deployment are in the namespace "tanzu-system-ingress". This namespace may vary by environment.
While connected to the affected cluster where contour and envoy are running:
The Contour package install reports a timeout while waiting for the Envoy daemonset:
kubectl describe pkgi contour -n <contour pkgi namespace>
Timed out waiting for 5 minutes for resources [daemonset/envoy (apps/v1) namespace: tanzu-system-ingress]
The Envoy container logs show TLS certificate errors on the gRPC config stream to Contour:
kubectl -n tanzu-system-ingress logs daemonset/envoy -c envoy
[1][warning][config] [bazel-out/k8-opt/bin/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:101] StreamListeners gRPC config stream closed: 14, upstream connect error or disconnect/reset before headers. reset reason: connection failure, transport failure reason: TLS error: 268436498:SSL routines:OPENSSL_internal:SSLV3_ALERT_BAD_CERTIFICATE
[1][warning][config] [./source/common/config/grpc_stream.h:201] StreamRuntime gRPC config stream to contour closed since 3210431s ago: 14, upstream connect error or disconnect/reset before headers. reset reason: connection failure, transport failure reason: TLS error: 268435581:SSL routines:OPENSSL_internal:CERTIFICATE_VERIFY_FAILED
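Check the Certificate resources and the validity dates of the CA certificate stored in the envoycert and contourcert secrets. In an affected environment, a notAfter date may already be in the past:
kubectl get certificates -n tanzu-system-ingress -o wide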
kubectl get secret envoycert -n tanzu-system-ingress -o jsonpath='{.data.ca\.crt}' | base64 -d | openssl x509 -noout -dates
kubectl get secret contourcert -n tanzu-system-ingress -o jsonpath='{.data.ca\.crt}' | base64 -d | openssl x509 -noout -dates
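The same expiry check can be scripted. The sketch below is a hypothetical helper (the function name cert_status and the NS variable are illustrative and not part of the Contour package). It relies on `openssl x509 -checkend 0`, which exits non-zero when the certificate has already expired.

```shell
# Namespace is an assumption; override NS if your environment differs.
NS="${NS:-tanzu-system-ingress}"

# cert_status SECRET KEY
# Prints whether the certificate stored under KEY (ca.crt or tls.crt)
# in SECRET is still within its validity period.
cert_status() {
  secret="$1"
  key="$2"
  # jsonpath requires dots inside a key name to be escaped (ca.crt -> ca\.crt)
  escaped=$(printf '%s' "$key" | sed 's/\./\\./g')
  if kubectl get secret "$secret" -n "$NS" -o "jsonpath={.data.${escaped}}" \
      | base64 -d \
      | openssl x509 -noout -checkend 0 >/dev/null 2>&1; then
    echo "${secret}/${key}: valid"
  else
    echo "${secret}/${key}: expired or unreadable"
  fi
}
```

For example, `cert_status envoycert ca.crt` and `cert_status contourcert ca.crt` cover the two secrets checked above.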
vSphere Supervisor or VMware Tanzu Kubernetes Grid 2.x
Contour Package/Add-on from Tanzu Mission Control or VKS Standard Package for Contour
Certificate rotation must follow a specific sequence to maintain trust between components. If rotation occurs out of order, the client or server certificate Secret may still reference an expired Certificate Authority (CA) at the time of verification. This condition can result in failed TLS handshakes and prevent new connections from being established due to unsuccessful certificate validation.
The current rotation mechanism implicitly depends on the CA Secret being renewed before any leaf (client or server) certificates. This ensures that newly issued leaf certificates are signed using the updated CA. However, because the CA and leaf certificates for all components are initially generated simultaneously during package installation, subsequent rotations are susceptible to a race condition. As a result, there is a risk that leaf certificates may be issued against an outdated or expired CA, leading to potential connectivity issues.
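One way to check for the condition described above is to verify a leaf certificate against the current CA certificate. The following is a minimal sketch, assuming the CA is stored in the contour-ca-key-pair secret and the namespace is tanzu-system-ingress; the helper name verify_leaf and the NS and CA_SECRET variables are hypothetical. `openssl verify` fails if the leaf was not signed by the supplied CA or if a certificate in the chain has expired.

```shell
# Both values are assumptions; override them if your environment differs.
NS="${NS:-tanzu-system-ingress}"
CA_SECRET="${CA_SECRET:-contour-ca-key-pair}"

# verify_leaf SECRET
# Returns 0 if tls.crt in SECRET validates against ca.crt in CA_SECRET.
verify_leaf() {
  secret="$1"
  tmp=$(mktemp -d) || return 1
  # Extract the current CA certificate.
  kubectl get secret "$CA_SECRET" -n "$NS" -o 'jsonpath={.data.ca\.crt}' \
    | base64 -d > "$tmp/ca.pem"
  # Extract the leaf certificate to be checked.
  kubectl get secret "$secret" -n "$NS" -o 'jsonpath={.data.tls\.crt}' \
    | base64 -d > "$tmp/leaf.pem"
  rc=0
  openssl verify -CAfile "$tmp/ca.pem" "$tmp/leaf.pem" || rc=$?
  rm -rf "$tmp"
  return "$rc"
}
```

A non-zero result from `verify_leaf contourcert` or `verify_leaf envoycert` suggests that the leaf and the CA are out of step.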
Check whether the Contour CA certificate is expired:
kubectl get secrets -n tanzu-system-ingress contour-ca-key-pair -o jsonpath='{.data.ca\.crt}' | base64 -d | openssl x509 -noout -dates
If the contour-ca-key-pair CA certificate is expired, cert-manager may have failed to renew it. Restart all cert-manager pods with the following command:
kubectl rollout restart deploy -n <cert-manager namespace>
Back up the existing contourcert and envoycert secrets:
kubectl get secret -n tanzu-system-ingress contourcert -o yaml > contourcert-backup.yaml
kubectl get secret -n tanzu-system-ingress envoycert -o yaml > envoycert-backup.yaml
Delete the contourcert and envoycert secrets so that they are reissued:
kubectl delete secret -n tanzu-system-ingress contourcert
kubectl delete secret -n tanzu-system-ingress envoycert
Confirm that the contourcert and envoycert secrets have been recreated:
kubectl get secret -n tanzu-system-ingress
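Recreation of the secrets may take a short time. As a rough sketch, the check can be turned into a polling loop; the function name wait_for_secret, the NS variable, the 5-second interval, and the default 120-second timeout are all illustrative assumptions.

```shell
# Namespace is an assumption; override NS if your environment differs.
NS="${NS:-tanzu-system-ingress}"

# wait_for_secret SECRET [TIMEOUT_SECONDS]
# Polls every 5 seconds until SECRET exists or the timeout elapses.
wait_for_secret() {
  secret="$1"
  timeout="${2:-120}"
  elapsed=0
  until kubectl get secret "$secret" -n "$NS" >/dev/null 2>&1; do
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "timed out waiting for secret ${secret}" >&2
      return 1
    fi
    sleep 5
    elapsed=$((elapsed + 5))
  done
  echo "secret ${secret} is present"
}
```

For example, `wait_for_secret contourcert && wait_for_secret envoycert` returns once both secrets exist.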
Verify the serial number and validity dates of the CA and leaf certificates in each recreated secret:
kubectl get secret -n tanzu-system-ingress contourcert -o jsonpath='{.data.ca\.crt}' | base64 -d | openssl x509 -serial -dates -noout
kubectl get secret -n tanzu-system-ingress contourcert -o jsonpath='{.data.tls\.crt}' | base64 -d | openssl x509 -serial -dates -noout
kubectl get secret -n tanzu-system-ingress envoycert -o jsonpath='{.data.ca\.crt}' | base64 -d | openssl x509 -serial -dates -noout
kubectl get secret -n tanzu-system-ingress envoycert -o jsonpath='{.data.tls\.crt}' | base64 -d | openssl x509 -serial -dates -noout
Confirm that the Contour and Envoy pods are healthy:
kubectl get pods -n tanzu-system-ingress
A restart should not be necessary, but if envoy and contour pods need to be restarted, the following commands can be used:
kubectl rollout restart deploy -n tanzu-system-ingress
kubectl rollout restart daemonset -n tanzu-system-ingress
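If a restart is performed, `kubectl rollout status` can be used to wait for it to finish. The sketch below assumes the workloads are named deployment/contour and daemonset/envoy (only daemonset/envoy appears in the output earlier in this article; the Deployment name is an assumption), and the function name wait_for_rollouts and the 5-minute timeout are illustrative.

```shell
# Namespace is an assumption; override NS if your environment differs.
NS="${NS:-tanzu-system-ingress}"

# wait_for_rollouts
# Blocks until both workloads finish rolling out, or fails on the first
# one that does not complete within the timeout.
wait_for_rollouts() {
  for target in deployment/contour daemonset/envoy; do
    kubectl rollout status "$target" -n "$NS" --timeout=5m || return 1
  done
  echo "rollouts complete"
}
```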
To prevent this issue from occurring again, update the Contour package by adding the following lines to its data values YAML file:
certificates:
  caDuration: 8760h
  caRenewBefore: 720h
  leafDuration: 720h
  leafRenewBefore: 360h
Refer to "Install Contour with Envoy" for the data values file. For example:
infrastructure_provider: vsphere
namespace: tanzu-system-ingress
contour:
  configFileContents: {}
  useProxyProtocol: false
  replicas: 2
  pspNames: "vmware-system-restricted"
  logLevel: info
envoy:
  service:
    type: LoadBalancer
    annotations: {}
    externalTrafficPolicy: Cluster
    disableWait: false
  hostPorts:
    enable: true
    http: 80
    https: 443
  hostNetwork: false
  terminationGracePeriodSeconds: 300
  logLevel: info
certificates:
  caDuration: 8760h
  caRenewBefore: 720h
  leafDuration: 720h
  leafRenewBefore: 360h
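With these parameters, leaf certificates such as the envoy and contour certificates are issued for 720h (30 days) and renewed 360h before they expire, while the CA certificate is issued for 8760h (one year) and renewed 720h before it expires. This prevents the race condition from occurring.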
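The durations can be converted to days to make the schedule easier to read. This is a small arithmetic sketch, assuming the duration and renewBefore values follow the usual cert-manager meaning (certificate lifetime, and how long before expiry renewal begins); the function names are illustrative.

```shell
# Values from the data values file above, in hours.
CA_DURATION=8760
CA_RENEW_BEFORE=720
LEAF_DURATION=720
LEAF_RENEW_BEFORE=360

# Convert hours to whole days.
hours_to_days() { echo $(( $1 / 24 )); }

# Day of the certificate's life on which renewal begins:
# (duration - renewBefore), expressed in days.
renewal_day() { echo $(( ($1 - $2) / 24 )); }

echo "CA lifetime: $(hours_to_days "$CA_DURATION") days, renewal begins on day $(renewal_day "$CA_DURATION" "$CA_RENEW_BEFORE")"
echo "Leaf lifetime: $(hours_to_days "$LEAF_DURATION") days, renewal begins on day $(renewal_day "$LEAF_DURATION" "$LEAF_RENEW_BEFORE")"
# Prints:
# CA lifetime: 365 days, renewal begins on day 335
# Leaf lifetime: 30 days, renewal begins on day 15
```

Under that assumption, the CA is renewed a full leaf lifetime (30 days) before it expires, so any leaf issued by the old CA reaches its own renewal point before the old CA expires.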