Contour Envoy Pods Failure with SSLV3_ALERT_BAD_CERTIFICATE or CERTIFICATE_VERIFY_FAILED error


Article ID: 316952


Updated On:

Products

VMware Tanzu Kubernetes Grid vSphere with Tanzu Tanzu Kubernetes Runtime

Issue/Introduction

The Contour pods may be in a healthy Running state, but the Envoy pods show only 1/2 containers ready while in Running state.

Services that use the Contour/Envoy ingress controller do not work as expected.

This KB article assumes that the contour and envoy pods, daemonset and deployment are under the namespace "tanzu-system-ingress". This namespace may vary by environment.


While connected to the affected cluster where contour and envoy are running:

  • Contour's package install (PKGI) shows a Reconcile Failed state with the following error message:
    kubectl describe pkgi contour -n <contour pkgi namespace>
    
    Timed out waiting for 5 minutes for resources [daemonset/envoy (apps/v1) namespace: tanzu-system-ingress]

     

  • The Envoy pod logs show one of the following error messages repeatedly:
    kubectl -n tanzu-system-ingress logs daemonset/envoy -c envoy
    
    [1][warning][config] [bazel-out/k8-opt/bin/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:101] StreamListeners gRPC config stream closed: 14, upstream connect error or disconnect/reset before headers. reset reason: connection failure, transport failure reason: TLS error: 268436498:SSL routines:OPENSSL_internal:SSLV3_ALERT_BAD_CERTIFICATE
    
    [1][warning][config] [./source/common/config/grpc_stream.h:201] StreamRuntime gRPC config stream to contour closed since 3210431s ago: 14, upstream connect error or disconnect/reset before headers. reset reason: connection failure, transport failure reason: TLS error: 268435581:SSL routines:OPENSSL_internal:CERTIFICATE_VERIFY_FAILED

     

  • The envoy and contour certificates are valid. To check certificate validity, use the command:
    kubectl get certificates -n tanzu-system-ingress -o wide

     

  • However, the CA stored in the envoy (envoycert) and contour (contourcert) secrets shows as expired:
    kubectl get secret envoycert -n tanzu-system-ingress -o jsonpath='{.data.ca\.crt}' | base64 -d | openssl x509 -noout -dates
    kubectl get secret contourcert -n tanzu-system-ingress -o jsonpath='{.data.ca\.crt}' | base64 -d | openssl x509 -noout -dates
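
The expiry checks above can also be scripted instead of reading the dates by eye. The following is a minimal sketch and not part of the original procedure: `cert_state` is a hypothetical helper name, and it relies on `openssl x509 -checkend`, which exits non-zero when the certificate expires within the given number of seconds (0 here, meaning it has already expired).

```shell
# cert_state: hypothetical helper. Reads one PEM certificate on stdin and
# prints "valid", "expired", or "unreadable".
cert_state() {
  local pem
  pem=$(cat)
  # First make sure the input parses as an X.509 certificate at all.
  if ! printf '%s\n' "$pem" | openssl x509 -noout >/dev/null 2>&1; then
    echo unreadable
    return 0
  fi
  # -checkend 0 succeeds only if the certificate has not expired yet.
  if printf '%s\n' "$pem" | openssl x509 -noout -checkend 0 >/dev/null 2>&1; then
    echo valid
  else
    echo expired
  fi
}

# Example use against the two secrets from this article (namespace assumed
# to be tanzu-system-ingress):
#   for s in envoycert contourcert; do
#     kubectl get secret "$s" -n tanzu-system-ingress \
#       -o jsonpath='{.data.ca\.crt}' | base64 -d | cert_state
#   done
```

An "unreadable" result usually means the secret or the `ca.crt` key does not exist, which is worth distinguishing from a genuinely expired CA.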

Environment

vSphere Supervisor or VMware Tanzu Kubernetes Grid 2.x

Contour Package/Add-on from Tanzu Mission Control or VKS Standard Package for Contour

Cause

The CA that signed the envoy/contour certificate has expired and was not properly renewed by the cert-manager system pod. While the contour and envoy certificate objects may be renewed properly, the pods also use the corresponding secret object's CA.
If the secret's CA is not properly renewed, contour and/or envoy pods will not work properly and services using this ingress controller will fail.

Certificate rotation must follow a specific sequence to maintain trust between components. If rotation occurs out of order, the client or server certificate Secret may still reference an expired Certificate Authority (CA) at the time of verification. This condition can result in failed TLS handshakes and prevent new connections from being established due to unsuccessful certificate validation.

The current rotation mechanism implicitly depends on the CA Secret being renewed before any leaf (client or server) certificates. This ensures that newly issued leaf certificates are signed using the updated CA. However, because the CA and leaf certificates for all components are initially generated simultaneously during package installation, subsequent rotations are susceptible to a race condition. As a result, there is a risk that leaf certificates may be issued against an outdated or expired CA, leading to potential connectivity issues.
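
The stale-CA condition described above can be spotted by comparing the CA stored in a leaf secret with the CA in the `contour-ca-key-pair` secret. The following is a minimal sketch, assuming both secrets expose the CA under the `ca.crt` key (as the commands in this article do) and that the same PEM bytes are stored in both. `ca_matches` is a hypothetical helper name, not something shipped with Contour or cert-manager.

```shell
# ca_matches: hypothetical helper. Compares the base64-encoded ca.crt of a
# leaf secret against the ca.crt of the CA key-pair secret.
# Usage: ca_matches <namespace> <leaf-secret> <ca-secret>
# Returns 0 on match, 1 on mismatch, 2 if a value could not be read.
ca_matches() {
  local ns="$1" leaf="$2" ca="$3" leaf_ca root_ca
  leaf_ca=$(kubectl get secret "$leaf" -n "$ns" -o jsonpath='{.data.ca\.crt}') || return 2
  root_ca=$(kubectl get secret "$ca" -n "$ns" -o jsonpath='{.data.ca\.crt}') || return 2
  if [ -z "$leaf_ca" ] || [ -z "$root_ca" ]; then
    echo "missing ca.crt in $leaf or $ca"
    return 2
  fi
  if [ "$leaf_ca" = "$root_ca" ]; then
    echo "$leaf: CA matches $ca"
  else
    echo "$leaf: CA differs from $ca (possibly stale)"
    return 1
  fi
}

# Example:
#   ca_matches tanzu-system-ingress envoycert contour-ca-key-pair
#   ca_matches tanzu-system-ingress contourcert contour-ca-key-pair
```

A mismatch is only a hint: the comparison is byte-for-byte, so it should be confirmed with the `openssl x509 -noout -dates` commands before any secret is deleted.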

 

Resolution

This KB article assumes that the contour and envoy pods, daemonset and deployment are under the namespace "tanzu-system-ingress". This namespace may vary by environment.

  1. Connect to the affected cluster's context or control plane node.

  2. Verify that the CA in the contour-ca-key-pair secret is valid:
    kubectl get secrets -n tanzu-system-ingress contour-ca-key-pair -o jsonpath='{.data.ca\.crt}' | base64 -d | openssl x509 -noout -dates

    If the contour-ca-key-pair CA is expired, cert-manager may have failed to renew its certificate. You can restart all cert-manager deployments with the command below:

    kubectl rollout restart deploy -n <cert-manager namespace>

     

  3. Take a backup of the envoy and contour secrets:
    kubectl get secret -n tanzu-system-ingress contourcert -o yaml > contourcert-backup.yaml
    kubectl get secret -n tanzu-system-ingress envoycert -o yaml > envoycert-backup.yaml

     

  4. Delete the envoy and contour secrets. This forces cert-manager to regenerate them automatically with a CA that matches the dates on contour-ca-key-pair:
    kubectl delete secret -n tanzu-system-ingress contourcert
    kubectl delete secret -n tanzu-system-ingress envoycert

     

  5. Confirm that both secrets were automatically recreated by cert-manager:
    kubectl get secret -n tanzu-system-ingress

     

  6. Check that the CA and leaf certificates in both secrets now show current validity dates:
    kubectl get secret -n tanzu-system-ingress contourcert -o jsonpath='{.data.ca\.crt}' | base64 -d | openssl x509 -serial -dates -noout
    kubectl get secret -n tanzu-system-ingress contourcert -o jsonpath='{.data.tls\.crt}'| base64 -d | openssl x509 -serial -dates -noout
    kubectl get secret -n tanzu-system-ingress envoycert -o jsonpath='{.data.ca\.crt}' | base64 -d | openssl x509 -serial -dates -noout
    kubectl get secret -n tanzu-system-ingress envoycert -o jsonpath='{.data.tls\.crt}' | base64 -d | openssl x509 -serial -dates -noout

     

  7. Check the status of all envoy and contour pods:
    kubectl get pods -n tanzu-system-ingress

    A restart should not be necessary. If the envoy and contour pods do not recover on their own, restart them with the following commands:

    kubectl rollout restart deploy -n tanzu-system-ingress
    kubectl rollout restart daemonset -n tanzu-system-ingress
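
Steps 3 to 5 can be combined so that a secret is deleted only after its backup has been written, and is then polled until cert-manager recreates it. The following is a minimal sketch under those assumptions. `rotate_secret`, its default 120-second timeout, and the 5-second polling interval are hypothetical choices for illustration, not part of the Contour package.

```shell
# rotate_secret: hypothetical helper wrapping steps 3-5 for one secret.
# Usage: rotate_secret <namespace> <secret> [timeout-seconds]
rotate_secret() {
  local ns="$1" name="$2" timeout="${3:-120}" waited=0
  local backup="${name}-backup.yaml"

  # Step 3: back up the secret; refuse to continue on failure or empty output.
  if ! kubectl get secret -n "$ns" "$name" -o yaml > "$backup" || [ ! -s "$backup" ]; then
    echo "backup of $name failed, not deleting" >&2
    return 1
  fi

  # Step 4: delete the secret so that cert-manager reissues it.
  kubectl delete secret -n "$ns" "$name" || return 1

  # Step 5: poll until the secret exists again or the timeout is reached.
  until kubectl get secret -n "$ns" "$name" >/dev/null 2>&1; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "$name was not recreated within ${timeout}s" >&2
      return 1
    fi
    sleep 5
    waited=$((waited + 5))
  done
  echo "$name recreated; backup in $backup"
}

# Example:
#   rotate_secret tanzu-system-ingress contourcert
#   rotate_secret tanzu-system-ingress envoycert
```

The backup file is written to the current directory, so run the helper from a location where the YAML can be kept until the new certificates have been verified in step 6.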

     

To prevent this issue from occurring again, update the contour package by adding the lines below to its data values YAML file:

certificates:
  caDuration: 8760h
  caRenewBefore: 720h
  leafDuration: 720h
  leafRenewBefore: 360h 

Refer to: Install Contour with Envoy. An example of a complete data values file with these settings:

infrastructure_provider: vsphere
namespace: tanzu-system-ingress
contour:
  configFileContents: {}
  useProxyProtocol: false
  replicas: 2
  pspNames: "vmware-system-restricted"
  logLevel: info
envoy:
  service:
    type: LoadBalancer
    annotations: {}
    externalTrafficPolicy: Cluster
    disableWait: false
  hostPorts:
    enable: true
    http: 80
    https: 443
  hostNetwork: false
  terminationGracePeriodSeconds: 300
  logLevel: info
certificates:
  caDuration: 8760h
  caRenewBefore: 720h
  leafDuration: 720h
  leafRenewBefore: 360h 

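With these parameters, leaf certificates such as the envoy and contour certificates are issued for 720h (30 days) and renewed 360h (15 days) before they expire, while the CA certificate is issued for 8760h (one year) and renewed 720h (30 days) before it expires. These settings prevent the race condition described in the Cause section.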
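
To see when each renewal is due under these values, the arithmetic can be worked through in shell. The following is a minimal sketch, assuming renewal is triggered at the issue time plus the duration minus the renewBefore value. `renew_after_hours` is a hypothetical helper and only accepts values given in whole hours, such as `8760h`.

```shell
# renew_after_hours: hypothetical helper. Given a duration and a renewBefore
# value in whole hours (for example 8760h and 720h), prints how many hours
# after issuance the renewal is due.
renew_after_hours() {
  local duration="${1%h}" renew_before="${2%h}"
  echo $(( $duration - $renew_before ))
}

ca_renew=$(renew_after_hours 8760h 720h)    # caDuration, caRenewBefore
leaf_renew=$(renew_after_hours 720h 360h)   # leafDuration, leafRenewBefore

echo "CA renews after ${ca_renew}h ($(( $ca_renew / 24 )) days)"
echo "Leaf renews after ${leaf_renew}h ($(( $leaf_renew / 24 )) days)"
# prints:
#   CA renews after 8040h (335 days)
#   Leaf renews after 360h (15 days)
```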