VKS cluster telegraf pods report "tls: failed to verify certificate: x509: certificate has expired or is not yet valid"


Article ID: 434121



Products

VMware vSphere Kubernetes Service

Issue/Introduction

  • Telegraf can be used in VKS workload clusters to send metrics to VCF Operations.

  • The following error is observed in the Telegraf logs located in the tanzu-system-telegraf namespace:

stderr F 2026-##-##T##:##:##Z E! [agent] Error writing to outputs.http: Post "https://supervisor-management-proxy.#######.###.#######.#####:#####/arc/tkgs/metric": tls: failed to verify certificate: x509: certificate has expired or is not yet valid: current time 2026-##-##T##:##:##Z is after 2026-##-##T##:##:##Z
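When many log lines are involved, it can help to pull out just the expiry timestamp that follows "is after". The following is a minimal sketch; extract_cert_expiry is a hypothetical helper name, and the sample log line uses made-up timestamps and a made-up host.

```shell
# Sketch: list the certificate expiry times reported in Telegraf logs.
# extract_cert_expiry is a hypothetical helper, not part of any VMware tooling.
extract_cert_expiry() {
  # Keep only lines carrying the x509 expiry error and print the timestamp
  # that follows "is after".
  sed -n 's/.*certificate has expired or is not yet valid: current time [^ ]* is after \([^ ]*\).*/\1/p'
}

# Example with a made-up log line (timestamps and host are illustrative only):
sample='E! [agent] Error writing to outputs.http: Post "https://proxy.example:10443/arc/tkgs/metric": tls: failed to verify certificate: x509: certificate has expired or is not yet valid: current time 2026-03-01T10:00:00Z is after 2026-02-27T08:15:00Z'
printf '%s\n' "$sample" | extract_cert_expiry
# prints 2026-02-27T08:15:00Z
```

On a live cluster the same filter could be fed from kubectl logs -n tanzu-system-telegraf <telegraf-pod-name>, where the pod name is a placeholder.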

 

Environment

VMware Cloud Foundation 9.x

VMware vSphere 9.x

vSphere Kubernetes Service

Cause

This issue is caused by a synchronization failure during rotation of the mTLS certificate chain managed by cert-manager. The CA certificate expires every 90 days, while the leaf client certificates remain valid for one year. When the CA rotates, cert-manager does not update the ca.crt embedded in the leaf secrets (client and downstream certificates). As a result, the Telegraf client certificate (tls.crt) is still valid, but the secret that holds it keeps the old CA certificate (ca.crt) instead of the renewed one, so the trust chain is broken.
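The state described here can be illustrated with a small comparison of three timestamps: the current time, the expiry of the ca.crt copy in the secret, and the expiry of tls.crt. This is only a sketch; the helper names are hypothetical, the dates are invented, and the timestamps are assumed to be in the same UTC format that the Telegraf error prints.

```shell
# Sketch of the failure mode: the leaf certificate is still valid, but the
# ca.crt stored beside it has already expired. Helper names are hypothetical.

# Reduce a UTC timestamp such as 2026-03-01T10:00:00Z to its digits so that
# two timestamps can be compared as integers.
ts_key() {
  printf '%s' "$1" | tr -cd '0-9'
}

# Usage: classify_secret <now> <ca.crt notAfter> <tls.crt notAfter>
classify_secret() {
  local now ca_end leaf_end
  now=$(ts_key "$1")
  ca_end=$(ts_key "$2")
  leaf_end=$(ts_key "$3")
  if [ "$now" -gt "$leaf_end" ]; then
    echo "leaf-expired"
  elif [ "$now" -gt "$ca_end" ]; then
    echo "stale-ca"   # valid tls.crt, expired ca.crt copy
  else
    echo "ok"
  fi
}

# Illustrative dates only: the CA copy expired in February, the leaf runs into 2027.
classify_secret 2026-03-01T10:00:00Z 2026-02-27T08:15:00Z 2027-01-15T00:00:00Z
# prints stale-ca
```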

Resolution

As a workaround, delete the Telegraf secret in the Supervisor namespace used by the VKS workload cluster so that cert-manager can recreate it. The updated secret must then be propagated to the VKS cluster. Follow these steps:

I. Extend CA Expiration and Prevent Overwrites

Execute the following commands on the Supervisor cluster to extend the CA lifetime and prevent management tools from reverting the change:

    1. Annotate the certificate to skip kapp updates:

      kubectl annotate certificate supervisor-management-proxy-ca -n vmware-system-cert-manager kapp.k14s.io/update-strategy=skip --overwrite

    2. Patch the certificate duration to 10 years:

      kubectl patch certificate supervisor-management-proxy-ca -n vmware-system-cert-manager --type='merge' -p '{"spec":{"duration":"87600h","renewBefore":"360h"}}'

      Note: If the Supervisor-Management-Proxy is redeployed, the annotation and the duration patch must be reapplied.
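The durations in the patch are expressed in hours, so a quick conversion makes it easier to confirm what was applied. The following sketch assumes a hypothetical helper, duration_days, that reads only the leading hours value of a Go-style duration string.

```shell
# Sketch: convert the Go-style durations used in the patch to whole days.
# duration_days is a hypothetical helper and only reads the leading hours value.
duration_days() {
  local hours="${1%%h*}"   # "87600h" or "87600h0m0s" -> "87600"
  echo $(( hours / 24 ))
}

duration_days 87600h   # prints 3650 (ten 365-day years)
duration_days 360h     # prints 15
```

The applied value could be read back on the Supervisor with kubectl get certificate supervisor-management-proxy-ca -n vmware-system-cert-manager -o jsonpath='{.spec.duration}' and passed to the helper.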

II. Refresh Infrastructure and Client Secrets

    1. Delete the Downstream server certificate secret:

      kubectl delete secret -n <supervisor-management-proxy-namespace> metrics-endpoint-downstream-server-cert

    2. Delete the Telegraf client secret in the Supervisor namespace associated with the VKS cluster:

      kubectl delete secret -n <supervisor-namespace> telegraf-<cluster-name>
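Before moving on, it may be worth confirming that the deleted secrets have been recreated. The sketch below polls with kubectl get secret; wait_for_secret is a hypothetical helper, and the retry count and delay are arbitrary defaults rather than documented values.

```shell
# Sketch: poll until a deleted secret exists again.
# Usage: wait_for_secret <namespace> <secret-name> [attempts] [delay-seconds]
# wait_for_secret is a hypothetical helper; the defaults are arbitrary.
wait_for_secret() {
  local ns="$1" name="$2" attempts="${3:-30}" delay="${4:-2}"
  local i=0
  while [ "$i" -lt "$attempts" ]; do
    # kubectl get exits non-zero while the secret is still missing.
    if kubectl get secret -n "$ns" "$name" >/dev/null 2>&1; then
      echo "secret $ns/$name is present"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "secret $ns/$name was not recreated after $attempts attempts" >&2
  return 1
}

# Example (placeholders as in the steps above):
#   wait_for_secret <supervisor-namespace> telegraf-<cluster-name>
```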

III. Synchronize and Verify

    1. Restart the Observability Operator to push the updated secrets:

      kubectl delete pod -n vmware-system-monitoring vmware-system-observability-controller-manager-<uuid>

    2. Verify the validity of ca.crt within the VKS cluster (in the kube-system or tanzu-system-telegraf namespace):

      echo "<base64_encoded_ca.crt>" | base64 -d | openssl x509 -noout -text

    3. Confirm that the Not After date reflects the new 10-year expiration.
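To avoid reading the full certificate text, the expiry check can be reduced to a single number. This is a rough sketch: ca_years_left is a hypothetical helper that reads the notAfter line printed by openssl x509 -noout -enddate, and the input shown is invented.

```shell
# Sketch: report how many calendar years remain before a certificate expires.
# Reads one "notAfter=..." line, as printed by `openssl x509 -noout -enddate`.
# ca_years_left is a hypothetical helper; the current year may be passed in.
ca_years_left() {
  local this_year="${1:-$(date -u +%Y)}"
  # The year is the second-to-last field: notAfter=Feb 27 08:15:00 2036 GMT
  awk -v now="$this_year" '/^notAfter=/ { print $(NF-1) - now }'
}

# Illustrative input only:
printf 'notAfter=Feb 27 08:15:00 2036 GMT\n' | ca_years_left 2026
# prints 10
```

On the VKS cluster the input could come from kubectl get secret <secret-name> -n tanzu-system-telegraf -o jsonpath='{.data.ca\.crt}' | base64 -d | openssl x509 -noout -enddate, where the secret name is a placeholder. A freshly issued 87600h CA should report 9 or 10, because 3650 days is slightly less than ten calendar years.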

Additional Information

Japanese version: VKSクラスタのTelegrafポッドが「tls: 証明書の検証に失敗しました: x509: 証明書の有効期限が切れているか、まだ有効ではありません」と報告する