VKS cluster telegraf pods report "tls: failed to verify certificate: x509: certificate has expired or is not yet valid"

Article ID: 434121

Products

VMware vSphere Kubernetes Service

Issue/Introduction

  • Telegraf can be used in VKS workload clusters to send metrics to VCF Operations.

  • The following error appears in the logs of the Telegraf pods in the tanzu-system-telegraf namespace:

stderr F 2026-##-##T##:##:##Z E! [agent] Error writing to outputs.http: Post "https://supervisor-management-proxy.#######.###.#######.#####:#####/arc/tkgs/metric": tls: failed to verify certificate: x509: certificate has expired or is not yet valid: current time 2026-##-##T##:##:##Z is after 2026-##-##T##:##:##Z
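To check a Telegraf pod's log for this error without reading it line by line, a small filter can pull out the expiry time reported in the message. This is a minimal sketch: the function name x509_expiry_errors is hypothetical, and it assumes the log text matches the message shown above.

```shell
# x509_expiry_errors (hypothetical helper): read Telegraf log text on stdin
# and print the "is after <timestamp>" value (the expired certificate's
# notAfter time) from every x509 expiry error found. Prints nothing if none.
x509_expiry_errors() {
  grep -o 'x509: certificate has expired or is not yet valid: current time [^ ]* is after [^ ]*' \
    | sed 's/.* is after //'
}
```

Example usage, where <telegraf-pod-name> is a placeholder for one of the Telegraf pods:

      kubectl logs -n tanzu-system-telegraf <telegraf-pod-name> | x509_expiry_errors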

Environment

VMware Cloud Foundation 9.x

VMware vSphere 9.x

vSphere Kubernetes Service

Cause

This issue is caused by a synchronization failure when cert-manager rotates the mTLS certificate chain. The CA certificate is valid for only 90 days, while the leaf client certificates are valid for one year. When the CA is rotated, cert-manager does not update the ca.crt embedded in the leaf secrets (the client and downstream certificates). As a result, the Telegraf client certificate (tls.crt) is still valid, but the trust chain is broken: the secret associated with the client certificate still carries the old CA certificate (ca.crt) rather than the renewed one, and once that old CA expires, TLS verification fails with the error shown above.
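One way to confirm this cause is to decode the ca.crt stored in the Telegraf client secret on the Supervisor and check whether it has already expired. The sketch below assumes openssl is installed on the workstation; the function name ca_status is hypothetical, and it checks only the expiry date, not the full chain.

```shell
# ca_status (hypothetical helper): read one PEM certificate on stdin, print
# its notAfter line, then print "valid" or "EXPIRED".
ca_status() {
  pem=$(cat)
  printf '%s\n' "$pem" | openssl x509 -noout -enddate
  # -checkend 0 exits 0 only if the certificate has not expired yet.
  if printf '%s\n' "$pem" | openssl x509 -noout -checkend 0 >/dev/null; then
    echo valid
  else
    echo EXPIRED
  fi
}
```

Example usage on the Supervisor, with the same placeholders used in the Resolution steps:

      kubectl get secret -n <supervisor-namespace> telegraf-<cluster-name> -o jsonpath='{.data.ca\.crt}' | base64 -d | ca_status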

Resolution

As a workaround, delete the Telegraf secret in the Supervisor namespace used by the VKS workload cluster so that cert-manager recreates it; the updated secret is then propagated to the VKS cluster. Before doing so, extend the CA lifetime so that the CA does not expire again after 90 days. Follow these steps:

I. Extend CA Expiration: Execute the following commands on the Supervisor cluster to extend the CA lifetime:

    1. Confirm the current validity period for the 'supervisor-management-proxy-ca' Certificate associated with 'supervisor-management-proxy'.

      kubectl get cert -n vmware-system-cert-manager supervisor-management-proxy-ca -o yaml

    2. Create the 'cert-duration-overlay' secret in the 'vmware-system-supervisor-services' namespace with the following command. It sets the CA duration to 10 years (87600h) and renews the certificate 15 days (360h) before it expires:

      cat <<EOF | kubectl apply -f -
      apiVersion: v1
      kind: Secret
      metadata:
        name: cert-duration-overlay
        namespace: vmware-system-supervisor-services
      stringData:
        patch-duration.yaml: |
          #@ load("@ytt:overlay", "overlay")
      
          #@overlay/match by=overlay.subset({"kind": "Certificate", "metadata": {"name": "supervisor-management-proxy-ca", "namespace": "vmware-system-cert-manager"}})
          ---
          metadata:
            #@overlay/match missing_ok=True
            annotations:
          spec:
            #@overlay/match missing_ok=True
            duration: "87600h"
            #@overlay/match missing_ok=True
            renewBefore: "360h"
      EOF

    3. Verify the successful creation of the secret using the following command:

      kubectl get secret cert-duration-overlay -n vmware-system-supervisor-services

    4. Annotate the package installation with the overlay secret by running the following command:

      NOTE: This step and all subsequent steps must be repeated if the supervisor-management-proxy package is reinstalled.

      kubectl annotate packageinstall svc-supervisor-management-proxy.vmware.com -n vmware-system-supervisor-services ext.packaging.carvel.dev/ytt-paths-from-secret-name.1=cert-duration-overlay

      1. Verify successful annotation of the PackageInstall resource by executing the following command:

        kubectl describe pkgi -n vmware-system-supervisor-services svc-supervisor-management-proxy.vmware.com

      2. In the output from the above command, ensure that 'cert-duration-overlay' is listed under the annotations.

    5. Wait for the PackageInstall resource to reconcile successfully. Watch its status with the following command until the DESCRIPTION column shows 'Reconcile succeeded':

      kubectl get pkgi -n vmware-system-supervisor-services svc-supervisor-management-proxy.vmware.com -w

    6. Delete the secret that backs the supervisor-management-proxy-ca Certificate so that cert-manager regenerates the certificate with the new duration:

      kubectl delete secret -n vmware-system-cert-manager supervisor-management-proxy-ca-secret

    7. After reconciliation completes, confirm that the certificate uses the new validity period defined by the overlay secret:

      kubectl describe cert -n vmware-system-cert-manager supervisor-management-proxy-ca

    8. In the output, verify that the 'Spec.Duration' field shows the new 10-year (87600h) validity period.
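The overlay expresses both values as durations in hours: 87600h is 87600 / 24 = 3650 days (about 10 years, ignoring leap days), and renewBefore 360h is 360 / 24 = 15 days. A hypothetical helper such as duration_days below can convert the value read back from the Certificate; it assumes the duration is a whole number of hours and ignores any minutes or seconds components.

```shell
# duration_days (hypothetical helper): convert a duration such as "87600h"
# or "87600h0m0s" to whole days. Only the hours component is used.
duration_days() {
  hours=${1%%h*}
  echo $((hours / 24))
}
```

Example usage on the Supervisor:

      duration_days "$(kubectl get cert -n vmware-system-cert-manager supervisor-management-proxy-ca -o jsonpath='{.spec.duration}')"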

II. Refresh Infrastructure and Client Secrets

    1. Delete the downstream server certificate secret:

      kubectl delete secret -n <supervisor-management-proxy-namespace> metrics-endpoint-downstream-server-cert

    2. Delete the Telegraf client secret for the VKS cluster in its Supervisor namespace:

      kubectl delete secret -n <supervisor-namespace> telegraf-<cluster-name>

    3. Delete the metrics-proxy secrets using the following two commands:

      kubectl delete secret metrics-proxy-http-config -n kube-system

      kubectl delete secret metrics-proxy-tls-config -n kube-system
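The deleted secrets are recreated asynchronously (cert-manager reissues the certificates, and the Observability Operator restart in the next section pushes them to the VKS cluster), so a kubectl get issued immediately may still return NotFound. A generic retry loop avoids re-running the checks by hand. This is a sketch: the function name retry and the attempt/delay values in the example are illustrative assumptions, not values from this article.

```shell
# retry (hypothetical helper): retry ATTEMPTS DELAY COMMAND [ARGS...]
# Runs COMMAND until it exits 0, at most ATTEMPTS times, sleeping DELAY
# seconds between attempts. Returns 0 on success, 1 if every attempt failed.
retry() {
  attempts=$1
  delay=$2
  shift 2
  n=1
  while :; do
    if "$@"; then
      return 0
    fi
    if [ "$n" -ge "$attempts" ]; then
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
}
```

Example usage, polling up to 30 times at 10-second intervals:

      retry 30 10 kubectl get secret -n <supervisor-namespace> telegraf-<cluster-name>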

III. Synchronize and Verify

    1. Restart the Observability Operator to push the updated secrets to the VKS cluster:

      kubectl delete pod -n vmware-system-monitoring vmware-system-observability-controller-manager-<uuid>

    2. Confirm that the secrets have been recreated successfully on the Supervisor by fetching them with the following commands:

      kubectl get secret -n <supervisor-management-proxy-namespace> metrics-endpoint-downstream-server-cert

      kubectl get secret -n <supervisor-namespace> telegraf-<cluster-name>

    3. Confirm that the metrics-proxy secrets have been recreated using the following two commands:

      kubectl get secret metrics-proxy-http-config -n kube-system

      kubectl get secret metrics-proxy-tls-config -n kube-system

    4. Log in to the VKS cluster and check the validity of the ca.crt in the 'metrics-proxy-tls-config' secret in the 'tanzu-system-telegraf' namespace. Decode it with the following command:

      kubectl get secret -n tanzu-system-telegraf metrics-proxy-tls-config -o jsonpath='{.data.ca\.crt}' | base64 -d | openssl x509 -noout -text

    5. Confirm that the "Not After" date reflects the new 10-year expiration.
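Instead of reading the "Not After" date by eye, openssl x509 -checkend can test it. The sketch below (the function name valid_for_years is hypothetical) succeeds only if the certificate on stdin stays valid for at least the given number of 365-day years; checking 9 rather than 10 years leaves room for the time already elapsed since the certificate was issued, while a 90-day CA still fails.

```shell
# valid_for_years (hypothetical helper): exit 0 only if the PEM certificate
# on stdin will not expire within the given number of 365-day years.
valid_for_years() {
  # -checkend N exits 0 if the certificate will NOT expire within N seconds.
  openssl x509 -noout -checkend "$(( $1 * 365 * 24 * 3600 ))" >/dev/null
}
```

Example usage on the VKS cluster:

      kubectl get secret -n tanzu-system-telegraf metrics-proxy-tls-config -o jsonpath='{.data.ca\.crt}' | base64 -d | valid_for_years 9 && echo "ca.crt has the extended lifetime"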

Additional Information

Japanese version: VKS cluster Telegraf pods report "tls: failed to verify certificate: x509: certificate has expired or is not yet valid"