vSphere Supervisor Workload Cluster Error: clusterclass is not successfully reconciled: status of the VariablesReconciled condition on ClusterClass must be "True"

Article ID: 424003


Updated On:

Products

VMware vSphere Kubernetes Service

Issue/Introduction

  • All VKS cluster operations (e.g., scaling, creating, upgrading) fail to start.

EX: Horizontally scaling a VKS cluster's worker node count by changing the number of nodes does not start.

    • The VKS cluster MachineDeployment (MD) object remains in the "Running" PHASE instead of "ScalingUp"/"ScalingDown", and the number of replicas does not change (see the additional check after the output below).

# kubectl get md -n <namespace>

NAME                                                             CLUSTER          REPLICAS   READY   UPDATED   UNAVAILABLE   PHASE     AGE    VERSION
machinedeployment.cluster.x-k8s.io/clusterName-worker-l9crz      dmz-prod-cls01   5          5       5         0             Running   148d   v1.33.1+vmware.1-fips
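
To confirm that the requested replica count was accepted into the Cluster spec but never rolled out, the desired replicas in the Cluster topology can be compared with the REPLICAS column above (a quick check; the jsonpath assumes a ClusterClass-based topology, which VKS clusters use):

# kubectl get cluster <clusterName> -n <namespace> -o jsonpath='{.spec.topology.workers.machineDeployments[*].replicas}'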

  • Describing the Cluster Kubernetes object for the VKS cluster shows the message "ClusterClass is not successfully reconciled: status of VariablesReconciled condition on ClusterClass must be "True"":

# kubectl describe cluster <clusterName> -n <namespace> 

      Message:
      Observed Generation:   12
      Reason:                Available
      Status:                True
      Type:                  WorkersAvailable
      Last Transition Time:  2025-12-06T05:33:38Z
      Message:               ClusterClass is not successfully reconciled: status of VariablesReconciled condition on ClusterClass must be "True"
      Observed Generation:   12
      Reason:                ReconcileFailed
      Status:                False
      Type:                  TopologyReconciled
      Last Transition Time:  2025-10-16T10:53:47Z
      Message:
      Observed Generation:   12
      Reason:                NotRollingOut
      Status:                False
      Type:                  RollingOut
      Last Transition Time:  2025-07-24T12:55:08Z

  • Running the following command shows that VariableDiscovery for the ClusterClass is failing because the connection to the runtime-extension-webhook-service.svc-tkg-domain-c### service fails with the error "unknown certificate authority" (a command to check all ClusterClasses at once is sketched after the output below):

    # kubectl get  cc -n svc-tkg-domain-c### builtin-generic-v3.3.0  -o jsonpath='{.status.conditions}' | jq
    [
      {
        "lastTransitionTime": "2025-08-18T03:43:44Z",
        "status": "True",
        "type": "RefVersionsUpToDate"
      },
      {
        "lastTransitionTime": "2026-01-24T09:46:21Z",
        "message": "VariableDiscovery failed: failed to call DiscoverVariables for patch default: failed to call extension handler \"discover-variables.runtime-extension\": http call failed: Post \"https://runtime-extension-webhook-service.svc-tkg-domain-c8.svc:443/hooks.runtime.cluster.x-k8s.io/v1alpha1/discovervariables/discover-variables?timeout=10s\": tls: failed to verify certificate: x509: certificate signed by unknown authority (possibly because of \"x509: invalid signature: parent certificate cannot sign this kind of certificate\" while trying to verify candidate authority certificate \"serial:340174157205981460336478744338522218632\")",
        "reason": "VariableDiscoveryFailed",
        "severity": "Error",
        "status": "False",
        "type": "VariablesReconciled"
      }
    ]
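
    Because the DiscoverVariables call fails for every ClusterClass that relies on the runtime extension, it can be useful to list the VariablesReconciled status of all ClusterClasses at once (a sketch using a standard kubectl jsonpath expression; adjust as needed):

    # kubectl get cc -A -o jsonpath='{range .items[*]}{.metadata.namespace}{"\t"}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="VariablesReconciled")].status}{"\n"}{end}'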

     

  • The runtime-extension-controller-manager-########## pod logs show the TLS error "failed to verify certificate: x509: certificate signed by unknown authority":

    # kubectl logs -n svc-tkg-domain-##  runtime-extension-controller-manager-###########

    I1219 23:02:20.211585       1 ???:1] "http: TLS handshake error from 10.#.#.12:58377: tls: failed to verify certificate: x509: certificate signed by unknown authority"
    I1219 23:02:53.945972       1 ???:1] "http: TLS handshake error from 10.#.#.12:2134: tls: failed to verify certificate: x509: certificate signed by unknown authority"
    I1219 23:03:03.029886       1 ???:1] "http: TLS handshake error from 10.#.#.12:25239: tls: failed to verify certificate: x509: certificate signed by unknown authority"

  • The capi-controller-manager pod logs show that the connection to the runtime-extension-webhook-service.svc-tkg-domain-c8 service IP is failing with the TLS error "unknown certificate authority" (a command to retrieve these logs is sketched after the excerpt below):

    nHandler="discover-variables.runtime-extension" hook="DiscoverVariables"
    E1219 19:33:36.604631       1 controller.go:347] "Reconciler error" err="failed to discover variables for ClusterClass builtin-generic-v3.1.0: failed to call DiscoverVariables for patch default: failed to call extension handler \"discover-variables.runtime-extension\": http call failed: Post \"https://runtime-extension-webhook-service.svc-tkg-domain-c8.svc:443/hooks.runtime.cluster.x-k8s.io/v1alpha1/discovervariables/discover-variables?timeout=10s\": remote error: tls: unknown certificate authority" controller="clusterclass" controllerGroup="cluster.x-k8s.io" controllerKind="ClusterClass" ClusterClass="vmware-system-monitoring/builtin-generic-v3.1.0" namespace="vmware-system-monitoring" name="builtin-generic-v3.1.0" reconcileID="7f03f0f2-3bad-43c6-a7b8-a86e1edbf271"
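
    These entries can be pulled from the capi-controller-manager deployment, for example (a sketch; the grep simply filters for the TLS error shown above):

    # kubectl logs -n <svc-tkg-domain namespace> deploy/capi-controller-manager | grep "unknown certificate authority"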

 

Environment

VMware vSphere Kubernetes Service

VKS supervisor service 3.4.1 and higher

Cause

  • Client certificate rotation is not handled correctly for the runtime-extension-controller system pod, which causes a mismatch between the certificate in the runtime-extension-webhook-service-cert secret and the certificate served by the runtime-extension-controller pod.

  • The certificate in the runtime-extension-webhook-service-cert secret shows newer "notBefore" and "notAfter" dates and a different serial number than the certificate served by the runtime-extension pod (a combined check is sketched after the note below).

    • The runtime-extension-webhook-service-cert secret certificate:

      # kubectl get secret/runtime-extension-webhook-service-cert -n svc-tkg-domain-## -o jsonpath='{.data.ca\.crt}' | base64 -d | openssl x509 -noout -dates -serial

      Ex:

      # kubectl get secret/runtime-extension-webhook-service-cert -n svc-tkg-domain-c8 -o jsonpath='{.data.ca\.crt}' | base64 -d | openssl x509 -noout -dates -serial

      notBefore=Jan 31 07:01:27 2026 GMT
      notAfter=May  1 07:01:27 2026 GMT
      serial=14E01E25E2C695056BB1B0D86C271B96

    • The runtime-extension Pod certificate:

      # kubectl get node $(kubectl get pod <runtime-extension-controller-POD-Name> -n svc-tkg-domain-c8 -o jsonpath='{.spec.nodeName}') -o jsonpath='{.status.addresses[?(@.type=="InternalIP")].address}' | xargs -I {} sh -c "echo | openssl s_client -connect {}:9442 2>/dev/null | openssl x509 -noout -dates -serial"

      Ex:

      # kubectl get node $(kubectl get pod runtime-extension-controller-manager-6cf4d59849-j2gww -n svc-tkg-domain-c8 -o jsonpath='{.spec.nodeName}') -o jsonpath='{.status.addresses[?(@.type=="InternalIP")].address}' | xargs -I {} sh -c "echo | openssl s_client -connect {}:9442 2>/dev/null | openssl x509 -noout -dates -serial"

      notBefore=Jan  2 23:32:33 2026 GMT
      notAfter=Apr  2 23:32:33 2026 GMT
      serial=AB1B66F11FA96D4050F78630996A9E4F

Note:

 - You can also run the following command directly against the IP address of the control plane (master) node where the runtime-extension-controller pod is running, using port 9442:

# echo | openssl s_client -connect <master-node-IP>:9442 2>/dev/null | openssl x509 -noout -dates -serial
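
To make the mismatch easier to spot, the two checks above can be combined into a small script that prints both serial numbers side by side (a sketch only; the namespace is taken from the example above and the pod name is a placeholder, so replace both with the values from your environment):

NS=svc-tkg-domain-c8                          # supervisor service namespace from the example above
POD=<runtime-extension-controller-POD-Name>   # placeholder for the runtime-extension pod name

# Serial of the certificate stored in the webhook secret
kubectl get secret/runtime-extension-webhook-service-cert -n ${NS} -o jsonpath='{.data.ca\.crt}' | base64 -d | openssl x509 -noout -serial

# Serial of the certificate actually served by the runtime-extension pod on port 9442
NODE_IP=$(kubectl get node $(kubectl get pod ${POD} -n ${NS} -o jsonpath='{.spec.nodeName}') -o jsonpath='{.status.addresses[?(@.type=="InternalIP")].address}')
echo | openssl s_client -connect ${NODE_IP}:9442 2>/dev/null | openssl x509 -noout -serial

# If the two serial= values differ, the pod is still serving the old certificate.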

 

Resolution


This issue will be resolved in an upcoming release of VKS supervisor service.

Until the release is available, follow the workaround below.

 

Workaround

The system pods affected by the CA mismatch need to be restarted to correct the certificate issue.

  1. Connect into the Supervisor cluster context.

  2. Restart the runtime-extension-controller pod:

    # kubectl get deploy -A | grep runtime
    # kubectl rollout restart deploy runtime-extension-controller-manager -n <svc-tkg-domain namespace>

  3. Check that the runtime-extension-controller pod restarted successfully:
    # kubectl get pods -n <svc-tkg-domain namespace> | grep runtime

  4. Restart the capi-controller-manager pod:
    # kubectl rollout restart deploy -n <svc-tkg-domain namespace> capi-controller-manager

  5. Check that the capi-controller-manager pods restarted successfully:
    # kubectl get pods -n <svc-tkg-domain namespace> | grep capi-controller-manager

  6. Confirm that the cluster no longer shows the ClusterClass error message after restarting the above system pods (an additional verification is sketched after the command below):

    # kubectl describe cluster -n <cluster namespace> <cluster name>
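
After the restarts, you can wait for both rollouts to complete and re-check the ClusterClass condition that was failing (a sketch; the ClusterClass name and namespace are taken from the example in the Issue/Introduction section and may differ in your environment). The VariablesReconciled condition should now report Status "True":

    # kubectl rollout status deploy/runtime-extension-controller-manager -n <svc-tkg-domain namespace>
    # kubectl rollout status deploy/capi-controller-manager -n <svc-tkg-domain namespace>
    # kubectl get cc -n svc-tkg-domain-c### builtin-generic-v3.3.0 -o jsonpath='{.status.conditions}' | jq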

 

Notes:

  • This certificate issue is expected to occur every 60 days, requiring a restart of the above pods.
  • A fix is currently being worked on and will be available in an upcoming release of the VKS supervisor service.
  • If the above steps do not correct the issue, contact VMware by Broadcom Technical Support and upload a Workload Management Supervisor Cluster Support Bundle. See Gathering Logs for vSphere with Tanzu.