For example, horizontally scaling a VKS cluster by changing the number of worker nodes does not start.
# kubectl get md <clusterName> -n <namespace>
NAME                                                          CLUSTER          REPLICAS   READY   UPDATED   UNAVAILABLE   PHASE     AGE    VERSION
machinedeployment.cluster.x-k8s.io/clusterName-worker-l9crz   dmz-prod-cls01   5          5       5         0             Running   148d   v1.33.1+vmware.1-fips
# kubectl describe cluster <clusterName> -n <namespace>
    Message:
    Observed Generation:   12
    Reason:                Available
    Status:                True
    Type:                  WorkersAvailable
    Last Transition Time:  2025-12-06T05:33:38Z
    Message:               ClusterClass is not successfully reconciled: status of VariablesReconciled condition on ClusterClass must be "True"
    Observed Generation:   12
    Reason:                ReconcileFailed
    Status:                False
    Type:                  TopologyReconciled
    Last Transition Time:  2025-10-16T10:53:47Z
    Message:
    Observed Generation:   12
    Reason:                NotRollingOut
    Status:                False
    Type:                  RollingOut
    Last Transition Time:  2025-07-24T12:55:08Z
# kubectl get cc -n svc-tkg-domain-c### builtin-generic-v3.3.0 -o jsonpath='{.status.conditions}' | jq
[
  {
    "lastTransitionTime": "2025-08-18T03:43:44Z",
    "status": "True",
    "type": "RefVersionsUpToDate"
  },
  {
    "lastTransitionTime": "2026-01-24T09:46:21Z",
    "message": "VariableDiscovery failed: failed to call DiscoverVariables for patch default: failed to call extension handler \"discover-variables.runtime-extension\": http call failed: Post \"https://runtime-extension-webhook-service.svc-tkg-domain-c8.svc:443/hooks.runtime.cluster.x-k8s.io/v1alpha1/discovervariables/discover-variables?timeout=10s\": tls: failed to verify certificate: x509: certificate signed by unknown authority (possibly because of \"x509: invalid signature: parent certificate cannot sign this kind of certificate\" while trying to verify candidate authority certificate \"serial:340174157205981460336478744338522218632\")",
    "reason": "VariableDiscoveryFailed",
    "severity": "Error",
    "status": "False",
    "type": "VariablesReconciled"
  }
]
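To check this condition without reading the full JSON, the same lookup can be narrowed with a kubectl JSONPath filter. The function below is a minimal sketch, assuming kubectl is pointed at the Supervisor and that the ClusterClass name and its svc-tkg-domain namespace are passed as arguments; the function name check_cc_variables_tls is hypothetical. It reports the ClusterClass as affected when VariablesReconciled is False and the condition message contains the x509 "unknown authority" error.

```shell
# Sketch: flag a ClusterClass whose VariablesReconciled condition is False
# because of the x509 "unknown authority" error.
# Assumptions: kubectl points at the Supervisor; the function name is hypothetical.
check_cc_variables_tls() {  # $1 = ClusterClass name, $2 = svc-tkg-domain namespace
  local cc_name="$1" cc_ns="$2" status message

  # JSONPath filter: select only the VariablesReconciled condition.
  status=$(kubectl get cc "$cc_name" -n "$cc_ns" \
    -o jsonpath='{.status.conditions[?(@.type=="VariablesReconciled")].status}')
  message=$(kubectl get cc "$cc_name" -n "$cc_ns" \
    -o jsonpath='{.status.conditions[?(@.type=="VariablesReconciled")].message}')

  # No status at all: the lookup failed or the condition is not set.
  if [ -z "$status" ]; then
    echo "UNKNOWN: could not read VariablesReconciled on $cc_ns/$cc_name"
    return 2
  fi

  if [ "$status" = "False" ] &&
     printf '%s\n' "$message" | grep -q 'x509: certificate signed by unknown authority'; then
    echo "AFFECTED: $cc_ns/$cc_name VariablesReconciled=False (x509 unknown authority)"
    return 1
  fi

  echo "OK: $cc_ns/$cc_name VariablesReconciled=$status"
  return 0
}
```

Usage, for the ClusterClass shown above:
# check_cc_variables_tls builtin-generic-v3.3.0 <svc-tkg-domain namespace>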
The runtime-extension-controller-manager-########## pod logs show the TLS error "failed to verify certificate: x509: certificate signed by unknown authority":
# kubectl logs -n svc-tkg-domain-## runtime-extension-controller-manager-###########
I1219 23:02:20.211585 1 ???:1] "http: TLS handshake error from 10.#.#.12:58377: tls: failed to verify certificate: x509: certificate signed by unknown authority"
I1219 23:02:53.945972 1 ???:1] "http: TLS handshake error from 10.#.#.12:2134: tls: failed to verify certificate: x509: certificate signed by unknown authority"
I1219 23:03:03.029886 1 ???:1] "http: TLS handshake error from 10.#.#.12:25239: tls: failed to verify certificate: x509: certificate signed by unknown authority"
...nHandler="discover-variables.runtime-extension" hook="DiscoverVariables"
E1219 19:33:36.604631 1 controller.go:347] "Reconciler error" err="failed to discover variables for ClusterClass builtin-generic-v3.1.0: failed to call DiscoverVariables for patch default: failed to call extension handler \"discover-variables.runtime-extension\": http call failed: Post \"https://runtime-extension-webhook-service.svc-tkg-domain-c8.svc:443/hooks.runtime.cluster.x-k8s.io/v1alpha1/discovervariables/discover-variables?timeout=10s\": remote error: tls: unknown certificate authority" controller="clusterclass" controllerGroup="cluster.x-k8s.io" controllerKind="ClusterClass" ClusterClass="vmware-system-monitoring/builtin-generic-v3.1.0" namespace="vmware-system-monitoring" name="builtin-generic-v3.1.0" reconcileID="7f03f0f2-3bad-43c6-a7b8-a86e1edbf271"
VMware vSphere Kubernetes Service
VKS supervisor service 3.4.1 and higher
The CA certificate in the runtime-extension-webhook-service-cert secret shows different "notBefore" and "notAfter" dates and a different "Serial" number than the certificate assigned to the runtime-extension pod.

Check the CA certificate stored in the secret:
# kubectl get secret/runtime-extension-webhook-service-cert -n svc-tkg-domain-## -o jsonpath='{.data.ca\.crt}' | base64 -d | openssl x509 -noout -dates -serial
Ex:
# kubectl get secret/runtime-extension-webhook-service-cert -n svc-tkg-domain-c8 -o jsonpath='{.data.ca\.crt}' | base64 -d | openssl x509 -noout -dates -serial
notBefore=Jan 31 07:01:27 2026 GMT
notAfter=May 1 07:01:27 2026 GMT
serial=14E01E25E2C695056BB1B0D86C271B96

Check the certificate served by the runtime-extension-controller pod on port 9442 of the node it runs on:
# kubectl get node $(kubectl get pod <runtime-extension-controller-POD-Name> -n svc-tkg-domain-## -o jsonpath='{.spec.nodeName}') -o jsonpath='{.status.addresses[?(@.type=="InternalIP")].address}' | xargs -I {} sh -c "echo | openssl s_client -connect {}:9442 2>/dev/null | openssl x509 -noout -dates -serial"
Ex:
# kubectl get node $(kubectl get pod runtime-extension-controller-manager-6cf4d59849-j2gww -n svc-tkg-domain-c8 -o jsonpath='{.spec.nodeName}') -o jsonpath='{.status.addresses[?(@.type=="InternalIP")].address}' | xargs -I {} sh -c "echo | openssl s_client -connect {}:9442 2>/dev/null | openssl x509 -noout -dates -serial"
notBefore=Jan 2 23:32:33 2026 GMT
notAfter=Apr 2 23:32:33 2026 GMT
serial=AB1B66F11FA96D4050F78630996A9E4F
Note:
- Alternatively, run the following command against the IP address of the master (control plane) node where the runtime-extension-controller pod is running, using port 9442:
# echo | openssl s_client -connect <master-node-IP>:9442 2>/dev/null | openssl x509 -noout -dates -serial
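As a more direct check than comparing dates and serials, openssl verify tests whether the certificate served on port 9442 is signed by the CA stored in the runtime-extension-webhook-service-cert secret, which is the check failing in the x509 errors above. The functions below are a sketch under these assumptions: bash with kubectl and openssl on PATH, the pod serving on its node's InternalIP at port 9442 as in the commands above, and hypothetical function names fetch_secret_ca, fetch_served_cert and verify_served_cert.

```shell
# Sketch: check whether the certificate served on port 9442 chains to the CA
# stored in the runtime-extension-webhook-service-cert secret.
# Assumptions: kubectl and openssl on PATH; function names are hypothetical;
# the secret name and port 9442 come from the commands above.

# Write the CA certificate from the secret to a PEM file.
fetch_secret_ca() {  # $1 = svc-tkg-domain namespace, $2 = output file
  kubectl get secret/runtime-extension-webhook-service-cert -n "$1" \
    -o jsonpath='{.data.ca\.crt}' | base64 -d > "$2"
}

# Write the certificate served by the pod (on its node's InternalIP) to a PEM file.
fetch_served_cert() {  # $1 = pod name, $2 = namespace, $3 = output file
  local node ip
  node=$(kubectl get pod "$1" -n "$2" -o jsonpath='{.spec.nodeName}')
  ip=$(kubectl get node "$node" \
    -o jsonpath='{.status.addresses[?(@.type=="InternalIP")].address}')
  # Without -noout, openssl x509 re-emits the first certificate (the leaf) as PEM.
  echo | openssl s_client -connect "${ip}:9442" 2>/dev/null | openssl x509 > "$3"
}

# Return 0 only if the served certificate verifies against the CA file.
verify_served_cert() {  # $1 = CA PEM file, $2 = served certificate PEM file
  if openssl verify -CAfile "$1" "$2" >/dev/null 2>&1; then
    echo "MATCH: served certificate verifies against the CA in the secret"
    return 0
  fi
  echo "MISMATCH: served certificate does not verify against the CA in the secret"
  return 1
}
```

Usage, with the example pod above:
# fetch_secret_ca svc-tkg-domain-c8 /tmp/ca.pem
# fetch_served_cert runtime-extension-controller-manager-6cf4d59849-j2gww svc-tkg-domain-c8 /tmp/served.pem
# verify_served_cert /tmp/ca.pem /tmp/served.pem

A MISMATCH result reproduces the "certificate signed by unknown authority" condition seen in the logs.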
Resolution
This issue is resolved in vSphere Kubernetes Service 3.6.0+v1.35. Refer to the VMware vSphere Kubernetes Service Release Notes.
Workaround
Restart the system pod with the CA issue (runtime-extension-controller-manager), then restart capi-controller-manager, to correct the certificate issue:
# kubectl get deploy -A | grep runtime
# kubectl rollout restart deploy runtime-extension-controller-manager -n <svc-tkg-domain namespace>
# kubectl get pods -n <svc-tkg-domain namespace> | grep runtime
# kubectl rollout restart deploy -n <svc-tkg-domain namespace> capi-controller-manager
# kubectl get pods -n <svc-tkg-domain namespace> | grep capi-controller-manager
# kubectl describe cluster -n <cluster namespace> <cluster name>
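The restart steps above can be sketched as a single function that restarts each deployment in turn and waits for its rollout to finish before moving on. This is a sketch, not an official procedure: it assumes kubectl is pointed at the Supervisor, and the function name restart_tls_components, the 5-minute timeouts and the final kubectl wait on the TopologyReconciled condition (standing in for the manual kubectl describe cluster check) are illustrative choices.

```shell
# Sketch of the workaround: restart both deployments in order, wait for each
# rollout, then wait for the Cluster's topology to reconcile.
# Assumptions: kubectl points at the Supervisor; the function name, the 5m
# timeouts and the TopologyReconciled wait are illustrative choices.
restart_tls_components() {  # $1 = svc-tkg-domain ns, $2 = cluster ns, $3 = cluster name
  local tkg_ns="$1" cluster_ns="$2" cluster="$3" deploy

  # Same order as the manual steps: runtime extension first, then CAPI.
  for deploy in runtime-extension-controller-manager capi-controller-manager; do
    kubectl rollout restart deploy "$deploy" -n "$tkg_ns" || return 1
    # Wait until the rollout completes or the timeout expires.
    kubectl rollout status deploy "$deploy" -n "$tkg_ns" --timeout=5m || return 1
  done

  # Assumption: TopologyReconciled is reported in .status.conditions,
  # which is what kubectl wait inspects.
  kubectl wait "cluster/$cluster" -n "$cluster_ns" \
    --for=condition=TopologyReconciled --timeout=5m
}
```

Usage:
# restart_tls_components <svc-tkg-domain namespace> <cluster namespace> <cluster name>

If the final wait times out, fall back to kubectl describe cluster to read the TopologyReconciled message.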
Notes: