EX: Horizontally scale a VKS cluster worker nodes count by changing the number of nodes will not start.
# kubectl get md <clusterName> -n <namespace>NAME CLUSTER REPLICAS READY UPDATED UNAVAILABLE PHASE AGE VERSIONmachinedeployment.cluster.x-k8s.io/clusterName-worker-l9crz dmz-prod-cls01 5 5 5 0 Running 148d v1.33.1+vmware.1-fips
# kubectl describe cluster <clusterName> -n <namespace> Message: Observed Generation: 12 Reason: Available Status: True Type: WorkersAvailable Last Transition Time: 2025-12-06T05:33:38Z Message: ClusterClass is not successfully reconciled: status of VariablesReconciled condition on ClusterClass must be "True" Observed Generation: 12 Reason: ReconcileFailed Status: False Type: TopologyReconciled Last Transition Time: 2025-10-16T10:53:47Z Message: Observed Generation: 12 Reason: NotRollingOut Status: False Type: RollingOut Last Transition Time: 2025-07-24T12:55:08Z
# kubectl get cc -n svc-tkg-domain-c### builtin-generic-v3.3.0 -o jsonpath='{.status.conditions}' | jq[ { "lastTransitionTime": "2025-08-18T03:43:44Z", "status": "True", "type": "RefVersionsUpToDate" }, { "lastTransitionTime": "2026-01-24T09:46:21Z", "message": "VariableDiscovery failed: failed to call DiscoverVariables for patch default: failed to call extension handler \"discover-variables.runtime-extension\": http call failed: Post \"https://runtime-extension-webhook-service.svc-tkg-domain-c8.svc:443/hooks.runtime.cluster.x-k8s.io/v1alpha1/discovervariables/discover-variables?timeout=10s\": tls: failed to verify certificate: x509: certificate signed by unknown authority (possibly because of \"x509: invalid signature: parent certificate cannot sign this kind of certificate\" while trying to verify candidate authority certificate \"serial:340174157205981460336478744338522218632\")", "reason": "VariableDiscoveryFailed", "severity": "Error", "status": "False", "type": "VariablesReconciled" }]
The runtime-extension-controller-manager-########## pod logs was showing TLS error "failed to verify certificate: x509: certificate signed by unknown authority"# kubectl logs -n svc-tkg-domain-## runtime-extension-controller-manager-###########
I1219 23:02:20.211585 1 ???:1] "http: TLS handshake error from 10.#.#.12:58377: tls: failed to verify certificate: x509: certificate signed by unknown authority"I1219 23:02:53.945972 1 ???:1] "http: TLS handshake error from 10.#.#.12:2134: tls: failed to verify certificate: x509: certificate signed by unknown authority"I1219 23:03:03.029886 1 ???:1] "http: TLS handshake error from 10.#.#.12:25239: tls: failed to verify certificate: x509: certificate signed by unknown authority"
nHandler="discover-variables.runtime-extension" hook="DiscoverVariables"E1219 19:33:36.604631 1 controller.go:347] "Reconciler error" err="failed to discover variables for ClusterClass builtin-generic-v3.1.0: failed to call DiscoverVariables for patch default: failed to call extension handler \"discover-variables.runtime-extension\": http call failed: Post \"https://runtime-extension-webhook-service.svc-tkg-domain-c8.svc:443/hooks.runtime.cluster.x-k8s.io/v1alpha1/discovervariables/discover-variables?timeout=10s\": remote error: tls: unknown certificate authority" controller="clusterclass" controllerGroup="cluster.x-k8s.io" controllerKind="ClusterClass" ClusterClass="vmware-system-monitoring/builtin-generic-v3.1.0" namespace="vmware-system-monitoring" name="builtin-generic-v3.1.0" reconcileID="7f03f0f2-3bad-43c6-a7b8-a86e1edbf271"
VMware vSphere Kubernetes Service
VKS supervisor service 3.4.1 and higher
notAfter" dates and different "Serial" number than the Cert assigned to the runtime-extension pod.# kubectl get secret/runtime-extension-webhook-service-cert -n svc-tkg-domain-## -o jsonpath='{.data.ca\.crt}' | base64 -d | openssl x509 -noout -dates -serialEx:# kubectl get secret/runtime-extension-webhook-service-cert -n svc-tkg-domain-c8 -o jsonpath='{.data.ca\.crt}' | base64 -d | openssl x509 -noout -dates -serialnotBefore=Jan 31 07:01:27 2026 GMTnotAfter=May 1 07:01:27 2026 GMTserial=14E01E25E2C695056BB1B0D86C271B96# kubectl get node $(kubectl get pod <runtime-extension-controller-POD-Name> -n svc-tkg-domain-c8 -o jsonpath='{.spec.nodeName}') -o jsonpath='{.status.addresses[?(@.type=="InternalIP")].address}' | xargs -I {} sh -c "echo | openssl s_client -connect {}:9442 2>/dev/null | openssl x509 -noout -dates -serial"# kubectl get node $(kubectl get pod runtime-extension-controller-manager-6cf4d59849-j2gww -n svc-tkg-domain-c8 -o jsonpath='{.spec.nodeName}') -o jsonpath='{.status.addresses[?(@.type=="InternalIP")].address}' | xargs -I {} sh -c "echo | openssl s_client -connect {}:9442 2>/dev/null | openssl x509 -noout -dates -serial"
notBefore=Jan 2 23:32:33 2026 GMTnotAfter=Apr 2 23:32:33 2026 GMTserial=AB1B66F11FA96D4050F78630996A9E4F
Note:
- You can run the following command to the master node IP address where the runtime-extension-controller pod is located using port 9442# echo| openssl s_client -connect <mater-node-IP>:9442 2>/dev/null | openssl x509 -noout -dates -serial
Resolution
This issue will be resolved in an upcoming release of VKS supervisor service.
Until the release is available, follow the below workaround.
Workaround
The system pod with the CA issue will need to be restarted to correct the certificate issue.
# kubectl get deploy -A | grep runtime# kubectl rollout restart deploy runtime-extension-controller-manager -n <svc-tkg-domain namespace>
# kubectl get pods -n <svc-tkg-domain namespace> | grep runtime# kubectl rollout restart deploy -n <svc-tkg-domain namespace> capi-controller-manager# kubectl get pods -n <svc-tkg-domain namespace> | grep capi-controller-manager# kubectl describe cluster -n <cluster namespace> <cluster name>
Notes: