In a vSphere Kubernetes cluster, one or more nodes are stuck in Provisioned state.
While connected to the Supervisor cluster context, the following symptoms are observed:
kubectl get machines -n <cluster namespace>
kubectl get vm -o wide -n <cluster namespace>
While connected directly to the impacted node through SSH as the breakglass user vmware-system-user, error messages similar to the following are observed:
error execution phase preflight: couldn't validate the identity of the API Server: could not find a JWS signature in the cluster-info ConfigMap for token ID "abcdef"
The cluster-info ConfigMap does not yet contain a JWS signature for token ID "abcdef", will try again
While connected to the affected vSphere Kubernetes cluster's context:
kubectl get configmap cluster-info -n kube-public -o yaml
kubeadm token list | grep <token ID>
TOKEN               TTL         EXPIRES                USAGES                   DESCRIPTION                                                 EXTRA GROUPS
abcdef.1234567890   <invalid>   YYYY-MM-DDThh:mm:ssZ   authentication,signing   token generated by cluster-api-bootstrap-provider-kubeadm   system:bootstrappers:kubeadm:default-node-token
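The `<invalid>` value in the TTL column marks a bootstrap token that is already past its expiry time. When several tokens are listed, a small filter can pull out the expired token IDs for comparison with the token ID named in the node's error message. The following is a minimal sketch; the function name `expired_tokens` is hypothetical and it assumes the whitespace-separated column layout shown above.

```shell
# Hypothetical helper: print the ID part (before the dot) of every bootstrap
# token whose TTL column reads <invalid>.
# Reads `kubeadm token list` output on stdin and skips the header line.
expired_tokens() {
  awk 'NR > 1 && $2 == "<invalid>" { split($1, parts, "."); print parts[1] }'
}

# Example usage on a control plane node:
#   kubeadm token list | expired_tokens
```

If the token ID from the node's error message appears in this output, the node is attempting to join with an expired token.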
vSphere 7.0 with Tanzu
vSphere 8.0 with Tanzu
This issue can occur regardless of whether the cluster is managed by Tanzu Mission Control (TMC).
The certificates in the affected vSphere Kubernetes cluster have expired, and system services that reference the expired certificates are failing.
This issue can also occur after the certificates have been renewed if the necessary services were not restarted to pick up the renewed certificates.
The services may need to be restarted on every control plane node in the vSphere Kubernetes cluster. Restart them on one control plane node at a time so that the services are never down on all nodes at once.
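To confirm that expired certificates are the cause, the expiry of each certificate file on a control plane node can be checked with `openssl`. The following is a minimal sketch; the function name `check_certs` is hypothetical, and the path in the usage comment is the kubeadm default location, which is assumed rather than confirmed for this environment.

```shell
# Hypothetical helper: report the state of each certificate file passed as an argument.
# `openssl x509 -checkend 0` exits non-zero when the certificate has already expired.
check_certs() {
  for crt in "$@"; do
    if ! openssl x509 -noout -in "$crt" >/dev/null 2>&1; then
      # The file is missing or is not a parseable certificate.
      echo "UNREADABLE $crt"
    elif openssl x509 -checkend 0 -noout -in "$crt" >/dev/null 2>&1; then
      echo "VALID $crt"
    else
      echo "EXPIRED $crt"
    fi
  done
}

# Example usage on a control plane node (assumed kubeadm default path):
#   check_certs /etc/kubernetes/pki/*.crt
```

On nodes where it is available, `kubeadm certs check-expiration` reports similar information for the kubeadm-managed certificates.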
The vSphere Kubernetes cluster certificates will need to be renewed and the appropriate services restarted accordingly.
Please use the certmgr script included in the following KB article to renew the affected cluster's certificates.
If the certmgr script does not successfully renew the certificates, please open a support case with VMware by Broadcom Support and reference this KB article for assistance with manually renewing the certificates in the affected cluster.
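For the service restart step, the one-node-at-a-time ordering can be sketched as a loop that stops as soon as a node fails, so that the remaining control plane nodes keep serving. This is an illustration only: the function name `rolling_restart` is hypothetical, and the restart command passed to it is a placeholder, since this article does not list the specific services to restart.

```shell
# Hypothetical helper: run a restart command on each control plane node in turn.
# $1 is the name of a command or function that restarts the services on one
# node and returns non-zero if that node does not come back healthy.
# The remaining arguments are the control plane node names, processed in order.
rolling_restart() {
  restart_cmd=$1
  shift
  for node in "$@"; do
    echo "restarting services on $node"
    if ! "$restart_cmd" "$node"; then
      # Stop here so that the untouched nodes stay up.
      echo "restart failed on $node; stopping before the next node" >&2
      return 1
    fi
  done
}

# Example usage, where restart_one_node is a placeholder to be supplied:
#   rolling_restart restart_one_node cp-node-1 cp-node-2 cp-node-3
```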