vSphere Kubernetes Cluster Node stuck in Provisioned state - could not validate the identity of the API server: could not find a JWS signature in the cluster-info Configmap for token ID

Article ID: 384195


Products

VMware vSphere 7.0 with Tanzu

vSphere with Tanzu

Issue/Introduction

In a vSphere Kubernetes cluster, one or more nodes are stuck in Provisioned state.

 

While connected to the Supervisor cluster context, the following symptoms are observed:

  • The impacted node's machine object shows a Provisioned state (see the example output after this list):
    • kubectl get machines -n <cluster namespace>
  • The impacted node's VM is powered on and has an IP address assigned:
    • kubectl get vm -o wide -n <cluster namespace>
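
For reference, the machine listing for a node stuck in this state looks similar to the output below. The cluster and machine names are illustrative placeholders, the columns are abridged, and the exact layout varies by Cluster API version:

      kubectl get machines -n <cluster namespace>

      NAME                                      CLUSTER          PHASE         AGE   VERSION
      <cluster name>-workers-xxxxx-xxxxxxxxxx   <cluster name>   Provisioned   45m   v1.xx.x+vmware.x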

 

While connected directly to the impacted node through SSH as the breakglass user vmware-system-user:

  • The cloud-init-output.log shows error messages similar to the following, where "abcdef" is the unique token ID that differs between environments (see the log-check sketch after this list):
    • error execution phase preflight: couldn't validate the identity of the API Server: could not find a JWS signature in the cluster-info ConfigMap for token ID "abcdef"

      The cluster-info ConfigMap does not yet contain a JWS signature for token ID "abcdef", will try again
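
To review these messages directly on the node, the log can be tailed from its standard cloud-init location. The path below is the default cloud-init output log path and is assumed to apply to this node image:

      # on the impacted node, as vmware-system-user
      sudo tail -n 50 /var/log/cloud-init-output.log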

 

While connected to the affected vSphere Kubernetes cluster's context:

  • The above token ID is missing from the cluster-info ConfigMap below:
    • kubectl get configmap cluster-info -n kube-public -o yaml
  • The above token ID has expired:
    • kubeadm token list | grep <token ID>

      TOKEN               TTL         EXPIRES                USAGES                   DESCRIPTION                                                  EXTRA GROUPS
      abcdef.1234567890   <invalid>   YYYY-MM-DDThh:mm:ssZ   authentication,signing   token generated by cluster-api-bootstrap-provider-kubeadm   system:bootstrappers:kubeadm:default-node-token

Environment

vSphere 7.0 with Tanzu

vSphere 8.0 with Tanzu

This issue can occur regardless of whether the cluster is managed by Tanzu Mission Control (TMC).

Cause

The affected vSphere Kubernetes cluster's certificates have expired, and the affected system services are still failing because they continue to reference the expired certificates.
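
To confirm this, the certificate expiration dates can be checked on a control plane node of the affected cluster. The command below assumes a kubeadm release that includes the certs check-expiration subcommand (older releases expose it as kubeadm alpha certs check-expiration):

      # on a control plane node of the affected cluster
      sudo kubeadm certs check-expiration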

This issue can still occur after the certificates have been renewed if steps were missed when restarting the necessary services to pick up the certificate change.

The services may need to be restarted on all control plane nodes in the vSphere Kubernetes cluster, but they should be restarted on one control plane node at a time so that the services are never all down at once (see the sketch below).
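
One possible restart sequence, run on one control plane node at a time, is sketched below. It assumes a kubeadm-based node with the default static pod manifest path and a containerd runtime with crictl available; the wait time is illustrative:

      # on ONE control plane node at a time
      sudo systemctl restart kubelet

      # briefly move the static pod manifests aside so kubelet recreates the
      # control plane pods (kube-apiserver, controller-manager, scheduler, etcd)
      sudo mv /etc/kubernetes/manifests /etc/kubernetes/manifests.off
      sleep 30
      sudo mv /etc/kubernetes/manifests.off /etc/kubernetes/manifests

      # confirm the control plane containers are running again before moving on
      sudo crictl ps | grep -e kube-apiserver -e etcd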

Resolution

The vSphere Kubernetes cluster certificates will need to be renewed and the appropriate services restarted accordingly.

Please use the certmgr script included in the following KB article to renew the affected cluster's certificates.

If the above certmgr script does not successfully renew the certificates, please open a ticket with VMware by Broadcom Support, referencing this KB article, for assistance in manually renewing the certificates in the affected cluster.
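
After the certificates have been renewed and the services restarted, the original symptoms can be rechecked. The commands below simply repeat the earlier checks; the stuck machine should eventually progress out of the Provisioned state, and the cluster-info ConfigMap should again carry a JWS signature for the active bootstrap token:

      # from the Supervisor cluster context
      kubectl get machines -n <cluster namespace> -w

      # from the affected cluster's context
      kubectl get configmap cluster-info -n kube-public -o yaml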