Machine state is not transitioning to Running from Provisioning when updating a TKC Cluster

Article ID: 369279

Updated On:

Products

VMware vSphere with Tanzu

Issue/Introduction

New machines do not transition from the Provisioning state to the Running state because the node name (status.nodeRef) is never set on the machine.
The node that corresponds to the machine does not have spec.providerID set.

This symptom can occur during scale-out operations on a cluster, and it blocks upgrades of the Kubernetes version.
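The two symptoms above can be checked with kubectl. The following is a minimal sketch, not an official procedure. The function names are hypothetical, the first function assumes a kubectl context pointing at the Supervisor and takes the vSphere namespace of the cluster as its argument, and the second assumes a context pointing at the guest cluster.

```shell
# Hypothetical helper: list Machines in the given namespace that are still
# in the Provisioning phase. Assumes the current context is the Supervisor.
machines_in_provisioning() {
  kubectl get machines -n "$1" --no-headers \
    -o custom-columns=NAME:.metadata.name,PHASE:.status.phase,NODE:.status.nodeRef.name \
    | awk '$2 == "Provisioning" { print $1 }'
}

# Hypothetical helper: list nodes that have no spec.providerID.
# Assumes the current context is the guest cluster.
# kubectl prints "<none>" in a custom column when the field is not set.
nodes_missing_provider_id() {
  kubectl get nodes --no-headers \
    -o custom-columns=NAME:.metadata.name,PROVIDER_ID:.spec.providerID \
    | awk '$2 == "<none>" { print $1 }'
}
```

For example, "machines_in_provisioning <namespace>" prints one machine name per line, and "nodes_missing_provider_id" prints the affected node names.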

 

Environment

vSphere with Tanzu 7.0

vSphere with Tanzu 8.0

TKR versions earlier than 1.29

 

Cause

The supervisor access token mounted into the guest cluster cloud provider has expired, and the cloud provider cannot refresh it when the TKR version is lower than 1.29.

Resolution

Workaround:

  • Delete the existing guest-cluster-cloud-provider pod. A replacement pod starts with a new token.
  • Once the cluster is back to normal, it may be necessary to delete idle node objects. If any extraneous nodes remain in NotReady status, delete them with "kubectl delete node <node-name>".
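The workaround steps can be sketched as the shell helpers below, run against the guest cluster. This is an illustration under assumptions rather than a verified procedure: the function names are hypothetical, and the pod is located by matching the guest-cluster-cloud-provider name prefix across all namespaces instead of assuming a specific namespace.

```shell
# Hypothetical helper: delete every pod whose name starts with
# guest-cluster-cloud-provider so that a replacement pod is created.
restart_cloud_provider() {
  kubectl get pods -A --no-headers \
    | awk '$2 ~ /^guest-cluster-cloud-provider/ { print $1, $2 }' \
    | while read -r ns pod; do
        # stdin is redirected so the loop's input is never consumed
        kubectl delete pod -n "$ns" "$pod" </dev/null
      done
}

# Hypothetical helper: print the names of nodes in NotReady status.
# Review the list before deleting anything.
not_ready_nodes() {
  kubectl get nodes --no-headers | awk '$2 ~ /^NotReady/ { print $1 }'
}
```

After reviewing the output of "not_ready_nodes", each extraneous node can be removed with "kubectl delete node <node-name>". Deletion is deliberately left as a manual step so that a node that is only temporarily NotReady is not removed by accident.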

Resolution:

 

Additional Information

To confirm the cause, check the guest-cluster-cloud-provider pod log inside the guest cluster. It contains errors such as: "Error trying to find VM: Unauthorized"

The log also contains other entries that report unauthorized errors.
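The log check can be sketched as follows, run against the guest cluster. The function name is hypothetical, and the pod is found by its guest-cluster-cloud-provider name prefix across all namespaces rather than by an assumed namespace.

```shell
# Hypothetical helper: print log lines that mention "unauthorized"
# (case-insensitive) from every guest-cluster-cloud-provider pod.
cloud_provider_unauthorized_logs() {
  kubectl get pods -A --no-headers \
    | awk '$2 ~ /^guest-cluster-cloud-provider/ { print $1, $2 }' \
    | while read -r ns pod; do
        # stdin is redirected so the loop's input is never consumed
        kubectl logs -n "$ns" "$pod" </dev/null | grep -i 'unauthorized'
      done
}
```

If the function prints nothing, the log contains no unauthorized errors and the token expiry described in this article is unlikely to be the cause.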