Node in NotReady state and Machine showing Failed or stuck in Provisioning state


Article ID: 378405


Products

VMware Telco Cloud Automation

Issue/Introduction

Sometimes, after an unexpected reboot of the nodes or of the cluster, nodes may appear in the NotReady state. The corresponding Machine resources in TKG may also show as Failed or remain stuck in the Provisioning state.
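
The symptoms can be confirmed before making any changes. The commands below are a minimal check, assuming the kubectl context points at the workload cluster for the node status and at the management cluster for the Machine resources; NAMESPACE is a placeholder for the workload cluster namespace.

    kubectl get nodes
    kubectl get machines -n NAMESPACE
    kubectl get vspheremachines,vspherevms -n NAMESPACE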

Environment

2.X

Resolution

To resolve the issue, delete the Machine, VSphereMachine, and VSphereVM resources and let CAPI/CAPV (Cluster API and the Cluster API Provider vSphere) recreate them. Verification commands follow the steps below.

  1. Delete the Machine resource:

    kubectl delete machine MACHINENAME -n NAMESPACE
  2. Delete the VSphereMachine resource:

    kubectl delete vspheremachine VSPHEREMACHINENAME -n NAMESPACE
  3. Delete the VSphereVM resource:

    kubectl delete vspherevm VSPHEREVMNAME -n NAMESPACE
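
Once the three objects are deleted, CAPI/CAPV should reconcile the deployment and recreate the Machine, VSphereMachine, and VSphereVM. A minimal way to watch the recreation and confirm that the node rejoins, using the same NAMESPACE placeholder as in the steps above:

    kubectl get machines -n NAMESPACE -w
    kubectl get vspheremachines,vspherevms -n NAMESPACE
    kubectl get nodes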

 

Additional Information

After the machines are deleted from the CLI, the nodes may be recreated automatically, because the MachineHealthCheck is enabled.
To synchronize the node replica count, edit the cluster configuration from the TCA-M GUI with the correct number of node replicas and wait for the nodes to be provisioned.
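
To confirm whether a MachineHealthCheck is driving the automatic recreation, it can be listed from the management cluster context; MHCNAME and NAMESPACE below are placeholders for the actual object name and workload cluster namespace.

    kubectl get machinehealthcheck -n NAMESPACE
    kubectl describe machinehealthcheck MHCNAME -n NAMESPACE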