Node in NotReady state and Machine showing failed or stuck in provisioning state.
Article ID: 378405
Products
VMware Telco Cloud Automation
Issue/Introduction
A Machine resource in a TKG cluster might show as failed or be stuck in the provisioning state.
Cause
Sometimes, after an unexpected reboot of the nodes or of the cluster, nodes appear in the NotReady state.
An application or workload issue can leave the node(s) in the NotReady state.
The Machine resource then remains stuck in provisioning because a pod on the node is not in a Ready status.
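To confirm this condition, check the node status from the workload cluster context and the Machine phase from the management cluster context. The NAMESPACE value below is a hypothetical placeholder:

```shell
# Hypothetical placeholder; replace with your workload cluster's namespace.
NAMESPACE="tkg-workload-ns"

# Nodes stuck in NotReady (run against the workload cluster context):
kubectl get nodes

# Machines stuck in Provisioning or failed (run against the management cluster):
kubectl get machine -n "$NAMESPACE" -o wide
```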
Resolution
Remove the machine, vspheremachine, and vspherevm resources and let CAPI/CAPV (Cluster API / Cluster API Provider vSphere) recreate them using the following steps:
1. Identify the machine resource in the TKG management cluster:
kubectl get machine -n NAMESPACE | grep MACHINENAME
2. Delete the machine resource:
kubectl delete machine MACHINENAME -n NAMESPACE
NOTE: If step 2 does not automatically provision a new machine in Running status, continue with the next steps.
3. Delete the vspheremachine resource:
kubectl delete vspheremachine VSPHEREMACHINENAME -n NAMESPACE
4. Delete the vspherevm resource:
kubectl delete vspherevm VSPHEREVMNAME -n NAMESPACE
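The delete-and-verify flow above can be sketched as a single script. The namespace and machine name below are hypothetical placeholders, and the wait step simply watches for the replacement Machine to reach the Running phase before you proceed to the vspheremachine/vspherevm deletions:

```shell
#!/bin/sh
# Hypothetical placeholders; substitute your own namespace and machine name.
NAMESPACE="tkg-ns"
MACHINENAME="wc-md-0-abcde"

# Step 2: delete the Machine and let CAPI/CAPV recreate it.
kubectl delete machine "$MACHINENAME" -n "$NAMESPACE"

# Wait for the replacement Machine(s) to reach the Running phase
# (kubectl wait supports jsonpath conditions since v1.23).
kubectl wait --for=jsonpath='{.status.phase}'=Running \
  machine --all -n "$NAMESPACE" --timeout=10m
```

If the wait times out, fall back to steps 3 and 4 and delete the vspheremachine and vspherevm resources as well.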
Additional Information
After the machines are deleted from the CLI, the nodes may be recreated automatically (because Machine Health Check is enabled). To synchronize the exact replica count for the nodes, edit the cluster configuration from the TCA-M GUI with the correct node replica count and wait for the nodes to be provisioned.
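Before editing the replica count, you can confirm that a MachineHealthCheck exists and compare the desired versus actual replicas on the MachineDeployment. The names below are hypothetical placeholders:

```shell
# Hypothetical placeholder; replace with your cluster's namespace.
NAMESPACE="tkg-ns"

# Confirm a MachineHealthCheck is enabled for the cluster:
kubectl get machinehealthcheck -n "$NAMESPACE"

# Compare desired vs. ready replicas on the MachineDeployment:
kubectl get machinedeployment -n "$NAMESPACE" \
  -o custom-columns=NAME:.metadata.name,DESIRED:.spec.replicas,READY:.status.readyReplicas
```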