Workload Cluster Node Scaling Stuck in ScalingUp State Due to Stale Machine Object

Tanzu Kubernetes Runtime VMware Tanzu Kubernetes Grid

Scaling a workload cluster by directly editing the replicas field of the cluster resource does not complete.
The MachineDeployment remains in the ScalingUp status.
A Machine resource is stuck in Deleting or Provisioning even though the VM has already been removed from the vSphere environment.
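
One way to confirm these symptoms is to list the Machines in the workload cluster's namespace and filter on their phase. The following is a minimal sketch, not an official procedure: the helper name stuck_machines is hypothetical, and it assumes kubectl is pointed at the management cluster context and that the Machine phase is reported in .status.phase.

```shell
# Hypothetical helper: print the names of Machines whose phase is
# Deleting or Provisioning in the given namespace.
# Assumes the current kubectl context is the management cluster.
stuck_machines() {
  ns="$1"
  kubectl get machines -n "$ns" --no-headers \
    -o custom-columns=NAME:.metadata.name,PHASE:.status.phase |
    awk '$2 == "Deleting" || $2 == "Provisioning" { print $1 }'
}
```

Calling stuck_machines with the workload cluster's namespace prints one Machine name per line. A Machine that stays in this list after its VM is gone from vSphere matches the symptom described above.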

TKGm 2.5.1

When cluster resources are edited manually, the controllers may be unable to reconcile the changes, and resources can become stuck.

Restart the controllers to trigger reconciliation:

kubectl -n capv-system rollout restart deployment.apps/capv-controller-manager
kubectl -n capi-system rollout restart deployment.apps/capi-controller-manager
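
After the restart, it can help to wait until both controller deployments have finished rolling out before checking the Machine again. This is a sketch under the assumption that the deployment names and namespaces match the restart commands above; the helper name wait_for_controllers and the 120s default timeout are illustrative choices.

```shell
# Hypothetical helper: block until both controller deployments report a
# completed rollout. The optional first argument is a timeout (default 120s).
wait_for_controllers() {
  timeout="${1:-120s}"
  kubectl -n capv-system rollout status deployment/capv-controller-manager --timeout="$timeout" &&
    kubectl -n capi-system rollout status deployment/capi-controller-manager --timeout="$timeout"
}
```

If either rollout does not complete within the timeout, the helper returns a non-zero exit code, which suggests the controller pods themselves need attention before the Machine is inspected further.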

If necessary, manually edit the Machine object and remove the finalizer:

kubectl edit machine -n <namespace> <name>

Delete the following lines:

finalizers:
- machine.cluster.x-k8s.io
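
As a non-interactive alternative to kubectl edit, the finalizer can be cleared with a merge patch. This is a sketch only, with two assumptions stated plainly: the VM has been confirmed as removed from vSphere, and the Machine carries no other finalizers that should be kept, because setting finalizers to null removes all of them, not only machine.cluster.x-k8s.io. The helper name remove_machine_finalizers is hypothetical.

```shell
# Hypothetical helper: clear ALL finalizers on a Machine with a merge patch.
# Use only after confirming the backing VM no longer exists in vSphere.
remove_machine_finalizers() {
  ns="$1"
  name="$2"
  kubectl patch machine "$name" -n "$ns" --type=merge \
    -p '{"metadata":{"finalizers":null}}'
}
```

The helper takes the namespace first and the Machine name second. Once the finalizers are gone, a Machine that is already marked for deletion should be removed by the API server.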

If this does not resolve the issue, submit a support request referencing this KB. Support can assist with manual cleanup of any additional resources that may be blocking reconciliation.

Using the tanzu cluster scale command is the recommended method for workload cluster scaling, as it ensures the controllers handle the full lifecycle.
Direct edits to cluster resources may bypass automated reconciliation, leading to inconsistencies.
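
For illustration, scaling worker nodes through the Tanzu CLI could look like the sketch below. The helper name scale_workers and the fallback to the default namespace are assumptions made for this example; the cluster name, worker count, and namespace must be supplied for the actual environment.

```shell
# Hypothetical helper: scale the worker nodes of a workload cluster with the
# Tanzu CLI instead of editing the cluster resource directly.
# Arguments: cluster name, desired worker count, namespace (default: default).
scale_workers() {
  cluster="$1"
  count="$2"
  ns="${3:-default}"
  tanzu cluster scale "$cluster" --worker-machine-count "$count" --namespace "$ns"
}
```

Control plane nodes can be scaled the same way with the --controlplane-machine-count option of tanzu cluster scale.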
