Workload Cluster Node Scaling Stuck in ScalingUp State Due to Stale Machine Object

Article ID: 412049

Products

Tanzu Kubernetes Runtime
VMware Tanzu Kubernetes Grid

Issue/Introduction

  • An attempt to scale a workload cluster by directly editing the cluster resource (replicas) does not complete.
  • The MachineDeployment remains in the ScalingUp status.
  • A Machine resource is stuck in Deleting or Provisioning even though the VM has already been removed from the vSphere environment.
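
To confirm these symptoms, the Cluster API objects involved can be listed with kubectl. The following is a minimal sketch, assuming the current kubectl context is the management cluster (where Cluster API objects normally live) and that the objects sit in the namespace passed as the first argument. The function name show_scaling_state is hypothetical and not part of any Tanzu tooling.

```shell
# Hypothetical helper: list the Cluster API objects involved in scaling.
# $1 = namespace of the workload cluster objects (assumed "default" if omitted).
show_scaling_state() {
  ns="${1:-default}"
  # MachineDeployment phase; a stuck scale-up shows ScalingUp here.
  kubectl get machinedeployments -n "$ns"
  # Machine phases; a stale object shows Deleting or Provisioning.
  kubectl get machines -n "$ns" -o wide
}
```

Usage: show_scaling_state <namespace>. A Machine listed here whose VM no longer exists in vSphere is the stale object this article describes.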

Environment

TKGm 2.5.1

Cause

When cluster resources are edited manually, the Cluster API controllers may be unable to reconcile the change. A Machine object can then become stale and remain stuck, which leaves the MachineDeployment in the ScalingUp status.

Resolution

  1. Restart the controllers to trigger reconciliation:

    kubectl -n capv-system rollout restart deployment.apps/capv-controller-manager
    kubectl -n capi-system rollout restart deployment.apps/capi-controller-manager
     
  2. If the Machine is still stuck after the restart, manually edit the Machine object and remove the finalizer:

    kubectl edit machine -n <namespace> <name>

    Delete the following lines:

    finalizers:
    - machine.cluster.x-k8s.io
     
  3. If this does not resolve the issue, submit a support request that references this KB article. Support can assist with a manual cleanup of any additional resources that may be blocking reconciliation.
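
Steps 1 and 2 can also be followed up non-interactively. The sketch below is an illustration rather than an official procedure: wait_for_controllers waits for the two restarted deployments to become available again, and remove_machine_finalizers is a kubectl patch alternative to the interactive edit. Both function names are hypothetical. The patch assumes machine.cluster.x-k8s.io is the only finalizer on the Machine, because a merge patch that sets finalizers to null clears the entire list.

```shell
# Hypothetical helper: wait until both restarted controllers are rolled out.
wait_for_controllers() {
  kubectl -n capv-system rollout status deployment.apps/capv-controller-manager &&
  kubectl -n capi-system rollout status deployment.apps/capi-controller-manager
}

# Hypothetical helper: clear metadata.finalizers on one Machine.
# $1 = namespace, $2 = Machine name.
# Assumption: machine.cluster.x-k8s.io is the only finalizer on the object,
# since this merge patch removes the whole finalizers list.
remove_machine_finalizers() {
  kubectl patch machine "$2" -n "$1" --type merge \
    -p '{"metadata":{"finalizers":null}}'
}
```

Usage: run wait_for_controllers after step 1, then check whether the Machine has been reconciled before calling remove_machine_finalizers <namespace> <name>. Removing a finalizer skips the controller's own cleanup, so it is worth verifying first that the VM is really gone from vSphere.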



Additional Information

  • Using the tanzu cluster scale command is the recommended method for workload cluster scaling, as it ensures the controllers handle the full lifecycle.

  • Direct edits to cluster resources may bypass automated reconciliation, leading to inconsistencies.
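
As an illustration of the recommended method, the sketch below wraps tanzu cluster scale in a small function. The function name scale_workers and the default namespace are assumptions; the flags shown (--worker-machine-count, --namespace) should be checked against tanzu cluster scale --help for the installed CLI version.

```shell
# Hypothetical helper: scale worker nodes through the Tanzu CLI.
# $1 = cluster name, $2 = desired worker count,
# $3 = namespace (assumed "default" if omitted).
scale_workers() {
  tanzu cluster scale "$1" --worker-machine-count "$2" --namespace "${3:-default}"
}
```

Usage: scale_workers <cluster-name> <count> <namespace>. Going through the CLI lets the controllers drive the change, rather than a hand-edited replicas value.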