Tanzu Kubernetes Cluster (TKC) stuck in updating state with Ready False due to machine object stuck in deleting state

Article ID: 431194

Updated On:

Products

VMware Tanzu Kubernetes Grid Management

Issue/Introduction

  • When checking the status of a Tanzu Kubernetes Cluster (TKC), the cluster reports Ready False and is indefinitely stuck in an updating state.
  • kubectl describe tkc shows the cluster is updating with more worker nodes than desired (e.g., 7/6 Worker Nodes healthy), which indicates that the rolling update stalled before the old node was removed.
  • kubectl get nodes shows an older node permanently in SchedulingDisabled status.
  • kubectl get machine shows a corresponding machine object stuck in a deleting state.
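
The last symptom can also be checked from metadata rather than by reading the table output. The following is a minimal sketch, assuming kubectl already points at the cluster that holds the machine objects; the function names and the namespace argument are hypothetical helpers, not part of any Tanzu tooling.

```shell
# Print the names of Machine objects that carry a deletionTimestamp,
# i.e. objects marked for deletion that have not been removed yet.
# Input: one "name<TAB>deletionTimestamp" pair per line (timestamp may be empty).
filter_deleting_machines() {
  awk -F'\t' '$2 != "" { print $1 }'
}

# Emit the name/timestamp pair for every Machine in a namespace.
# "$1" is the namespace that holds the cluster (placeholder, supply your own).
list_machine_deletion_state() {
  kubectl get machine -n "$1" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.deletionTimestamp}{"\n"}{end}'
}
```

Usage: list_machine_deletion_state <namespace> | filter_deleting_machines. A machine that still appears in this list long after the update started is a likely candidate for the stuck object.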

Environment

TKGm: 2.x, 2.1

Cause

During a cluster update or scale-down, a Cluster API Machine object can become stuck in a deleting state if the controller responsible for its finalizer never finishes processing and removing it. While the finalizer remains, the API server keeps the object, so the rolling update cannot complete and the cluster stays in an updating phase.
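
To confirm this cause on a specific machine, its metadata can be inspected directly: a stuck object would be expected to show a deletionTimestamp together with a non-empty finalizers list (on Cluster API machines this list typically includes machine.cluster.x-k8s.io). A minimal sketch, where the function name and both arguments are hypothetical placeholders:

```shell
# Print the deletion timestamp and the remaining finalizers of one Machine.
# "$1" = machine name, "$2" = namespace (both placeholders).
show_machine_finalizers() {
  kubectl get machine "$1" -n "$2" \
    -o jsonpath='deletionTimestamp: {.metadata.deletionTimestamp}{"\n"}finalizers: {.metadata.finalizers}{"\n"}'
}
```

Usage: show_machine_finalizers <machine-name> <namespace>. If the timestamp is empty, the machine has not been marked for deletion and the cause described above probably does not apply.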

Resolution

To resolve the issue, manually remove the finalizer from the stuck machine object so that the API server can complete the pending deletion.

  1. Log in to the Supervisor cluster or Management cluster and set the context to the namespace where the TKC resides.
  2. Identify the stuck machine:
    kubectl get machine
  3. Edit the stuck machine object and remove the finalizer. You can do this quickly by patching the object:
    kubectl patch machine <machine-name> -p '{"metadata":{"finalizers":[]}}' --type=merge
    (Alternatively, use kubectl edit machine <machine-name> and manually delete the finalizers block under metadata.)
  4. Verify that the machine object has been deleted:
    kubectl get machine
  5. Verify the TKC status has returned to normal:
    kubectl get tkc

The cluster should now complete its reconciliation and show Ready True.
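
Steps 3 and 4 above can be combined into one helper. This is a sketch only, assuming the same merge patch shown in step 3; the function name, the namespace argument, and the polling interval are illustrative choices. Because clearing finalizers skips whatever cleanup the controller still had pending, it may be worth confirming afterwards that the backing VM is gone as well.

```shell
# Remove all finalizers from one Machine, then poll until it disappears.
# "$1" = machine name, "$2" = namespace, "$3" = maximum polls (default 30).
remove_machine_finalizers() {
  kubectl patch machine "$1" -n "$2" --type=merge \
    -p '{"metadata":{"finalizers":[]}}' || return 1
  tries="${3:-30}"
  while [ "$tries" -gt 0 ]; do
    # "kubectl get" fails once the object is not found. It also fails on
    # other errors (for example lost connectivity), so re-check by hand
    # if the result looks surprising.
    if ! kubectl get machine "$1" -n "$2" >/dev/null 2>&1; then
      echo "machine $1 deleted"
      return 0
    fi
    tries=$((tries - 1))
    sleep 2
  done
  echo "machine $1 still present" >&2
  return 1
}
```

Usage: remove_machine_finalizers <machine-name> <namespace>, followed by kubectl get tkc -n <namespace> to watch the cluster return to Ready True.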