TKG Workload cluster stuck in deleting state

Article ID: 398775

Products

Tanzu Kubernetes Runtime

Issue/Introduction

  • Workload cluster deletion is stuck in the "Deleting" phase and makes no progress.
  • Deleting the cluster and then re-creating it fails.

 

Environment

  • TCP Version: 5.0
  • TCA Version: 3.x
  • TKG Version: 1.24.10

 

Cause

  • Cluster API (CAPI) resources such as Machine, MachineDeployment, or Cluster may carry finalizers that block their deletion until the child resources have been deleted.
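
To see which objects are still holding finalizers, the CAPI resources in the affected namespace can be listed together with their deletion timestamps. The following is a minimal sketch, assuming kubectl is pointed at the TKG management cluster; the function name list_capi_finalizers is hypothetical, and the resource names are the fully qualified Cluster API names.

```shell
# Sketch: show finalizers and deletion timestamps for the Cluster API objects
# in one namespace. Assumes kubectl targets the TKG management cluster.
list_capi_finalizers() {
  ns="$1"
  kubectl get clusters.cluster.x-k8s.io,machinedeployments.cluster.x-k8s.io,machines.cluster.x-k8s.io \
    -n "$ns" \
    -o custom-columns='KIND:.kind,NAME:.metadata.name,DELETING_SINCE:.metadata.deletionTimestamp,FINALIZERS:.metadata.finalizers'
}

# Example (namespace taken from the sample output in this article):
# list_capi_finalizers ci-xbk-5g-em-01
```

An object that shows a DELETING_SINCE value and a non-empty FINALIZERS column is waiting on its controller to finish cleanup.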

Resolution

  • Confirm that the CNFs and CRDs have been terminated. In the example below, the Cluster object still reports phase: Deleting:

    kind: VSphereCluster
    name: ci-xbk-5g-em-01
    namespace: ci-xbk-5g-em-01
status:
  conditions:
  - lastTransitionTime: "2025-05-01T15:41:37Z"
    message: Rolling 3 replicas with outdated spec (1 replicas up to date)
    reason: RollingUpdateInProgress
    severity: Warning
    status: "False"
    type: Ready
  - lastTransitionTime: "2022-11-02T19:16:22Z"
    status: "True"
    type: ControlPlaneInitialized
  - lastTransitionTime: "2025-05-01T15:41:37Z"
    message: Rolling 3 replicas with outdated spec (1 replicas up to date)
    reason: RollingUpdateInProgress
    severity: Warning
    status: "False"
    type: ControlPlaneReady
  - lastTransitionTime: "2024-08-22T20:45:00Z"
    status: "True"
    type: InfrastructureReady
  - lastTransitionTime: "2024-07-30T15:26:12Z"
    reason: AlreadyUpToDate
    severity: Info
    status: "False"
    type: UpdatesAvailable
  controlPlaneReady: true
  infrastructureReady: true
  observedGeneration: 7
  phase: Deleting

# kubectl get cluster -A

NAMESPACE           NAME                           PHASE            AGE      VERSION
ci-xbk-5g-cces-01   ci-xbk-5g-cces-01              Deleting         2y113d
ci-xbk-5g-em-01     ci-xbk-5g-em-01                Deleting         2y185d
ci-xbk-5g-pcg-01    ci-xbk-5g-pcg-01               Deleting         581d
tkg-system          bknl-ci-vmw-caas-mgmt1         Provisioned      2y274d
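
On a management cluster with many workload clusters, the listing can be narrowed to the ones that are stuck. This is a rough sketch that assumes the column layout shown above (NAMESPACE, NAME, PHASE, ...); the function name list_deleting_clusters is hypothetical, and the awk field number would need adjusting if the installed CAPI version prints additional columns before PHASE.

```shell
# Sketch: print NAMESPACE/NAME for every cluster whose PHASE column is "Deleting".
# Assumes PHASE is the third column, as in the sample output in this article.
list_deleting_clusters() {
  kubectl get cluster -A --no-headers | awk '$3 == "Deleting" { print $1 "/" $2 }'
}

# Example:
# list_deleting_clusters
```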

  • Check whether the cluster is paused by using the following command:

# kubectl get cluster <CLUSTER_NAME> -n <NAMESPACE_NAME> -o yaml | grep -i pause
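
The grep above matches any line containing "pause", including annotations. For a more targeted check, the spec.paused field of the Cluster object can be read directly. The sketch below assumes that field is what indicates a paused cluster in this environment; the function name cluster_paused is hypothetical.

```shell
# Sketch: report whether spec.paused is set to true on a Cluster object.
# kubectl prints an empty string when the field is absent.
cluster_paused() {
  name="$1"
  ns="$2"
  paused="$(kubectl get clusters.cluster.x-k8s.io "$name" -n "$ns" -o jsonpath='{.spec.paused}')"
  if [ "$paused" = "true" ]; then
    echo "paused"
  else
    echo "not paused"
  fi
}

# Example:
# cluster_paused <CLUSTER_NAME> <NAMESPACE_NAME>
```

While a cluster is paused, the CAPI controllers do not reconcile it, so a pending deletion will not advance.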

  • Try to delete the node pool from the CLI using the following command:

# kubectl delete tcaNodePool {nodepool_name} -n {cluster_namespace}
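
Because a delete request can block the terminal while finalizers are pending, it may help to confirm that the node pool exists and then issue the delete without waiting. This is only a sketch: the function name delete_nodepool is hypothetical, and its arguments correspond to the {nodepool_name} and {cluster_namespace} placeholders above.

```shell
# Sketch: confirm a TCA node pool exists, then request its deletion without
# blocking the terminal.
delete_nodepool() {
  pool="$1"
  ns="$2"
  # Stop early if the node pool cannot be found.
  kubectl get tcaNodePool "$pool" -n "$ns" >/dev/null || return 1
  # --wait=false returns immediately instead of waiting for finalizers to clear.
  kubectl delete tcaNodePool "$pool" -n "$ns" --wait=false
}

# Example:
# delete_nodepool {nodepool_name} {cluster_namespace}
```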

  • Restart the CAPI controller pod on the TKG management cluster, then delete the workload cluster.

    • Restarting the CAPI pod:

      # kubectl rollout restart deployment/capi-controller-manager -n capi-system

    • Deleting the workload cluster:

      # tanzu cluster delete my-workload-cluster
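
The restart and delete steps can be chained so that the delete is only issued once the controller is back up. The following is a minimal sketch, assuming kubectl and the tanzu CLI both target the TKG management cluster. The function name restart_and_delete and the timeout values are illustrative, and the deployment name is passed in as an argument (it can be confirmed with kubectl get deployments -n capi-system).

```shell
# Sketch: restart a controller deployment, wait for it to become ready,
# then delete the workload cluster and wait for its Cluster object to go away.
restart_and_delete() {
  deployment="$1"   # controller deployment name in capi-system
  cluster="$2"      # workload cluster name
  ns="$3"           # namespace of the workload cluster

  kubectl rollout restart "deployment/$deployment" -n capi-system || return 1
  # Block until the restarted pod is ready (or the timeout expires).
  kubectl rollout status "deployment/$deployment" -n capi-system --timeout=300s || return 1

  # --yes skips the interactive confirmation prompt.
  tanzu cluster delete "$cluster" --namespace "$ns" --yes || return 1

  # A non-zero exit here means the Cluster object was still present after
  # the timeout, or kubectl could not look it up.
  kubectl wait --for=delete "cluster/$cluster" -n "$ns" --timeout=900s
}

# Example:
# restart_and_delete <deployment_name> my-workload-cluster <namespace>
```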