TKGm Management Cluster deletion stuck during Kind resources cleanup

Article ID: 373736

Updated On:

Products

VMware Tanzu Kubernetes Grid Plus 1.x

Issue/Introduction

Deleting a Management Cluster with the "tanzu mc delete" command gets stuck after the Kind cleanup cluster has been deployed and the cleanup of the Management Cluster resources has started.

For details on how the cleanup process works, see the official documentation: Delete Management Clusters

Cause

There are many possible causes for this issue.
For example, if Kind can't connect to vCenter/NSX/AVI to clean up the infrastructure resources, the Management Cluster cleanup may get stuck with stale Cluster API objects, such as Clusters, Machines, VSphereMachines and VSphereVMs.

Resolution

To troubleshoot the issue, open a shell in the Kind cleanup cluster container and examine the Management Cluster resources there.

  1. List the nodes of the Kind cleanup cluster:
    # kind get nodes -A
  2. Get the Docker container ID:
    # docker ps | grep kind
  3. Open a shell in the Kind container:
    # docker exec -it <kind-container-id-from-step-2> bash
  4. Once inside the Kind container, examine the current status of the Management Cluster's Cluster API resources.
    For example:
    # kubectl get cluster,kcp,md,ma,vspheremachine,vspherevm -A

    The output shows whether the Management Cluster is stuck in the Deleting phase and whether any other resource is also stuck.
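
The inspection steps above can also be run from the jumpbox without opening an interactive shell. The following is a minimal sketch, assuming a single Kind cleanup container whose name contains "kind" and that kubectl inside that container is already configured (as step 4 implies). The function names kind_container_id and list_deleting are hypothetical helpers, not part of any Tanzu or Kind tooling.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; run on the jumpbox where the Kind container lives.

# Print the ID of the first running container whose name contains "kind".
# Assumption: only one Kind cleanup cluster is running on this host.
kind_container_id() {
  docker ps --filter "name=kind" --format '{{.ID}}' | head -n 1
}

# Show Cluster API objects with their deletion timestamp and finalizers.
# Objects with a value in the DELETING column are waiting on the
# finalizers listed next to them.
list_deleting() {
  local id
  id="$(kind_container_id)"
  [ -n "$id" ] || { echo "no Kind container found" >&2; return 1; }
  docker exec "$id" kubectl get cluster,kcp,md,ma,vspheremachine,vspherevm -A \
    -o custom-columns='KIND:.kind,NAMESPACE:.metadata.namespace,NAME:.metadata.name,DELETING:.metadata.deletionTimestamp,FINALIZERS:.metadata.finalizers'
}
```

After sourcing the file, calling list_deleting prints one row per object, which makes it easier to spot the resource whose finalizer is blocking the deletion.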

If you need to change a cluster resource's manifest, for example to remove finalizers, "kubectl edit" doesn't work inside the Kind container because no text editor is installed there.
In this case, the best way forward is:

  1. Export the resource's manifest YAML (inside the Kind container):
    # kubectl get <resource-type> <resource-name> -n tkg-system -o yaml > <resource-name>.yaml
  2. Copy it to the jumpbox where Kind was deployed. From the jumpbox:
    # docker cp <container-id>:/<file-path>/<resource-name>.yaml .
  3. Edit the manifest on the jumpbox:
    # vim <resource-name>.yaml
  4. Copy the edited manifest back to the Kind container:
    # docker cp ./<resource-name>.yaml <container-id>:/<file-path>/<resource-name>-edited.yaml
  5. Apply the changes inside the Kind container:
    # kubectl apply -f /<file-path>/<resource-name>-edited.yaml
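
The export, edit and apply round trip above can be sketched as two small functions that stream the manifest through "docker exec" instead of using "docker cp". This is only an illustration under stated assumptions: export_manifest, apply_manifest and the CONTAINER_ID variable are hypothetical names, the resource lives in the tkg-system namespace as in step 1, and kubectl inside the container is already configured.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; run on the jumpbox.
# CONTAINER_ID must hold the Kind container ID before calling them.

# Write the manifest of one tkg-system resource to <name>.yaml on the jumpbox.
# The redirection happens on the jumpbox, so no file is created in the container.
export_manifest() {
  local type="$1" name="$2"
  docker exec "$CONTAINER_ID" kubectl get "$type" "$name" -n tkg-system -o yaml > "${name}.yaml"
}

# Feed an edited manifest from the jumpbox to kubectl inside the container.
# "-i" keeps stdin open so that "kubectl apply -f -" can read the file.
apply_manifest() {
  local file="$1"
  docker exec -i "$CONTAINER_ID" kubectl apply -f - < "$file"
}

# Usage (placeholders as in the steps above):
#   CONTAINER_ID=<container-id>
#   export_manifest <resource-type> <resource-name>
#   vim <resource-name>.yaml
#   apply_manifest <resource-name>.yaml
```

The edited file stays on the jumpbox, which also leaves a record of what was changed.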

After the issues have been resolved and the Management Cluster has been cleaned up, you may be left with a stale Kind cluster that needs to be removed if the "tanzu mc delete" command timed out or failed.

To remove the stale Kind cluster, follow the official documentation: Kind Cluster Remains after Deleting Management Cluster
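
As a rough illustration only, and not a replacement for the linked procedure, a stale cluster can also be located and removed with the kind CLI itself. The sketch below assumes the kind binary is available on the jumpbox and that the leftover cluster's name starts with a known prefix; delete_stale_kind_clusters and the prefix argument are hypothetical, so check the output of "kind get clusters" before deleting anything.

```shell
#!/usr/bin/env bash
# Hypothetical helper; run on the jumpbox.

# Delete every Kind cluster whose name starts with the given prefix.
# Assumption: the prefix uniquely identifies the stale cleanup cluster.
delete_stale_kind_clusters() {
  local prefix="$1" name
  kind get clusters | while read -r name; do
    case "$name" in
      "$prefix"*) kind delete cluster --name "$name" </dev/null ;;
    esac
  done
}

# Usage:
#   kind get clusters                      # confirm the cluster name first
#   delete_stale_kind_clusters <prefix>
```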