Deleting a Tanzu Kubernetes Grid Integrated Edition cluster with "tkgi delete-cluster" stuck "in progress" status


Article ID: 298683


Products

VMware Tanzu Kubernetes Grid Integrated Edition

Issue/Introduction

Deleting a cluster in Tanzu Kubernetes Grid Integrated Edition (TKGI) hangs with the status "in progress". On a large-scale TKGI foundation, cluster deletion can remain stuck in this state for a long time.

On the TKGI API VM, you observe the ncp_cleanup script being spawned repeatedly for the same cluster, similar to the following:
ps -ef | grep "ncp_cleanup"

root     13806 13308  0 14:16 ?        00:00:00 /bin/bash /var/vcap/jobs/pks-nsx-t-osb-proxy/bin/ncp_cleanup 504423c9-bed4-4c41-8c9a-2561ece22e0f  true
root     13916 13308  0 14:18 ?        00:00:00 /bin/bash /var/vcap/jobs/pks-nsx-t-osb-proxy/bin/ncp_cleanup 504423c9-bed4-4c41-8c9a-2561ece22e0f  true
root     14187 13308  0 14:21 ?        00:00:00 /bin/bash /var/vcap/jobs/pks-nsx-t-osb-proxy/bin/ncp_cleanup 504423c9-bed4-4c41-8c9a-2561ece22e0f  true
root     14606 13308  0 14:28 ?        00:00:00 /bin/bash /var/vcap/jobs/pks-nsx-t-osb-proxy/bin/ncp_cleanup 504423c9-bed4-4c41-8c9a-2561ece22e0f  true
root     15052 13308  0 14:36 ?        00:00:00 /bin/bash /var/vcap/jobs/pks-nsx-t-osb-proxy/bin/ncp_cleanup 504423c9-bed4-4c41-8c9a-2561ece22e0f  true
root     15283 13308  0 14:39 ?        00:00:00 /bin/bash /var/vcap/jobs/pks-nsx-t-osb-proxy/bin/ncp_cleanup 504423c9-bed4-4c41-8c9a-2561ece22e0f  true
root     15484 13308  0 14:41 ?        00:00:00 /bin/bash /var/vcap/jobs/pks-nsx-t-osb-proxy/bin/ncp_cleanup 504423c9-bed4-4c41-8c9a-2561ece22e0f  true
root     15592 13308  0 14:43 ?        00:00:00 /bin/bash 
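To confirm that every respawned process is targeting the same cluster, you can extract the UUID argument (the second-to-last field of each line). A minimal sketch using standard tools:

```shell
# List the unique cluster UUIDs passed to running ncp_cleanup processes.
# The [n] in the grep pattern excludes the grep process itself from the match.
ps -ef | grep '[n]cp_cleanup' | awk '{print $(NF-1)}' | sort -u
```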


Environment

Product Version: 1.11

Resolution

On each TKGI API VM, perform the following steps:

1. Create a backup of the script, /var/vcap/jobs/pks-nsx-t-osb-proxy/bin/ncp_cleanup.
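A minimal sketch of this step (the backup_script helper name is ours, not part of TKGI):

```shell
# Copy the script next to itself with a .bak suffix, preserving permissions.
backup_script() {
  cp -p "$1" "$1.bak"
}

# Example:
# backup_script /var/vcap/jobs/pks-nsx-t-osb-proxy/bin/ncp_cleanup
```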

2. Comment out the following code block in /var/vcap/jobs/pks-nsx-t-osb-proxy/bin/ncp_cleanup. The parameter values shown here (for example, the NSX Manager host 60.0.0.2) are examples; the block in your environment contains values specific to your foundation.
pksnsxcli cleanup \
  --nsx-manager-host='60.0.0.2' \
  -c $nsx_manager_client_cert_file \
  -k $nsx_manager_client_key_file \
  --nsx-ca-cert-path=$nsx_manager_ca_cert_file \
  --insecure='false' \
  --cluster "$k8s_cluster_name" \
  --t0-router-id="$t0_router_id" \
  --pks=false \
  --read-only=false \
  --force=$force_delete
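If you prefer to comment the block out programmatically rather than by hand, here is a hedged sketch. The comment_out_cleanup helper is ours, and it assumes the block begins with the "pksnsxcli cleanup" line and ends with the "--force=$force_delete" line, as shown above; verify the result before proceeding.

```shell
# Prefix every line of the pksnsxcli cleanup block with '# ', editing in place.
comment_out_cleanup() {
  sed -i '/pksnsxcli cleanup/,/--force=\$force_delete/ s/^/# /' "$1"
}

# Example:
# comment_out_cleanup /var/vcap/jobs/pks-nsx-t-osb-proxy/bin/ncp_cleanup
```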
3. Wait approximately 11 minutes and check whether the cluster has been deleted successfully.
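Rather than waiting blind, you can poll for the deletion. The wait_for_deletion helper below is a sketch of ours (not a TKGI tool); pass it whatever status command lists your clusters, for example "tkgi clusters":

```shell
# Poll a status command until its output no longer mentions the cluster UUID,
# checking every 30 seconds up to a timeout (default 660s = 11 minutes).
wait_for_deletion() {
  uuid="$1"; cmd="$2"; timeout="${3:-660}"; elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    if ! $cmd | grep -q "$uuid"; then
      echo "deleted"
      return 0
    fi
    sleep 30
    elapsed=$((elapsed + 30))
  done
  echo "still present"
  return 1
}

# Example (hypothetical):
# wait_for_deletion 504423c9-bed4-4c41-8c9a-2561ece22e0f "tkgi clusters"
```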
4. Revert the changes made in step 2 by restoring the script, /var/vcap/jobs/pks-nsx-t-osb-proxy/bin/ncp_cleanup, from the backup created in step 1.

5. Confirm the NSX-T resources are deleted successfully by searching for the cluster UUID in NSX Manager.

If any resources remain, run the script manually to delete them, substituting your cluster's UUID for the example below:
/bin/bash /var/vcap/jobs/pks-nsx-t-osb-proxy/bin/ncp_cleanup 504423c9-bed4-4c41-8c9a-2561ece22e0f true