Cannot delete Kubernetes Container cluster within Cloud Director

Article ID: 325672

Products

VMware Cloud Director

Issue/Introduction

Symptoms:
  • Deleting a TKG cluster in Cloud Director fails with the following error, and the cluster is never removed:

error deleting resources by rdeId: [urn:vcloud:entity:vmware:capvcdCluster:111f1c11-c11b-11f0-b8b1-e11110d1110a], [error deleting resources by rde ID: [urn:vcloud:entity:vmware:capvcdCluster:111f1c11-c11b-11f0-b8b1-e11110d1110a] after [10] retry attempts: [error occurred deleting L4 loadbalancer for rde [podinfo-test(urn:vcloud:entity:vmware:capvcdCluster:111f1c11-c11b-11f0-b8b1-e11110d1110a)]: [virtual service [podinfo-test-urn:vcloud:entity:vmware:capvcdCluster:111f1c11-c11b-11f0-b8b1-e11110d1110a-tcp] is busy]. [0] remaining retry attempts]]

  • The cse.log file contains entries similar to the following:
E0111 12:17:21.341433   17639 gateway.go:1049] Virtual service [tkg3-124-urn:vcloud:entity:vmware:capvcdCluster:111f1c11-c11b-11f0-b8b1-e11110d1110a-tcp] is still being configured. Virtual service status: [REALIZATION_FAILED]
E0111 12:17:21.341476   17639 gateway.go:1788] delete virtual service failed; virtual service [tkg3-124-urn:vcloud:entity:vmware:capvcdCluster:111f1c11-c11b-11f0-b8b1-e11110d1110a-tcp] is busy: [virtual service [tkg3-124-urn:vcloud:entity:vmware:capvcdCluster:111f1c11-c11b-11f0-b8b1-e11110d1110a-tcp] is busy]
{"level":"error","ts":"2024-01-11T12:17:21.341Z","caller":"cluster/clusterManager.go:821","msg":"error occurred deleting L4 loadbalancer for rde [tkg3-124(urn:vcloud:entity:vmware:capvcdCluster:111f1c11-c11b-11f0-b8b1-e11110d1110a)]: [virtual service [tkg3-124-urn:vcloud:entity:vmware:capvcdCluster:111f1c11-c11b-11f0-b8b1-e11110d1110a-tcp] is busy]. [33] remaining retry attempts","workerID":"1f2d3acd-f456-7c8c-accb-aa9d01aa234f","stacktrace":"gitlab.eng.vmware.com/core-build/vcd-k8s-provider/src/cluster.DeleteResourcesByRDEId\n\t/app/src/cluster/clusterManager.go:821\ngitlab.eng.vmware.com/core-build/vcd-k8s-provider/src/cluster.DeleteWithoutScript\n\t/app/src/cluster/clusterManager.go:1110\ngitlab.eng.vmware.com/core-build/vcd-k8s-provider/src/cluster.DeleteCluster\n\t/app/src/cluster/clusterManager.go:429\nmain.processRDE\n\t/app/main.go:691"}
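To check whether a given cse.log shows this symptom, the entries above can be matched mechanically. The following is a minimal Python sketch and is not part of CSE: the function name `find_stuck_virtual_services` is hypothetical, and the pattern assumes the exact message wording of the gateway.go entry shown above, which may differ between CSE versions.

```python
import re

# Assumption: the message wording matches the gateway.go entry shown above.
VS_STATUS = re.compile(
    r"Virtual service \[(?P<name>[^\]]+)\] is still being configured\. "
    r"Virtual service status: \[(?P<status>[A-Z_]+)\]"
)


def find_stuck_virtual_services(log_lines):
    """Return a sorted list of unique (virtual service name, status) pairs."""
    found = set()
    for line in log_lines:
        match = VS_STATUS.search(line)
        if match:
            found.add((match.group("name"), match.group("status")))
    return sorted(found)
```

Passing an open file handle for cse.log as `log_lines` yields one pair per affected virtual service. A status of REALIZATION_FAILED alongside the "is busy" delete failure corresponds to the symptom described in this article.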


Environment

VMware Cloud Director 10.x

Cause

This is a known issue caused by the network configuration used with the failed cluster deployments. If NSX-T is not used, CSE cannot find the edge gateway reference it needs to delete the L4 load balancer components, which results in the error.

Resolution

This issue is resolved in CSE 4.1, available at VMware Downloads.

Workaround:
To work around the issue, carry out the following steps:

1. Delete any remaining vApps associated with the failed clusters. A leftover vApp has the same name as its cluster.

2. Confirm that no load balancer components (virtual services, LB pool members) remain for the cluster being deleted. These components are prefixed with the cluster name.

3. In the UI, click on the cluster to see more information. The URL ends with the `clusterId`, in the format:
urn:vcloud:entity:vmware:capvcdCluster:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx

4. For each cluster, execute the following API requests in the order shown, using the cluster ID noted in Step 3, to remove the cluster from the UI.

a. POST {{vcd-ip-or-fqdn}}/cloudapi/1.0.0/entities/{{clusterId}}/resolve
b. DELETE {{vcd-ip-or-fqdn}}/cloudapi/1.0.0/entities/{{clusterId}}
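
Steps 3 and 4 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not an official tool: the function names are hypothetical, and the `https://` scheme, the bearer-token `Authorization` header, and the versioned `Accept` header are assumptions about how the Cloud Director API is reached in your environment. The API version value must be one your Cloud Director instance supports. The code only builds the two requests and does not send them.

```python
import re
import urllib.request
from urllib.parse import unquote

# Matches the cluster ID format shown in step 3.
CLUSTER_URN = re.compile(r"urn:vcloud:entity:vmware:capvcdCluster:[0-9a-fA-F-]{36}")


def extract_cluster_id(ui_url):
    """Return the capvcdCluster URN found in a UI URL (percent-encoding is decoded first)."""
    match = CLUSTER_URN.search(unquote(ui_url))
    if match is None:
        raise ValueError("no capvcdCluster URN found in URL")
    return match.group(0)


def build_cleanup_requests(vcd_host, cluster_id, token, api_version):
    """Build the resolve (POST) and delete (DELETE) requests, in the order given in step 4."""
    base = f"https://{vcd_host}/cloudapi/1.0.0/entities/{cluster_id}"
    headers = {
        # Assumptions: versioned Accept header and bearer-token authentication.
        "Accept": f"application/json;version={api_version}",
        "Authorization": f"Bearer {token}",
    }
    return [
        urllib.request.Request(f"{base}/resolve", method="POST", headers=headers),
        urllib.request.Request(base, method="DELETE", headers=headers),
    ]
```

Each request can then be sent in list order with `urllib.request.urlopen(request)`. Send the DELETE only after the resolve call has succeeded, and only after steps 1 and 2 are complete, because the entity removal is not reversible.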