Unable to resize Kubernetes cluster worker node pool from VMware Cloud Director

Article ID: 398356


Updated On: 05-28-2025

Products

VMware Cloud Director

Issue/Introduction

  • While trying to resize the worker node pool, the desired node count increases but the available node count does not.
  • No events or tasks are generated in the UI during the resize of the worker node pool.
  • Logs from the RDEProjector component indicate missing credentials or secrets for the Kubernetes namespace tkg_cluster_name_namespace, preventing the operation from proceeding:
    2025-04-22T11:04:06.750Z ERROR Reconciler error {"controller": "rdeprojector", "controllerGroup": "capvcd.cloud-director.domain.com", "controllerKind": "RDEProjector", "rDEProjector": {"name":"tkg_cluster_name","namespace":"default"}, "namespace": "default", "name": "tkg_cluster_name", "reconcileID": "########-####-####-########", "error": "Error getting client credentials to reconcile Cluster [urn:vcloud:entity:vmware:capvcdCluster:########-####-####-########] infrastructure: error getting secret [capi-user-credentials] in namespace [tkg_cluster_name_namespace]: Secret \"capi-user-credentials\" not found", "errorVerbose": "Secret \"capi-user-credentials\" not found\nerror getting secret [capi-user-credentials] in namespace [tkg_cluster_name_namespace]\ngitlab.###.domain.com/core-build/vcd-rde-projector/controllers.getUserCredentialsForCluster\n\t/build_path/vcd-rde-projector/controllers/rdeprojector_controller.go:278\ngitlab.###.domain.com/core-build/vcd-rde-projector/controllers.
  • The same error is seen in the capvcd logs:
    2025-04-22T10:43:33.511Z ERROR Reconciler error {"controller": "vcdcluster", "controllerGroup": "infrastructure.cluster.x-k8s.io", "controllerKind": "VCDCluster", "VCDCluster": {"name":"tkg_cluster_name","namespace":"tkg_cluster_name_namespace"}, "namespace": "tkg_cluster_name_namespace", "name": "tkg_cluster_name", "reconcileID": "########-####-####-########", "error": "Error creating VCD client to reconcile Cluster [tkg_cluster_name] infrastructure: error getting client credentials to reconcile Cluster [tkg_cluster_name] infrastructure: [error getting secret [capi-user-credentials] in namespace [tkg_cluster_name_namespace]: Secret \"capi-user-credentials\" not found]", "errorVerbose": "error getting client credentials to reconcile Cluster [tkg_cluster_name] infrastructure: [error getting secret [capi-user-credentials] in namespace [tkg_cluster_name_namespace]: Secret \"capi-user-credentials\" not found]\nError creating VCD client to reconcile Cluster [tkg_cluster_name] infrastructure\ngithub.com/vmware/cluster-api-provider-cloud-director/controllers.(*VCDClusterReconciler).reconcileDelete\n\t/build_path/cayman_cluster-api-provider-cloud-director

  • Reviewing the namespaces using the command 'kubectl get ns', it is observed that one of the namespaces is stuck in the 'Terminating' state:
    • Sample output:
      NAME                                STATUS        AGE
      capi-kubeadm-bootstrap-system       Active        55d
      capi-kubeadm-control-plane-system   Active        55d
      capi-system                         Active        55d
      capvcd-system                       Active        55d
      cert-manager                        Active        55d
      default                             Active        55d
      kapp-controller                     Active        55d
      kapp-controller-packaging-global    Active        55d
      kube-node-lease                     Active        55d
      kube-public                         Active        55d
      kube-system                         Active        55d
      rdeprojector-system                 Active        55d
      secretgen-controller                Active        55d
      tkg_cluster_name_namespace          Terminating   55d
      tkg-system                          Active        55d
      tkg-system-public                   Active        55d
      vmware-system-antrea                Active        55d
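
To identify what is holding the namespace in Terminating, its finalizers and remaining resources can be inspected. The commands below are a general diagnostic sketch; the namespace name is the placeholder from the output above and should be replaced with the actual value:

```shell
# Show why the namespace is stuck: spec finalizers and status conditions
kubectl get namespace tkg_cluster_name_namespace \
  -o jsonpath='{.spec.finalizers}{"\n"}{.status.conditions}{"\n"}'

# List any namespaced resources still present in the namespace,
# which can block namespace deletion from completing
kubectl api-resources --verbs=list --namespaced -o name \
  | xargs -n 1 kubectl get --show-kind --ignore-not-found -n tkg_cluster_name_namespace
```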

Environment

VMware Cloud Director 10.x

Container Service Extension 4.x

Cause

This issue is caused by the Kubernetes namespace being stuck in a "Terminating" state, which prevents necessary secrets (such as capi-user-credentials) from being accessed during the resize operation.
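
The missing secret reported in the logs can be confirmed directly. A sketch, assuming kubectl access to the cluster's management context:

```shell
# Confirm whether the secret referenced by the capvcd and rdeprojector
# controllers exists in the cluster namespace
kubectl get secret capi-user-credentials -n tkg_cluster_name_namespace
# In this failure state the command returns:
#   Error from server (NotFound): secrets "capi-user-credentials" not found
```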

Resolution

Note: A cluster in this state cannot be recovered. It is recommended to restore the Kubernetes resources to a new cluster.

1. Create a new cluster using the steps mentioned in the document: Create a Tanzu Kubernetes Cluster

2. Deploy all backed-up Kubernetes resources to the new cluster.

3. Delete the old cluster directly from the VMware Cloud Director Tenant UI.
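
Step 2 assumes the workload manifests were exported from the old cluster. A minimal sketch, assuming the old cluster's API server is still reachable and using hypothetical kubeconfig and namespace names:

```shell
# Export workload resources from the old cluster
# (repeat per application namespace; extend the resource list as needed)
kubectl --kubeconfig old-cluster.kubeconfig \
  get deploy,svc,cm,secret,ing -n my-app -o yaml > my-app-backup.yaml

# Re-create the resources in the new cluster created in step 1
kubectl --kubeconfig new-cluster.kubeconfig apply -f my-app-backup.yaml
```

A dedicated backup tool such as Velero can also be used for this step if one was configured before the failure.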