In VMware Cloud Foundation 9.x, when a VKS cluster is deleted and immediately recreated using the same namespace and name, the control plane may fail to become ready. The API server remains unreachable, and cluster status logs report an error similar to:
YYYY-MM-DDT HH:MM:SS failed to get server groups: Get "https://##.###.##.##:6443/api?timeout=10s": context deadline exceeded
The recreated cluster is usually assigned a different Virtual IP (VIP) than the deleted one, but it stays stuck because stale state still references the old configuration.
VMware Cloud Foundation (VCF) 9.0.1, 9.0.2, 9.1.0
vSphere Distributed Switch (VDS) enabled
The issue is caused by stale state maintained by the flb-controller deployment within the Supervisor Cluster. The controller retains VIP allocation data from the deleted cluster instance, leading to a mismatch when the new cluster instance attempts to initialize its control plane.
To resolve this issue, the flb-controller state must be refreshed by restarting the deployment.
Using kubectl, switch context to the affected Supervisor Cluster.
Restart the flb-controller deployment in the vmware-system-flb namespace:
kubectl rollout restart deployment -n vmware-system-flb flb-controller
Confirm that the rollout has completed and that the new flb-controller pod is in the Running state:
kubectl get pods -n vmware-system-flb -l app=flb-controller
Re-apply the VKS cluster manifest.
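The steps above can be sketched as a small shell helper. This is an illustrative sketch, not an official procedure: the function names, the `ROLLOUT_TIMEOUT` variable, and its 180s default are hypothetical, and it assumes the cluster manifest is re-applied with `kubectl apply -f`. It uses `kubectl rollout status` to block until the restart finishes instead of checking the pods by hand. Switch to the Supervisor context before calling it.

```shell
#!/bin/sh
# Sketch only: wraps the documented recovery steps in two functions.
set -eu

NS="vmware-system-flb"     # namespace named in this article
DEPLOY="flb-controller"    # deployment named in this article

# Restart the controller and block until the rollout finishes.
# ROLLOUT_TIMEOUT is a hypothetical knob; 180s is an arbitrary default.
refresh_flb_controller() {
  kubectl rollout restart deployment -n "$NS" "$DEPLOY"
  kubectl rollout status deployment -n "$NS" "$DEPLOY" \
    --timeout="${ROLLOUT_TIMEOUT:-180s}"
}

# Refresh the controller, then re-apply the cluster manifest given as $1.
# Assumes the manifest is applied with "kubectl apply -f".
recover_cluster() {
  manifest="$1"
  refresh_flb_controller
  kubectl apply -f "$manifest"
}
```

For example, `recover_cluster ./my-cluster.yaml`, where the manifest path is a placeholder for the manifest originally used to create the cluster.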
To avoid this issue without restarting the controller, wait several minutes between deleting the cluster and recreating it, or give the new cluster a different name or namespace.
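The waiting approach can be sketched as follows. This is a hedged example under several assumptions: the VKS cluster is taken to be addressable as the `cluster` resource kind in its Supervisor namespace, the helper name and the `PAUSE_SECONDS` variable are hypothetical, and the 300-second default is only one reading of "several minutes", not a documented value.

```shell
#!/bin/sh
# Sketch only: delete a cluster, pause, then recreate it from its manifest.
set -eu

# Arguments: cluster name, namespace, manifest path (all caller-supplied).
recreate_with_pause() {
  name="$1"
  ns="$2"
  manifest="$3"

  # Block until the cluster object is fully removed.
  # Assumption: the cluster is exposed as the "cluster" resource kind.
  kubectl delete cluster "$name" -n "$ns" --wait=true

  # Pause before recreating; the 300s default is an assumed value.
  sleep "${PAUSE_SECONDS:-300}"

  kubectl apply -f "$manifest"
}
```

For example, `recreate_with_pause my-cluster my-namespace ./my-cluster.yaml`, with all three arguments being placeholders. If the control plane still fails to become ready after the pause, restarting the flb-controller deployment remains the documented fix.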