VKS Guest Cluster Deployment Hangs with Avi Error "Unable to acquire IP address"

Article ID: 442187


Products

  • VMware vSphere Kubernetes Service
  • VMware Avi Load Balancer

Issue/Introduction

When attempting to deploy a new vSphere Kubernetes Service (VKS) Guest Cluster, the deployment stalls and fails to complete. You may observe the following behaviors across your infrastructure:

  1. VKS / Kubernetes API Timeouts: The Cluster API (CAPI) controllers fail to reach the Kubernetes API Virtual IP (VIP). The VKS deployment logs show timeouts that indicate a dropped connection to the load balancer:

"Failed to update kube-proxy daemonset" err="failed to determine if kube-proxy daemonset already exists: Get \"https://<VIP_IP>:6443/apis/apps/v1/namespaces/kube-system/daemonsets/kube-proxy?timeout=10s\": call timeout expired - error from a previous attempt: http2: client connection lost" controller="kubeadmcontrolplane" controllerGroup="controlplane.cluster.x-k8s.io" controllerKind="KubeadmControlPlane" KubeadmControlPlane="<NAMESPACE>/<CLUSTER_NAME>" namespace="<NAMESPACE>" name="<CLUSTER_NAME>" reconcileID="<RECONCILE_ID>" Cluster="<NAMESPACE>/<CLUSTER_NAME>"
E0518 08:56:47.583827       1 controller.go:347] "Reconciler error" err="failed to determine if kube-proxy daemonset already exists: Get \"https://<VIP_IP>:6443/apis/apps/v1/namespaces/kube-system/daemonsets/kube-proxy?timeout=10s\": call timeout expired - error from a previous attempt: http2: client connection lost"

  2. Avi Controller Virtual Service Placement Failures: Within the NSX Advanced Load Balancer (Avi) UI, the Virtual Service (VS) automatically generated for the new cluster fails to place on the existing Service Engines (SEs). The UI displays the following critical faults:
  • "Service Engine [SE_Name] requires access to subnet [Subnet_IP/Mask] in the network to reach server."
  • "Unable to acquire IP address for network."
  3. Successful Direct Control Plane Access (Diagnostic Validation): Connecting directly to the internal Control Plane VM, bypassing the load balancer, succeeds. Running curl -v -k https://<VM_INTERNAL_IP>:6443/version from a node on the same network returns a valid Kubernetes API response, which confirms that the VKS components are healthy and isolates the fault to the load balancer.
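
The three symptoms above can be checked together with a small script. The following is a minimal sketch in Python, not a supported tool: the function names (extract_api_endpoint, tcp_reachable, classify) are hypothetical, and the addresses in the usage comment are placeholders. It pulls the VIP out of a CAPI log line, then compares TCP reachability of port 6443 through the VIP and directly on the Control Plane VM. A plain TCP connect is a weaker check than the curl request described above, because it does not confirm that a valid Kubernetes API response comes back.

```python
import re
import socket
from typing import Optional, Tuple

# Matches the API server URL embedded in a CAPI error message and captures
# its host and port. Assumes an IPv4 address or hostname, as in this article.
_API_URL = re.compile(r'https://(?P<host>[^:/\s"\\]+):(?P<port>\d+)/')


def extract_api_endpoint(log_line: str) -> Optional[Tuple[str, int]]:
    """Return (host, port) of the first API URL in a log line, or None."""
    match = _API_URL.search(log_line)
    if match is None:
        return None
    return match.group("host"), int(match.group("port"))


def tcp_reachable(host: str, port: int = 6443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and routing errors
        return False


def classify(vip_ok: bool, direct_ok: bool) -> str:
    """Summarize which path is failing, following the reasoning in this article."""
    if direct_ok and not vip_ok:
        return "load balancer path failing; control plane reachable directly"
    if direct_ok and vip_ok:
        return "both paths reachable"
    if vip_ok:
        return "VIP reachable but direct path is not; check where the test runs from"
    return "control plane unreachable on both paths; fault is not isolated to the load balancer"


# Example usage (placeholder values; run from a node on the same network):
#   endpoint = extract_api_endpoint(capi_log_line)   # e.g. ("192.0.2.10", 6443)
#   vip_ok = tcp_reachable(*endpoint)
#   direct_ok = tcp_reachable("<VM_INTERNAL_IP>", 6443)
#   print(classify(vip_ok, direct_ok))
```

A result of "load balancer path failing; control plane reachable directly" corresponds to the situation this article describes.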

Environment

  • VMware vSphere Kubernetes Service (VKS)
  • NSX Advanced Load Balancer (Avi)

Cause

This issue is caused by an IP allocation failure at the load balancer layer. The existing Service Engines within the designated Service Engine Group (SEG) are unable to acquire an IP address from the configured subnet.

Because the Service Engine cannot obtain a valid IP, the Virtual Service (which acts as the front-end VIP for the cluster API) fails to place. Consequently, the VKS deployment engine times out waiting for the Kubernetes API to become reachable via the load balancer, halting the cluster creation process.

Resolution

Workaround:

To immediately bypass the IP acquisition failure on the existing Service Engines and unblock the VKS deployment, you must create a new Service Engine Group. This forces the Avi Controller to instantiate a fresh Service Engine.

Step-by-Step Implementation:

  1. Log in to the NSX Advanced Load Balancer (Avi) Controller.
  2. Navigate to Infrastructure > Service Engine Group.
  3. Create a new Service Engine Group (SEG) with the appropriate configuration for your environment.
  4. Navigate to Applications > Virtual Services.
  5. Locate the failing Virtual Service associated with your stalled VKS deployment and click Edit.
  6. Reassign the Virtual Service to the newly created SEG and save your changes.
  7. Monitor the Avi UI to verify that a new Service Engine spawns, successfully acquires an IP address, and places the Virtual Service.

Once the Virtual Service is successfully placed, the VKS deployment will automatically resume and complete.
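
To confirm from outside the Avi UI that the VIP has started answering, the Kubernetes API can be polled through the load balancer. The following is a minimal sketch in Python under stated assumptions: api_answers and wait_until are hypothetical helper names, the timeout and interval defaults are arbitrary, and certificate verification is disabled to mirror the curl -k check used earlier in this article.

```python
import http.client
import ssl
import time
import urllib.error
import urllib.request
from typing import Callable


def api_answers(url: str, timeout: float = 10.0) -> bool:
    """Return True if any HTTP response comes back from url."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False          # must be disabled before CERT_NONE is set
    ctx.verify_mode = ssl.CERT_NONE     # equivalent in spirit to curl -k
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=ctx):
            return True
    except urllib.error.HTTPError:
        # The server answered with an error status (for example 401 or 403),
        # which still shows that the VIP is passing traffic.
        return True
    except (OSError, http.client.HTTPException):
        return False


def wait_until(
    probe: Callable[[], bool],
    timeout_s: float = 600.0,
    interval_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Call probe repeatedly until it returns True or timeout_s has elapsed."""
    deadline = clock() + timeout_s
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


# Example usage (placeholder VIP):
#   ok = wait_until(lambda: api_answers("https://<VIP_IP>:6443/version"))
#   print("VIP is answering" if ok else "VIP still unreachable")
```

The sleep and clock parameters are injectable only so the loop can be exercised without waiting in real time; the defaults are what a normal run would use.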

Additional Information

  • Do not revert to the old group: Attempting to move the Virtual Service back to the original Service Engine Group, or simply rebooting the existing Service Engines, will likely result in a recurrence of the placement failure.
  • Permanent Resolution: The workaround above restores operational continuity for VKS. To determine the root cause of the IP exhaustion or network allocation failure on the original Service Engine Group, please submit a new Service Request with the Broadcom NSX Advanced Load Balancer (Avi) Support team.