TKC cluster stuck in creating state with error "exhausted all ip addresses in requested ip pools"

Article ID: 385712

Products

Tanzu Kubernetes Runtime
VMware vSphere Kubernetes Service

Issue/Introduction

  • When deploying or upgrading a Tanzu Kubernetes Cluster (TKC), the process hangs in a Pending or Creating state.

  • The OVF template may deploy successfully; however, the process fails during network interface initialization.

  • The following error messages can be observed in the netop pod logs on the Supervisor Cluster:

kubectl logs <netop-pod-name> -n vmware-system-netop

EDDMM HH:MM:SS.XXXXXX 1 networkinterface_controller.go:250] controllers/NetworkInterface/XXXX/control-plane-VM "msg"="error reconciling NetworkInterface" "error"="exhausted all ip addresses in requested ip pools"
EDDMM HH:MM:SS.XXXXXX 1 controller.go:317] controller/networkinterface "msg"="Reconciler error" "error"="exhausted all ip addresses in requested ip pools" "name"="control-plane-VM-XXXX" "namespace"="XXXX" "reconciler group"="netoperator.vmware.com" "reconciler kind"="NetworkInterface"

Environment

VMware vSphere Kubernetes Service using VDS networking with the AVI Load Balancer

Cause

The log entries indicate that the NetworkInterface controller is unable to reconcile the resource because all available IP addresses in the configured IP pools have been exhausted. As a result, no additional IP addresses are available for assignment to newly created control plane or worker nodes.

This issue may also occur due to stale or orphaned NetworkInterface custom resources that remain registered in the cluster following configuration changes, failed deployments, or incomplete cleanup operations. These stale entries can continue to reserve IP addresses, leading to IP pool exhaustion even though the associated virtual machines no longer exist.

To verify whether orphaned NetworkInterface resources are present, compare the output of the following commands:

kubectl get machines -n <VSPHERE_NAMESPACE>
kubectl get networkinterfaces.netoperator.vmware.com -n <VSPHERE_NAMESPACE>

Review the outputs and confirm that each NetworkInterface resource corresponds to an existing Machine object. If NetworkInterface entries exist without a matching Machine resource, they are considered orphaned and could be consuming IP addresses unnecessarily.

Example of networkinterfaces output:

NAME                                      AGE
<CLUSTER_NAME>-worker-node-###-7qpd4      93d
<CLUSTER_NAME>-worker-node-###-8w5lf      55m
<CLUSTER_NAME>-worker-node-###-cx8wz      228d
<CLUSTER_NAME>-worker-node-###-djpfj      166d
<CLUSTER_NAME>-worker-node-###-jbqpl      55m
<CLUSTER_NAME>-worker-node-###-k6p5x      166d
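The comparison above can be scripted. The sketch below uses the standard comm utility to list NetworkInterface names that have no matching Machine, assuming the two resource types share names one-to-one (verify this in your environment before acting on the output). The live-cluster kubectl commands are shown in comments; the sample names (cluster-worker-aaa, etc.) are hypothetical placeholders for illustration.

```shell
# In a live cluster, generate the two sorted name lists with:
#   kubectl get machines -n <VSPHERE_NAMESPACE> -o name | sed 's|.*/||' | sort > machines.txt
#   kubectl get networkinterfaces.netoperator.vmware.com -n <VSPHERE_NAMESPACE> -o name | sed 's|.*/||' | sort > netifs.txt

# Hypothetical sample data standing in for the kubectl output above:
printf '%s\n' cluster-worker-aaa cluster-worker-bbb | sort > machines.txt
printf '%s\n' cluster-worker-aaa cluster-worker-bbb cluster-worker-stale | sort > netifs.txt

# comm -13 suppresses lines unique to the first file and lines common
# to both, leaving only lines unique to the second file: the orphans.
comm -13 machines.txt netifs.txt
```

In the sample data, cluster-worker-stale is printed as the only NetworkInterface without a matching Machine.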

Resolution

The resolution requires either expanding the capacity of the IP pool or reclaiming addresses currently held by orphaned objects.

1. Expand IP Capacity

If the cluster has naturally outgrown the initial IP allocation, you must add more addresses to the Workload Network. This ensures that the netoperator has a sufficient buffer for new TKC nodes.

You can check the IP capacity in vCenter (Workload Management → Supervisor Cluster → Configure → Workload Network).

See: Add a Workload Network to a Supervisor Cluster (vCenter 8.0)

2. Reclaim Stale IP Addresses

If stale NetworkInterface objects are identified that have no corresponding Machine, manually delete them to release their IP addresses back into the pool.

kubectl delete networkinterface.netoperator.vmware.com <STALE_INTERFACE_NAME> -n <VSPHERE_NAMESPACE>
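When several stale objects need to be removed, the delete command can be wrapped in a loop. This is only a sketch: the interface names in STALE_IFACES are hypothetical, and the loop echoes each command rather than running it so you can review before executing. Populate the list only with entries you have confirmed have no matching Machine.

```shell
NS="<VSPHERE_NAMESPACE>"                     # replace with your vSphere Namespace
STALE_IFACES="stale-iface-1 stale-iface-2"   # hypothetical confirmed-orphaned names

for nic in $STALE_IFACES; do
  # Dry run: prints each delete command. Remove 'echo' to actually delete.
  echo kubectl delete networkinterface.netoperator.vmware.com "$nic" -n "$NS"
done
```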

Additional Information

Refer to the VMware vSphere documentation for more information about the network requirements for configuring workload networks.