Pods Stuck in Init due to IPAM Pool Exhaustion (Whereabouts)

Article ID: 435376


Products

VMware Telco Cloud Automation

Issue/Introduction

 

  • New pods are stuck in Init or ContainerCreating.

  • Kubelet events show FailedCreatePodSandBox with context deadline exceeded.

  • Logs from Multus/Whereabouts report that no IP addresses are available in the pool, even though the number of running pods is well below the pool's capacity.
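A quick way to confirm the first symptom is to filter `kubectl get pods` output for stuck states. The helper below is a sketch (the function name is an assumption; adjust the states to your environment):

```shell
# Sketch: print "namespace/pod" for every pod whose STATUS column shows a
# stuck sandbox. Reads `kubectl get pods -A --no-headers` output on stdin.
stuck_pods() {
  awk '$4 ~ /^(Init:|ContainerCreating|PodInitializing)/ {print $1 "/" $2}'
}

# Against a live cluster:
#   kubectl get pods -A --no-headers | stuck_pods
#   kubectl describe pod <pod-name> -n <namespace>   # look for FailedCreatePodSandBox
```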

 

Environment

VMware Telco Cloud Automation 3.2

Cause

This issue occurs when pods enter a Terminating state but fail to clean up their network resources.

  1. Stale Reservation: Whereabouts (the IPAM plugin) still believes the terminating pod is active and holds its IP reservation in the cluster-wide IPPool custom resource.

  2. Pool Depletion: Because the IPs aren't returned to the pool, new pods cannot request an address.

  3. Timeout: Multus waits for Whereabouts to assign an IP; when Whereabouts fails to find one, the process hits the 2-minute RPC timeout, causing the context deadline exceeded error.
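The stale reservations themselves can be inspected directly. Assuming the default upstream Whereabouts CRDs (`ippools.whereabouts.cni.cncf.io`), each allocation carries a `podref` naming the pod that holds the IP, which can be cross-checked against the pods that actually exist:

```shell
# Dump every Whereabouts IP pool, including its allocation map:
#   kubectl get ippools.whereabouts.cni.cncf.io -A -o yaml

# Sketch: extract the pod references held by the pool from that YAML on stdin,
# so they can be compared against `kubectl get pods -A` output.
stale_podrefs() {
  grep -oE 'podref: *[^ ]+' | awk '{print $2}' | sort -u
}
```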

Resolution

Clearing the IPAM State

If the IP pool is exhausted due to stale assignments, the most effective way to force a reconciliation of the IPAM state is to restart the Whereabouts components.

1. Force Clear Stale Pods

Ensure any pods stuck in Terminating are removed.

kubectl delete pod <pod-name> -n <namespace> --force --grace-period=0
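To find every pod still stuck in Terminating before force-deleting them one by one, a small filter helps (a sketch; review the list before deleting anything):

```shell
# Sketch: print "namespace/pod" for each Terminating pod.
# Reads `kubectl get pods -A --no-headers` output on stdin.
terminating_pods() {
  awk '$4 == "Terminating" {print $1 "/" $2}'
}

# Against a live cluster:
#   kubectl get pods -A --no-headers | terminating_pods | \
#     while IFS=/ read -r ns pod; do
#       kubectl delete pod "$pod" -n "$ns" --force --grace-period=0
#     done
```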

 

2. Restart Whereabouts Components

Restarting the Whereabouts pods triggers a refresh of the IP allocation logic and clears the "stale" locks on the IP pool.

kubectl get pods -n <namespace> --no-headers | grep whereabouts | awk '{print $1}' | xargs kubectl delete pod -n <namespace>

 

3. Verify IP Recovery

Check that the IPAM is once again assigning addresses by monitoring the pod logs:

kubectl logs -n <namespace> -l app=whereabouts
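When judging whether the pool has genuinely recovered, it can help to compare the number of allocations shown in the IPPool resource against the range's total capacity. A minimal IPv4 helper (hypothetical; not part of Whereabouts or kubectl):

```shell
# pool_capacity <prefix-length>: usable IPv4 addresses in a /N range
# (network and broadcast addresses excluded for prefixes shorter than /31).
pool_capacity() {
  n=$1
  total=$((1 << (32 - n)))
  if [ "$n" -lt 31 ]; then
    echo $((total - 2))
  else
    echo "$total"
  fi
}

# Example: a /28 pool can hold at most 14 pod addresses, so even a handful
# of stale reservations can exhaust it.
```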