TKG Pod using Multus assigned IP gets stuck in "ContainerCreating" state due to abrupt ESXi host shutdown

Article ID: 406941


Products

  • VMware Telco Cloud Automation
  • VMware Tanzu Kubernetes Grid Management

Issue/Introduction

  • One of the TKG pods is stuck in the "ContainerCreating" state because of a Multus IP allocation issue. The pod reports the following error:

" ERRORED: error configuring pod [opc-xxx-xxxx/opc-xxx-xxxx-session-cluster-x] networking: [opc-data-udsf/opc-data-xxxxx-session-cluster-x/xxxxxxx-xxxx-xxxx-xxxx-xxxxxxx:xxx-xxxxxxxxx-xxxx-replication]: error adding container to network "xxxx-xxxxxxx-volt-replication": error at storage engine: Could not allocate IP in range: ip: 10.xxx.xx.x / - 10.xxx.xx.x / range: 10.xxx.xx.x/27 / excludeRanges: []"

  • Listing the pods that are not in the Running state shows the affected pod:

# kubectl get po -A | grep -v Running

NAMESPACE       NAME                             READY   STATUS              RESTARTS   AGE
opc-data-xxxx   opc-xxx-xxxx-session-cluster-x   0/1     ContainerCreating   0          13h
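
The grep filter above also lists pods in other non-Running states, such as Completed. As a minimal sketch, the helper below narrows the listing to pods whose STATUS is ContainerCreating. The function name stuck_pods is hypothetical, and it assumes the default column layout of kubectl get po -A shown above.

```shell
# Hypothetical helper: reads `kubectl get po -A` output on stdin and prints
# "<namespace>/<pod>" for every row whose STATUS column is ContainerCreating.
stuck_pods() {
  # With -A the columns are: NAMESPACE NAME READY STATUS RESTARTS AGE,
  # so STATUS is field 4. NR > 1 skips the header row.
  awk 'NR > 1 && $4 == "ContainerCreating" { print $1 "/" $2 }'
}

# Usage (on a machine with access to the cluster):
#   kubectl get po -A | stuck_pods
```

The events of a reported pod, including the IP allocation error, can then be read with kubectl describe pod <pod-name> -n <namespace>.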

 

Environment

  • VMware Telco Cloud Automation : 3.1.1
  • VMware Tanzu Kubernetes Grid  : 2.5.1 

Cause

  • This is expected behavior.
  • When a pod or application using a Multus-assigned IP is terminated abruptly (for example, by a host shutdown), its IP is not released back to the IP pool.
  • Multus relies on a graceful pod shutdown to release the IP. Because the host was powered off, the IP remained allocated, and a new pod can fail to obtain an address once the range has no free IPs left.
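
To put the range in the error message into perspective, the following sketch computes how many addresses an IPv4 range of a given prefix length contains. The function name cidr_size is hypothetical. For the /27 range shown in the error, the pool holds at most 32 addresses, so a few leaked reservations can take up a noticeable share of it.

```shell
# Hypothetical helper: number of addresses in an IPv4 CIDR block,
# given the prefix length (for example 27 for a /27 range).
cidr_size() {
  echo $(( 1 << (32 - $1) ))
}

cidr_size 27   # prints 32
```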

Resolution

Workaround:

  • You can manually release the IP from Multus as follows:

Manual IP Release Steps:


1.) Identify the Network Name:

    • Run the following command to find your CNI network name:

# cat /etc/cni/net.d/*.conf
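
When several configuration files are present, reading the full output can be tedious. The sketch below prints only the values of the "name" keys found in the files. The function name cni_network_names is hypothetical, and the sketch assumes that each "name" key and its value sit on the same line of the JSON file.

```shell
# Hypothetical helper: print every "name" value found in the *.conf files
# of a CNI configuration directory (defaults to /etc/cni/net.d).
cni_network_names() {
  dir="${1:-/etc/cni/net.d}"
  # sed -n ... p prints only lines that contain a "name": "<value>" pair,
  # reduced to the value itself.
  sed -n 's/.*"name"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' \
    "$dir"/*.conf 2>/dev/null
}

# Usage:
#   cni_network_names
```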

 

2.) Inspect Allocated IPs:

    • List the allocated IPs under the network directory:

# ls /var/lib/cni/networks/<network-name>
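
The listing shows one file per reserved IP. As a sketch, the helper below also prints the owner recorded in each file, which helps to tell stale reservations from live ones. It assumes the file layout of the host-local IPAM plugin, where each file is named after the IP and its first line holds the ID of the owning container. The function name list_allocations is hypothetical.

```shell
# Hypothetical helper: print "<ip> <owner-id>" for each reservation file
# in the given network directory.
list_allocations() {
  netdir="$1"
  for f in "$netdir"/*; do
    [ -f "$f" ] || continue
    ip=$(basename "$f")
    # Keep only files that look like IPv4 addresses; this skips
    # bookkeeping files such as "lock" or "last_reserved_ip.0".
    case "$ip" in
      [0-9]*.[0-9]*.[0-9]*.[0-9]*) ;;
      *) continue ;;
    esac
    # The first line is assumed to be the owner ID; strip a trailing CR.
    owner=$(head -n 1 "$f" | tr -d '\r')
    echo "$ip $owner"
  done
}

# Usage:
#   list_allocations /var/lib/cni/networks/<network-name>
```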

 

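3.) Delete the Stale IP(s):

    • Remove only the file(s) for the IP(s) that are no longer in use. The wildcard in the example matches every reservation under 10.107.34.x, so substitute the exact stale address(es) if other pods still hold IPs in that range:

# sudo rm /var/lib/cni/networks/<network-name>/10.107.34.*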
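
Before deleting anything, it may help to confirm which reservations are actually stale. The sketch below prints the reservation files whose recorded owner does not appear in a list of live IDs, and deletes nothing itself. It rests on two assumptions: the files follow the host-local layout (owner ID on the first line), and the recorded ID is the pod sandbox ID as reported by crictl pods -q on the node. The function name stale_reservations and the /tmp/live_ids path are hypothetical.

```shell
# Hypothetical helper: print reservation files whose owner ID is not in
# the list of live IDs ($2 is a file with one ID per line).
stale_reservations() {
  netdir="$1"
  live_ids="$2"
  for f in "$netdir"/*; do
    [ -f "$f" ] || continue
    # Only consider files named like IPv4 addresses.
    case "$(basename "$f")" in
      [0-9]*.[0-9]*.[0-9]*.[0-9]*) ;;
      *) continue ;;
    esac
    owner=$(head -n 1 "$f" | tr -d '\r')
    # -q quiet, -x whole-line match, -F fixed string (no regex).
    grep -qxF -- "$owner" "$live_ids" || echo "$f"
  done
}

# Usage sketch (assumes crictl is available on the node):
#   crictl pods -q > /tmp/live_ids
#   stale_reservations /var/lib/cni/networks/<network-name> /tmp/live_ids
```

Review the printed paths first, then remove each confirmed stale file with sudo rm <file>.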

 

After completing the above steps, the IP(s) should be available for allocation again, and the pod should be able to receive an IP successfully.