Pods fail to initialize and remain stuck in a ContainerCreating state

Article ID: 440552

Products

VMware Telco Cloud Automation

Issue/Introduction

  • Pod events show the following errors:
    "Failed to create pod sandbox: rpc error: code = DeadlineExceeded desc = context deadline exceeded"
    "Failed to create pod sandbox: rpc error: code = Unknown desc = failed to reserve sandbox name"
    "Pod sandbox changed, it will be killed and re-created"

  • Kubelet logs show the following error:
    "Failed to create sandbox for pod" err="rpc error: code = Unknown desc = failed to reserve sandbox name"

  • Containerd logs show the following errors:
    "failed to setup network for sandbox ++ plugin type=\"multus-shim\" name=\"multus-cni-network\" failed (add): CmdAdd (shim): CNI request failed with status 400"
    "error adding container to network \"<network-name>\": error at storage engine: time limit exceeded while waiting to become leader\n'" 

  • Whereabouts logs show the following errors:
    "leaderelection.go:330] error retrieving resource lock /whereabouts: an empty namespace may not be set when a resource name is provided"
    "failed to clean up IP for allocations: failed to update the reservation list: the server rejected our request due to an error in our request"

  • Kube-apiserver logs show the following error:
    "Unable to authenticate the request" err="[invalid bearer token, service account token has expired]"
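
When several components log at once, it can help to filter collected output for the error signatures listed above. The following is a minimal sketch, not a supported tool: the function name scan_signatures is hypothetical, and it only matches the strings quoted in this article.

```shell
# Print (with line numbers) every input line that contains one of the
# error signatures quoted in this article. Reads from standard input.
scan_signatures() {
  grep -n -E \
    -e 'failed to reserve sandbox name' \
    -e 'context deadline exceeded' \
    -e 'time limit exceeded while waiting to become leader' \
    -e 'an empty namespace may not be set' \
    -e 'service account token has expired'
}

# Example usage (assumes kubectl access to the affected cluster):
#   kubectl describe pod <failing-pod-name> -n <namespace> | scan_signatures
#   kubectl logs daemonset/whereabouts -n kube-system | scan_signatures
```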

Environment

TCA 3.2
TKGm 2.5

Cause

The root cause is the expiration of the bound Service Account token mounted inside the Whereabouts CNI pod.
In TKGm 2.5 (Kubernetes 1.22+), Service Account tokens are time-bound and are rotated automatically by the kubelet. If the Whereabouts client does not reload the rotated token from disk, it keeps presenting the expired token and the Kubernetes API server rejects its requests.
Without API access, Whereabouts cannot acquire its leader-election lock or update its IP reservations, so network setup for the sandbox fails and the kubelet aborts pod sandbox creation.
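
To check when a given Service Account token expires, its "exp" claim can be decoded locally. The following is a minimal sketch under stated assumptions: the function name jwt_exp is hypothetical, it assumes GNU coreutils (base64 -d), and it assumes the token payload is compact JSON with a numeric "exp" field. The token path in the usage comment is the standard projected-token mount; this article does not state which file the Whereabouts client actually reads.

```shell
# Print the "exp" claim (Unix epoch seconds) of the JWT passed as $1.
jwt_exp() {
  # The payload is the second dot-separated field, base64url-encoded.
  payload=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # Restore the base64 padding that JWT encoding strips.
  case $(( ${#payload} % 4 )) in
    2) payload="${payload}==" ;;
    3) payload="${payload}=" ;;
  esac
  printf '%s' "$payload" | base64 -d | sed -n 's/.*"exp":\([0-9]*\).*/\1/p'
}

# Example usage (assumes the container image provides "cat"):
#   token=$(kubectl exec -n kube-system <whereabouts-pod-name> -- \
#     cat /var/run/secrets/kubernetes.io/serviceaccount/token)
#   date -d "@$(jwt_exp "$token")"
```

Because the kubelet refreshes the mounted file, a future expiry time here combined with the "service account token has expired" error in the kube-apiserver logs is consistent with the client still holding an older token.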

Resolution

1. Restart the Whereabouts and Multus DaemonSets so that their pods mount fresh Service Account tokens. A node reboot is not required.
kubectl rollout restart daemonset whereabouts -n kube-system
kubectl rollout restart daemonset kube-multus-ds -n kube-system

2. Monitor the pod creation status with the following command:
kubectl get pods -n kube-system -w | grep -E 'whereabouts|multus'

3. Delete any pods that remain stuck in the ContainerCreating state:
kubectl delete pod <failing-pod-name> -n <namespace>
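
To find which pods still need to be deleted in step 3, the output of kubectl get pods can be filtered by status. The following is a minimal sketch: the function name stuck_pods is hypothetical, and it assumes the default column layout of "kubectl get pods -A" (NAMESPACE, NAME, READY, STATUS, ...).

```shell
# Read "kubectl get pods -A" output on standard input and print
# "<namespace> <pod-name>" for every pod whose STATUS is ContainerCreating.
stuck_pods() {
  awk '$4 == "ContainerCreating" { print $1, $2 }'
}

# Example usage, once both DaemonSet rollouts have completed:
#   kubectl rollout status daemonset whereabouts -n kube-system
#   kubectl rollout status daemonset kube-multus-ds -n kube-system
#   kubectl get pods -A | stuck_pods
```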