How to force terminate pods that are stuck in terminating state

Article ID: 407883

Products

VMware Telco Cloud Automation

Issue/Introduction

If a pod is stuck in the Terminating state and the stuck container cannot be stopped even by running crictl stop, find the container's containerd-shim-runc-v2 process and kill it manually.

Environment

2.x, 3.x

Cause

When kubelet or crictl tries to stop the container, the RPC call to containerd-shim hangs, leading to a DeadlineExceeded error.

The kubelet and containerd lose control over the container process, but the pod object remains in the API server with a Terminating status because the kubelet cannot confirm its complete termination.

Killing the containerd-shim-runc-v2 process bypasses the normal graceful shutdown flow by directly killing the parent process that manages the container's lifecycle. The container's processes are then orphaned and eventually reaped by the init process (PID 1), which frees the resources and allows the kubelet to clean up the pod.
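Before killing anything, it can help to confirm that the node is actually reporting the DeadlineExceeded error described above. The following is a minimal sketch, not part of the original procedure. It assumes the kubelet runs as a systemd unit named kubelet, so that its logs are available through journalctl. The helper name filter_deadline_errors is hypothetical.

```shell
#!/bin/sh
# Print only the log lines that mention DeadlineExceeded.
# Reads log text on stdin; prints nothing if there is no match.
filter_deadline_errors() {
  grep -F 'DeadlineExceeded' || true
}

# Example usage on the affected node (assumes a systemd unit named "kubelet"):
#   journalctl -u kubelet --since "1 hour ago" | filter_deadline_errors
```

If no lines are printed, the pod may be stuck for a different reason, and the steps in this article may not apply.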

 

Resolution

  • Find the container ID using the Pod name. The ID is the first column of the crictl ps output (podid00001111 in this example)

root@cnf-cluster001 [ ~ ]# crictl ps | grep container-id0011-0000000000-hd0001
podid00001111       imageid000001       7 hours ago         Running             nim-cip                  0                   0beeac825e9e3       container-id0011-0000000000-hd0001

  • Inspect the container using that ID and fetch the sandboxID

root@cnf-cluster001 [ ~ ]# crictl inspect podid00001111 | grep sandbox
    "sandboxID": "0000000000000000000000000010011111111111111111111111111111111111",
          "source": "/var/lib/containerd/io.containerd.grpc.v1.cri/sandboxes/0000000000000000000000000010011111111111111111111111111111111111/hostname",
          "source": "/var/lib/containerd/io.containerd.grpc.v1.cri/sandboxes/0000000000000000000000000010011111111111111111111111111111111111/resolv.conf",
          "source": "/run/containerd/io.containerd.grpc.v1.cri/sandboxes/0000000000000000000000000010011111111111111111111111111111111111/shm",
        "io.kubernetes.cri.sandbox-id": "0000000000000000000000000010011111111111111111111111111111111111",
        "io.kubernetes.cri.sandbox-name": "container-id0011-0000000000-hd0001",
        "io.kubernetes.cri.sandbox-namespace": "cnfnamespace",
        "io.kubernetes.cri.sandbox-uid": "89a16b43-5455-4c86-8a71-6ff2134593b0"

  • Use the sandboxID to find the process ID (PID) of the containerd-shim-runc-v2 process. The PID is the second column of the ps output

root@cnf-cluster001 [ ~ ]# ps -aux | grep "0000000000000000000000000010011111111111111111111111111111111111"
root     3801090  0.7  0.0 1236724 13564 ?       Sl   05:52   3:22 /usr/local/bin/containerd-shim-runc-v2 -namespace k8s.io -id 0000000000000000000000000010011111111111111111111111111111111111 -address /run/containerd/containerd.sock

  • Kill the containerd-shim-runc-v2 process using that PID

root@cnf-cluster001 [ ~ ]# kill 3801090
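The lookup steps above can be sketched as two small shell helpers, which avoids copying the long sandboxID by hand and skips the grep process itself in the ps output. This is only an illustration under stated assumptions. The function names extract_sandbox_id and find_shim_pid are hypothetical, and the parsing assumes that crictl inspect prints the "sandboxID" field and that the shim command line has the form shown in the example output above.

```shell
#!/bin/sh
# Extract the first sandboxID value from `crictl inspect` output read on stdin.
extract_sandbox_id() {
  sed -n 's/.*"sandboxID": *"\([0-9a-f]*\)".*/\1/p' | head -n 1
}

# Print the PID of the containerd-shim-runc-v2 process whose -id argument
# equals the sandbox ID given as $1. Reads `ps -eo pid,args` output on stdin.
# Only lines whose command is the shim binary are considered, so an unrelated
# process that merely mentions the ID (such as grep) is never selected.
find_shim_pid() {
  awk -v id="$1" '
    $2 ~ /containerd-shim-runc-v2$/ {
      for (i = 3; i < NF; i++)
        if ($i == "-id" && $(i + 1) == id) { print $1; exit }
    }'
}

# Example usage as root on the affected node (podid00001111 is the ID
# obtained from crictl ps in the first step):
#   sandbox_id=$(crictl inspect podid00001111 | extract_sandbox_id)
#   pid=$(ps -eo pid,args | find_shim_pid "$sandbox_id")
#   [ -n "$pid" ] && kill "$pid"
```

As in the manual procedure, kill sends SIGTERM by default. Once the shim process has exited, the kubelet should be able to finish cleaning up the pod, and the pod should no longer be listed in the cnfnamespace namespace.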