NAMESPACE NAME READY STATUS RESTARTS AGE
### ######## 0/22 Terminating 0 5d6h

The kubelet log "/var/vcap/sys/log/kubelet/kubelet.stderr.log" shows errors like the following (for the weblogic pod):
E0902 10:01:05.794087 9679 kubelet.go:2032] [failed to "KillContainer" for "weblogic" with KillContainerError: "rpc error: code = DeadlineExceeded desc = context deadline exceeded", failed to "KillPod#######" for "######################################" with KillPod######Error: "rpc error: code = DeadlineExceeded desc = failed to stop container \"######################################\": an error occurs during waiting for container \"######################################\" to be killed: wait container
and (for the dynatrace-oneagent pod):
E0902 10:01:54.048022 11090 pod_workers.go:1300] "Error syncing pod, skipping" err="failed to \"KillContainer\" for \"dynatrace-oneagent\" with KillContainerError: \"rpc error: code = DeadlineExceeded desc = an error occurs during waiting for container \\\"\\\" to be killed: wait container \\\"######################################\\\": context deadline exceeded\"" pod="dynatrace/dynakube-oneagent-abc12" podUID="######################################"
Task 1234 | 15:32:49 | L stopping jobs: worker/########-####-####-####-########1234 (1) (00:03:56)
L Error: Action Failed get_task: Task ########-####-####-####-########5678 result: Stopping Monitored Services: Stopping services '[containerd]' errored
The stop fails when "containerd_ctl stop" is run against the terminating container, leaving a stale containerd task:
crictl ps | grep <pod_name>    # Example: crictl ps | grep dynatrace-oneagent

ps -ef | grep containerd-shim
Example showing the related containerd-shim process for pod ID tc123dq:
ps -ef | grep containerd-shim
root 296150 1 0 Sep09 ? 00:05:30 /var/vcap/data/packages/containerd/78b921b6df42e5acdcefc9d099a31042f680857c/bin/containerd-shim-runc-v2 -namespace k8s.io -id tc123dq97448be4e030270d6073004fd4047c7350797f45a21ce257c0d -address /var/vcap/sys/run/containerd/containerd.sock
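The lookup above can be sketched as a small helper that picks the shim PID out of the ps -ef output. This is a minimal sketch, not part of the product tooling: the function name shim_pid_for_id is hypothetical, and it assumes the pod ID appears after the -id argument of the shim command line, as in the example output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the PID (second column of `ps -ef`) of every
# containerd-shim process whose -id argument starts with the given pod ID prefix.
# It reads `ps -ef` output on stdin, so it can also be run against saved output.
shim_pid_for_id() {
  local id_prefix="$1"
  awk -v id="$id_prefix" '
    /containerd-shim/ {
      for (i = 1; i < NF; i++) {
        # Match the field that follows "-id" against the requested prefix.
        if ($i == "-id" && index($(i + 1), id) == 1) {
          print $2
        }
      }
    }
  '
}

# Usage on the worker node (pod ID prefix taken from the crictl ps output):
#   ps -ef | shim_pid_for_id tc123dq
```

Matching on the -id argument rather than on the whole line avoids picking up the grep or awk process itself.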
Issue observed on TKGI 1.20 and lower versions.
There are known issues with runc-shim and containerd that can cause processes to hang. See containerd GitHub issue 8847.
The fixes for runc-shim are included in containerd v1.7.22 and v1.6.36.
If you are seeing problems with pods getting stuck in terminating status, then upgrade to TKGI v1.21 or higher, which has containerd v1.7.23.
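To check whether a given containerd version already carries the runc-shim fixes, a comparison against the versions named above can be sketched as follows. This is an illustrative sketch only: the function names version_ge and containerd_fix_status are hypothetical, GNU sort -V is assumed to be available, and release lines other than 1.6 and 1.7 are reported as unknown because this article does not cover them.

```shell
#!/usr/bin/env bash
# Return 0 if version $1 is greater than or equal to version $2.
# Assumes GNU coreutils `sort -V` (version sort).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: classify a containerd version string against the
# fixed versions named in this article (v1.7.22 and v1.6.36).
containerd_fix_status() {
  local v="${1#v}"   # accept both "1.7.23" and "v1.7.23"
  case "$v" in
    1.7.*) if version_ge "$v" 1.7.22; then echo has-fix; else echo no-fix; fi ;;
    1.6.*) if version_ge "$v" 1.6.36; then echo has-fix; else echo no-fix; fi ;;
    *)     echo unknown ;;
  esac
}

# Usage: pass the version reported on the worker node, for example
#   containerd_fix_status v1.7.23
```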
Using the crictl ps and ps -ef commands listed in the Issue/Introduction section, identify the process ID of the problem pod's containerd-shim process. Once you have determined the process ID, use kill to stop the process so that containerd can shut down gracefully:
kill -9 296150
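Because kill -9 is unforgiving, it can help to confirm that the PID really belongs to a containerd-shim process before killing it. The following is a minimal sketch of such a guard; the function name is_shim_pid and the PROC_ROOT override are assumptions made for illustration, and it relies on the Linux /proc filesystem.

```shell
#!/usr/bin/env bash
# Hypothetical guard: return 0 only if the process with the given PID was
# started from a containerd-shim binary. Linux only.
# PROC_ROOT is an illustrative override so the check can be tried against
# a copy of /proc; it defaults to /proc.
is_shim_pid() {
  local pid="$1" argv0=""
  local cmdline="${PROC_ROOT:-/proc}/$pid/cmdline"
  [ -r "$cmdline" ] || return 1
  # cmdline is NUL-separated; the first entry is the executable path.
  IFS= read -r -d '' argv0 < "$cmdline" || true
  case "${argv0##*/}" in
    containerd-shim*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage: only kill the PID when the guard passes, for example
#   is_shim_pid 296150 && kill -9 296150
```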