Zombie process causes PID exhaustion in Tanzu Kubernetes Grid Integrated Edition

Article ID: 345707

Products: VMware

Issue/Introduction

Symptoms:
Zombie processes exhaust the node's PID resources, which makes the node unavailable. The zombie (defunct) processes can be listed with:
ps -ef | grep defunct
builder 2178 26650 0 05:41 ? 00:00:00 [healthcheck.sh] <defunct>
builder 2200 26260 0 05:41 ? 00:00:00 [healthcheck.sh] <defunct>
builder 3076 26482 0 05:41 ? 00:00:00 [healthcheck.sh] <defunct>
builder 3484 6461 0 Sep23 ? 00:00:00 [sh] <defunct>
root 3876 6461 0 Sep26 ? 00:00:00 [sh] <defunct>
root 4054 17646 0 Sep16 ? 00:00:00 [curl] <defunct>
builder 4552 26650 0 05:42 ? 00:00:00 [healthcheck.sh] <defunct>
root 4714 17646 0 Sep26 ? 00:00:00 [curl] <defunct>
root 5335 17646 0 Sep16 ? 00:00:00 [curl] <defunct>
root 6438 17646 0 Sep15 ? 00:00:00 [curl] <defunct>
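To find which process is failing to reap its children, the zombies can be grouped by parent PID. The following is a minimal sketch, assuming procps `ps` is available on the node; the function name `zombies_by_parent` is hypothetical.

```bash
#!/bin/bash
# Sketch: count zombie processes per parent PID.
# Input: lines of "PPID STAT", as produced by: ps -eo ppid=,stat=
# Output: "<count> <PPID>", highest count first.

zombies_by_parent() {
  # A STAT value starting with "Z" marks a zombie (defunct) process.
  awk '$2 ~ /^Z/ { count[$1]++ } END { for (p in count) print count[p], p }' | sort -rn
}

# Live usage on a node (uncomment to run):
# ps -eo ppid=,stat= | zombies_by_parent
```

Each reported parent PID (for example, 26650 or 17646 in the listing above) can then be identified with `ps -o pid,comm -p <PPID>`; in this scenario it is typically the main process of the affected container.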

Content of a sample health check script used as the readiness probe:
#!/bin/bash

echo "Executing Healthcheck on $HOSTNAME"
# Localhost check: curl waits up to 30 s to connect and up to 180 s in total
curl -k -s -f https://localhost:3000/ping --max-time 180 --connect-timeout 30
error=$?
echo "Executing Healthcheck curl -k -s -f https://localhost:3000/ping on $HOSTNAME: Status($error)"
if [ "$error" -ne 0 ]; then
  exit 1
fi
echo "Finished Executing Healthcheck on $HOSTNAME"
exit 0


Cause

In Kubernetes versions earlier than 1.20, an exec readiness probe waits until its command exits: the probe's timeoutSeconds is not honored for exec probes. This behavior is described in the Kubernetes documentation on configuring probes. As a result, the probe waits until every command in the script, including curl or netcat, returns a response or times out, and the script then exits normally.

In Kubernetes 1.20 and later, the timeoutSeconds of an exec probe is honored, and it defaults to 1 second. The readiness probe is therefore killed after 1 second. However, commands started by the probe script can keep running as orphans: in the sample script, curl may wait up to 30 seconds to connect and up to 180 seconds in total. An orphaned process is typically reparented to the container's PID 1, and if that process does not reap its children, each orphan becomes a zombie when it exits. Because the readiness probe script runs at an interval of 30 seconds, zombies accumulate until the PID resources are exhausted.
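The underlying mechanism, an exited child whose parent never calls wait(), can be reproduced on a Linux machine with the minimal sketch below. Here `sleep 5` is a stand-in for a container PID 1 that does not reap children, and `sleep 1` is a stand-in for an orphaned probe command; neither is part of the original setup.

```bash
#!/bin/bash
# Sketch (Linux only): create a zombie by giving a short-lived child a parent
# that never reaps it, similar to an orphaned probe command under a
# non-reaping container PID 1.
set -euo pipefail

pidfile=$(mktemp)

# The subshell starts "sleep 1" in the background, records its PID, and then
# replaces itself (exec) with "sleep 5", a program that never reaps children.
( sleep 1 & echo $! > "$pidfile"; exec sleep 5 ) &
parent=$!

sleep 2                      # by now "sleep 1" has exited
child=$(cat "$pidfile")

# Field 3 of /proc/<pid>/stat is the process state; "Z" means zombie (defunct).
state=$(awk '{print $3}' "/proc/$child/stat")
echo "child $child, parent $parent, state $state"

# Killing the non-reaping parent lets the zombie be reparented and reaped.
kill "$parent"
wait "$parent" 2>/dev/null || true
rm -f "$pidfile"
```

While the parent is alive, `ps -ef | grep defunct` shows the child as `[sleep] <defunct>`, the same pattern as the `[curl] <defunct>` entries in the symptom output.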

Resolution

Either of the following approaches can be applied:

1) Run a tiny init system (for example, tini) as PID 1 in the pod so that zombie processes are reaped.
2) Set the readiness probe's timeoutSeconds to a value longer than the worst-case run time of the script, so that the probe is not killed and instead waits until the script exits.
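Both approaches can be expressed as `kubectl patch` payloads. The sketch below assumes the workload is a Deployment; the name `my-app`, container index 0, and the 10-second margin are hypothetical. For approach 1 it uses the pod-level `shareProcessNamespace` field, which makes the pod's pause container PID 1, a tiny init that reaps zombies; building an init such as tini into the image is an alternative. For approach 2 it sets `timeoutSeconds` above the sample script's `--max-time 180`.

```bash
#!/bin/bash
# Sketch: build kubectl patch payloads for the two resolutions.
# "my-app" and container index 0 are hypothetical placeholders.

# Approach 1: share the pod's PID namespace so that the pause container
# runs as PID 1 and reaps zombie processes.
pid_sharing_patch() {
  printf '{"spec":{"template":{"spec":{"shareProcessNamespace":true}}}}\n'
}

# Approach 2: JSON patch that sets readinessProbe.timeoutSeconds on container 0.
readiness_timeout_patch() {
  local timeout_seconds=$1
  printf '[{"op":"add","path":"/spec/template/spec/containers/0/readinessProbe/timeoutSeconds","value":%d}]\n' \
    "$timeout_seconds"
}

curl_max_time=180   # --max-time used by the sample health check script
margin=10           # assumed headroom for the rest of the script
sharing_patch=$(pid_sharing_patch)
timeout_patch=$(readiness_timeout_patch $((curl_max_time + margin)))
echo "$sharing_patch"
echo "$timeout_patch"

# To apply (uncomment; review before running against a real cluster):
# kubectl patch deployment my-app --type=merge -p "$sharing_patch"
# kubectl patch deployment my-app --type=json -p "$timeout_patch"
```

Sharing the PID namespace also lets the containers in the pod see each other's processes, and the application no longer runs as PID 1, so check that the image tolerates this before choosing approach 1.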


Additional Information

Impact/Risks:
A large number of zombie processes can exhaust the node's PID resources, making the node unavailable.
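To gauge how close a node is to this limit, the number of tasks holding a PID can be compared with the kernel-wide maximum. The following is a minimal sketch for Linux; the counts are approximate because processes come and go while the script runs.

```bash
#!/bin/bash
# Sketch (Linux only): approximate PID usage versus /proc/sys/kernel/pid_max.
# Every process and thread, including zombies, holds a PID until it is reaped.

pid_max=$(cat /proc/sys/kernel/pid_max)

tasks=(/proc/[0-9]*/task/[0-9]*)   # one entry per thread of every visible process
used=${#tasks[@]}

# The process state follows the last ") " in /proc/<pid>/stat; "Z" is a zombie.
zombies=$(cat /proc/[0-9]*/stat 2>/dev/null |
  awk '{ sub(/.*\) /, ""); if ($1 == "Z") n++ } END { print n + 0 }')

echo "PIDs in use: $used of $pid_max (zombies: $zombies)"
```

Run it on the node itself: inside a container, /proc shows only that container's PID namespace. A per-pod PID limit configured in the kubelet can also be reached before the kernel-wide maximum.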