Massive `grootfs` processes are observed on diego cell

Article ID: 412558

Updated On:

Products

VMware Tanzu Application Service

Issue/Introduction

Garden triggers the `grootfs clean` process on the Diego cell to clean up unused image files; only one such process should be running at any time. However, the number of `grootfs clean` processes is observed to keep growing, and these processes consume extremely high CPU and memory resources.

...
1 ####02 ####094   ####31 ?             -1 Sl       0   0:00 /var/vcap/packages/grootfs/bin/grootfs --log-file /var/vcap/sys/log/garden/groot.clean.log --log-level info --log-timestamp-format rfc3339 --store /var/vcap/data/grootfs/store/unprivileged --metron-endpoint 127.0.0.1:3457 --tardis-bin /var/vcap/packages/grootfs/bin/tardis --newuidmap-bin /var/vcap/packages/garden-idmapper/bin/newuidmap --newgidmap-bin /var/vcap/packages/garden-idmapper/bin/newgidmap clean --threshold-bytes 0
...

Environment

TPCF 6.x

Cause

The root cause has not been identified, but the direct cause is how `grootfs` handles its lock.

  1. A `grootfs clean` process is triggered on the Diego cell every few seconds to minutes to clean up the file system.
  2. It acquires a lock on the file /var/vcap/data/grootfs/store/unprivileged/locks/global-groot-lock.lock; the lock prevents multiple `grootfs clean` processes from running at the same time.
  3. The problem arises when one `grootfs clean` process runs indefinitely and never releases the lock: Garden keeps creating more `grootfs clean` processes, and all of them have to wait for the lock.
  4. The result is a large number of waiting `grootfs clean` processes and, accordingly, very high CPU/memory use on the Diego cell.
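
The locking behaviour in steps 2 and 3 can be reproduced in isolation. This is a minimal sketch, assuming the lock behaves like an advisory `flock(2)` lock (the article does not state the exact mechanism) and that the util-linux `flock` command is available. A temporary file stands in for `global-groot-lock.lock`, and `try_lock` is a hypothetical helper, not part of grootfs.

```shell
# Sketch only: a temp file stands in for global-groot-lock.lock (assumption).
lock=$(mktemp)

# Hypothetical helper: try to take the lock without blocking and report the outcome.
try_lock() {
  if flock -n "$1" true; then echo "acquired"; else echo "would block"; fi
}

exec 9>"$lock"   # open the lock file on file descriptor 9
flock -n 9       # the first "clean" takes the exclusive lock and keeps it

while_held=$(try_lock "$lock")      # a second "clean" cannot get the lock
flock -u 9                          # the holder finally releases it
after_release=$(try_lock "$lock")   # now the next one gets through

echo "while held:    $while_held"
echo "after release: $after_release"

exec 9>&-        # close the descriptor and remove the temp file
rm -f "$lock"
```

The sketch uses a non-blocking attempt so that it finishes immediately. The processes described above wait for the lock instead of giving up, which is why they accumulate while the holder never releases it.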

Resolution

This issue is fixed in TPCF v6.0.21+ or v10.2.4+.

If a temporary workaround is needed, follow these steps:

  1. Find the exact grootfs process ID with `fuser -v /var/vcap/data/grootfs/store/unprivileged/locks/global-groot-lock.lock`
  2. Terminate the process with `kill PID` (or `kill -9 PID` if `kill PID` does not work)
  3. Confirm that the number of grootfs processes declines to 0.
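
Step 2 can be wrapped in a small helper that tries a plain `kill` first and only falls back to `kill -9` if the process is still alive after a grace period. This is a minimal sketch; `terminate_pid` and its default 10-second grace period are assumptions made for illustration, not something grootfs or Garden provides.

```shell
# Hypothetical helper: terminate_pid PID [GRACE_SECONDS]
# Sends SIGTERM, waits up to GRACE_SECONDS (default 10), then sends SIGKILL.
terminate_pid() {
  tp_pid=$1
  tp_grace=${2:-10}
  # If the signal cannot be delivered, the process is already gone
  # (or we lack permission); nothing more to do here.
  kill "$tp_pid" 2>/dev/null || return 0
  while [ "$tp_grace" -gt 0 ]; do
    # kill -0 only checks whether the PID still exists.
    kill -0 "$tp_pid" 2>/dev/null || return 0
    sleep 1
    tp_grace=$((tp_grace - 1))
  done
  # Still alive after the grace period: force it.
  kill -9 "$tp_pid" 2>/dev/null || true
}

# Usage (PID taken from the fuser output in step 1):
# terminate_pid PID
```

Trying SIGTERM first gives the process a chance to exit and release the lock on its own; SIGKILL is kept as the last resort, as in step 2.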

If the above steps do not work, try the following two commands:

1. First, list the PIDs of the matching processes with this command:

ps -ef | grep "/var/vcap/packages/garden-idmapper/bin/newgidmap clean" | grep -v grep | awk '{print $2}'

2. Then, once the list has been confirmed, kill those processes:

ps -ef | grep "/var/vcap/packages/garden-idmapper/bin/newgidmap clean" | grep -v grep | awk '{print $2}' | xargs kill
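
The two commands can also be combined into one reviewable script. This is a sketch under assumptions: `grootfs_clean_pids` is a hypothetical helper name, the pattern is the same one used in the commands above, and the `kill` line is left commented out so that nothing is signalled before the list has been checked.

```shell
# Hypothetical helper: read `ps -ef` output on stdin and print the PID (column 2)
# of every matching line. The [n] keeps awk from matching its own command line,
# which replaces the `grep -v grep` step.
grootfs_clean_pids() {
  awk '/garden-idmapper\/bin\/[n]ewgidmap clean/ {print $2}'
}

if command -v ps >/dev/null 2>&1; then
  pids=$(ps -ef | grootfs_clean_pids)
  echo "PIDs that would be signalled:"
  echo "${pids:-none}"
  # After reviewing the list, uncomment to send SIGTERM. `xargs -r` (a GNU
  # option) skips running kill when the list is empty.
  # echo "$pids" | xargs -r kill
fi
```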