Customers are using Diego cells that were not deployed by Elastic Runtime, for example through Isolation Segments or an OSS deployment.
Running df -i reports inode usage at or near 100%.
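The inode check above can be scripted. The following is a minimal sketch, not an official tool: the function name high_inode_usage and the 90% default threshold are assumptions made for illustration. It reads df -iP style output on stdin and prints each mount point whose inode usage is at or above the threshold.

```shell
#!/bin/sh
# Hypothetical helper: print "<mount point> <IUse%>" for every filesystem
# whose inode usage is at or above the given threshold (default 90).
# Expects `df -iP` output on stdin: column 5 is IUse%, column 6 the mount point.
high_inode_usage() {
  threshold="${1:-90}"
  awk -v t="$threshold" 'NR > 1 {
    use = $5
    sub(/%/, "", use)                 # strip the trailing percent sign
    # skip rows where the filesystem reports no inode figure (e.g. "-")
    if (use ~ /^[0-9]+$/ && use + 0 >= t) print $6 " " $5
  }'
}

# Usage on a Diego cell:
#   df -iP | high_inode_usage 90
```

Reading from stdin keeps the helper usable against saved df output as well as on a live cell.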
The Diego deployment manifest should contain cleanup_process_dirs_on_wait: true:
/var/tempest/workspaces/default/deployments/cf-b726f387316441065827.yml: garden: cleanup_process_dirs_on_wait: true
The --cleanup-process-dirs-on-wait flag should be passed to garden when it starts:
/var/vcap/data/jobs/garden/4456fe41ab6291aefe82ef966103d435676f45ca/bin/garden_ctl: --cleanup-process-dirs-on-wait \
The --cleanup-process-dirs-on-wait flag should also appear on the gdn process once it is running:
ps -ef | grep -i gdn
root 514382 514381 2 Nov18 ? 14:24:19 /var/vcap/packages/guardian/bin/gdn server --skip-setup --bind- ... --cleanup-process-dirs-on-wait
If the flag is not set, update the deployment manifest to include cleanup_process_dirs_on_wait: true.
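The process check can be wrapped in a small helper. This is a sketch only: the function name has_cleanup_flag is hypothetical, and it does nothing more than test whether the text on stdin contains the flag.

```shell
#!/bin/sh
# Hypothetical helper: exit 0 if the command line(s) on stdin contain
# the --cleanup-process-dirs-on-wait flag, non-zero otherwise.
has_cleanup_flag() {
  # -e keeps grep from treating the leading dashes as its own options
  grep -q -e '--cleanup-process-dirs-on-wait'
}

# Usage on a Diego cell (the [g] trick stops grep from matching itself):
#   if ps -ef | grep '[g]dn server' | has_cleanup_flag; then
#     echo "cleanup flag is set on gdn"
#   else
#     echo "cleanup flag is missing on gdn"
#   fi
```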
Error Message:
The application crashes with the following error:
runc exec: exit status 1: exec failed: open /var/vcap/data/garden/depot/... .../.pidfile: No space left on device
A new garden boolean, cleanup_process_dirs_on_wait, was introduced in garden-runc-release v1.5.0: https://github.com/cloudfoundry/garden-runc-release/tree/v1.5.0. The flag defaults to false unless it is explicitly set in the deployment. When it is disabled, garden leaves behind stale directories, which eventually exhaust the available inodes.
Note: Versions of Elastic Runtime lower than 1.10.12 do not have this boolean because they use a garden release older than 1.5.0, so those systems are not affected by this problem. Refer to the release notes for the Garden versions packaged with ERT: https://docs.pivotal.io/pivotalcf/1-10/pcf-release-notes/runtime-rn.html
Update the deployment manifest to set the boolean cleanup_process_dirs_on_wait.
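Because every file and directory consumes one inode, counting the entries under a directory gives a rough picture of where inodes are going. The sketch below assumes a hypothetical helper name, count_entries. The depot path in the usage comment is taken from the error message above, and its layout below that point is not assumed.

```shell
#!/bin/sh
# Hypothetical helper: count files and directories under a path (including
# the path itself) without crossing into other filesystems.
count_entries() {
  # tr strips the padding some wc implementations add around the number
  find "$1" -xdev | wc -l | tr -d ' '
}

# Usage on a Diego cell:
#   count_entries /var/vcap/data/garden/depot
```

A count that keeps growing between checks while the number of running containers stays flat is consistent with stale directories accumulating.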
For example:
vi /var/tempest/workspaces/default/deployments/p-isolation-segment-XXXX.yml
garden:
  cleanup_process_dirs_on_wait: true
Note: The deployment manifest may vary depending on which type of manifest deployed garden. Check all manifests that configure garden and verify that cleanup_process_dirs_on_wait is set to true.
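Checking every manifest by hand is error-prone, so the check can be scripted. The following is a sketch with a hypothetical function name, manifests_missing_cleanup. It is a plain text match rather than a YAML parse, so treat its output as a list of files to inspect, not as a definitive result.

```shell
#!/bin/sh
# Hypothetical helper: list *.yml files in a directory that mention garden
# but do not contain "cleanup_process_dirs_on_wait: true".
manifests_missing_cleanup() {
  dir="$1"
  for f in "$dir"/*.yml; do
    [ -f "$f" ] || continue           # skip the literal glob when nothing matches
    if grep -q 'garden' "$f" &&
       ! grep -Eq 'cleanup_process_dirs_on_wait:[[:space:]]*true' "$f"; then
      echo "$f"
    fi
  done
}

# Usage on the Ops Manager VM:
#   manifests_missing_cleanup /var/tempest/workspaces/default/deployments
```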
Once the boolean value is set, execute `bosh deploy <deployment name>` to apply the change.
Another option is to bosh recreate the Diego cells periodically until the fix is available.
Please note that any configuration changes made in Ops Manager will overwrite manual changes to the deployment files.
This issue is fixed in the 2.0.x and later releases of PCF Isolation Segment.