How to investigate concourse workers filling up disks

Article ID: 297241

Updated On:

Products

Concourse for VMware Tanzu

Issue/Introduction

Customers have reported the persistent disk of their Concourse workers filling up. In some cases this appears to be a consequence of running a very large number of pipelines: these pipelines keep resources tied up, so the periodic garbage collection routines cannot purge the files that the pipelines keep open.

Generally, performing a bosh recreate of the workers restores the disk space to normal. However, this remediation may be burdensome or undesirable. To help the customer and/or R&D find the root cause, collect the following data.

Environment

Product Version: 7.9

Resolution

Collect the following information:

Resources that are called by a pipeline are retained on the workers as long as any pipeline that references them is still running. If you can pause all of your pipelines briefly once a day, this gives Concourse the opportunity to purge those resources.

To get a sense of which resources are filling up your disk, run the following command on each worker. The grep G filter keeps only the volumes whose size is reported in gigabytes:


du -sh /var/vcap/data/worker/work/volumes/live/*/volume | grep G
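
As an alternative view of the same data, the sketch below sorts the volumes by size instead of filtering on G, so a volume reported in terabytes, or one just under a gigabyte, is not missed. The function name largest_volumes and its two arguments are hypothetical and only for illustration; the default path is the one used in this article.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the largest live volumes on a worker, biggest first.
# Arguments (both optional, both assumptions of this sketch):
#   $1  directory holding the live volumes (defaults to the path from this article)
#   $2  how many volumes to print (defaults to 10)
largest_volumes() {
  local live_dir="${1:-/var/vcap/data/worker/work/volumes/live}"
  local count="${2:-10}"
  # -s gives one total per volume; -k reports KiB so a plain numeric sort works.
  du -sk "$live_dir"/*/volume 2>/dev/null | sort -rn | head -n "$count"
}
```

Calling largest_volumes with no arguments prints up to ten lines, each with a size in KiB followed by the volume path.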


Then run the following for the volumes that are using the most space:

du -sh /var/vcap/data/worker/work/volumes/live/<high_usage_volume>/volume/*


This shows the specific resources that are filling up the disk. Once you identify which pipelines use a given resource, you can pause those specific pipelines overnight, and Concourse should purge the resources. Resources stay cached while the pipelines that use them are running; when those pipelines are paused, Concourse can release them.
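
The pause step can be scripted with the fly CLI. The following is a minimal sketch, not an official procedure: it assumes you are already logged in to a fly target, and the function name pause_window, the target name, and the wait time are all hypothetical. This article does not state how long a pause garbage collection needs, so choose the wait time to suit your environment.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pause the named pipelines, wait, then unpause them.
#   $1       fly target name (assumed to be logged in already)
#   $2       seconds to keep the pipelines paused (an assumption; tune as needed)
#   $3...    names of the pipelines to pause
pause_window() {
  local target="$1" wait_seconds="$2"
  shift 2
  local p
  # Pause each pipeline so the workers can release its cached resources.
  for p in "$@"; do
    fly -t "$target" pause-pipeline -p "$p"
  done
  sleep "$wait_seconds"
  # Resume the same pipelines once the wait is over.
  for p in "$@"; do
    fly -t "$target" unpause-pipeline -p "$p"
  done
}
```

For example, pause_window my-target 3600 pipeline-a pipeline-b would keep two pipelines paused for an hour. Running the du commands before and after the window shows whether the space was actually reclaimed.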