On garden-runc v1.22.6 through v1.28.0, restarting garden with a high container count can cause BOSH deploys to fail because of a race condition between garden, bpm, monit, and garden-healthchecker.
During an update, you may see the following error when deploying TAS:
Updating deployment: Expected task '1053' to succeed but state is 'error' Task 1053 | 12:45:32 | Error: 'diego_cell/########-####-####-####-######### (4)' is not running after update. Review logs for failed jobs: garden, garden-healthchecker
On the Diego Cell, the following error appears in /var/vcap/sys/log/garden/garden-healthchecker.stderr.log:
time="2023-05-03T22:31:59Z" level=error msg="runc run failed: unable to start container process: error during container init: error mounting \"/var/vcap/data/garden/garden.sock\" to rootfs at \"/var/vcap/data/garden/garden.sock\": stat /var/vcap/data/garden/garden.sock: no such file or directory"
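To confirm that a Diego Cell is hitting this specific failure, the log can be scanned for the error shown above. The following is a minimal sketch, not an official tool: the log path comes from this article, while the helper names (`is_race_signature`, `find_matches`) and the choice of substrings used as a signature are assumptions derived from the sample log line.

```python
from pathlib import Path
from typing import List

# Log path as given in this article.
LOG_PATH = Path("/var/vcap/sys/log/garden/garden-healthchecker.stderr.log")

# Assumption: these substrings, taken from the sample log line, are enough
# to identify the failure. Adjust them if your log lines differ.
SIGNATURE = ("error mounting", "garden.sock", "no such file or directory")


def is_race_signature(line: str) -> bool:
    """Return True if the line contains every substring of the signature."""
    return all(part in line for part in SIGNATURE)


def find_matches(path: Path = LOG_PATH) -> List[str]:
    """Return all matching lines in the log, or an empty list if the file is missing."""
    if not path.exists():
        return []
    with path.open(errors="replace") as fh:
        return [line.rstrip("\n") for line in fh if is_race_signature(line)]


if __name__ == "__main__":
    for match in find_matches():
        print(match)
```

Run it on the Diego Cell with a user that can read the log. Any printed line suggests the garden-healthchecker failure described here; no output means no matching line was found.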
You may run into this issue on TAS and Isolation Segment releases that include garden-runc v1.22.6 through v1.28.0. TAS for Windows is not affected.
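If you know which garden-runc version a deployment uses, the affected range can be checked mechanically. This is an illustrative sketch only: the range bounds come from this article, the function names are hypothetical, and it assumes plain numeric versions such as `1.25.0` or `v1.25.0`.

```python
from typing import Tuple

# Inclusive range of affected garden-runc versions, as stated in this article.
AFFECTED_MIN = (1, 22, 6)
AFFECTED_MAX = (1, 28, 0)


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn 'v1.22.6' or '1.22.6' into (1, 22, 6). Assumes numeric parts only."""
    return tuple(int(part) for part in text.strip().lstrip("v").split("."))


def is_affected(version: str) -> bool:
    """Return True if the version falls inside the affected range (inclusive)."""
    return AFFECTED_MIN <= parse_version(version) <= AFFECTED_MAX


if __name__ == "__main__":
    for candidate in ("v1.22.5", "v1.22.6", "1.25.0", "1.28.0", "1.28.1"):
        print(candidate, is_affected(candidate))
```

Tuples compare element by element, which gives the correct ordering for dotted numeric versions without string comparison pitfalls (for example, `1.9.0` sorts before `1.22.6`).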
Affected TAS/IST versions:
There are two known workarounds for this issue. Both are temporary and must be applied to every update that restarts garden.
Update the stemcell while updating TAS. This recreates all VMs, so garden starts from a fresh state with nothing to clean up.