Garden and garden-healthchecker jobs fail to start up in time when updating Diego Cells



Article ID: 298355



Products

VMware Tanzu Application Service for VMs

Issue/Introduction

On garden-runc v1.22.6 through v1.28.0, restarting garden on a Diego Cell with a high container count can cause BOSH deploys to fail because of a race condition between garden, bpm, monit, and garden-healthchecker.

During an update, you may see the following error when deploying TAS:

Updating deployment:
  Expected task '1053' to succeed but state is 'error'
Task 1053 | 12:45:32 | Error: 'diego_cell/########-####-####-####-######### (4)' is not running after update. Review logs for failed jobs: garden, garden-healthchecker


On the Diego Cell, the following error appears in /var/vcap/sys/log/garden/garden-healthchecker.stderr.log:

time="2023-05-03T22:31:59Z" level=error msg="runc run failed: unable to start container process: error during container init: error mounting \"/var/vcap/data/garden/garden.sock\" to rootfs at \"/var/vcap/data/garden/garden.sock\": stat /var/vcap/data/garden/garden.sock: no such file or directory"
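To confirm that a failed Diego Cell hit this specific error, the log can be searched for the socket mount failure. The following is a minimal sketch, assuming the contents of garden-healthchecker.stderr.log have already been read into a string; the function name and the regular expression are illustrative and are not part of any TAS or garden tooling.

```python
import re

# Pattern derived from the log line quoted above. The optional backslash
# allows for the escaped quotes that appear inside the msg="..." field.
# The regex is an illustrative assumption, not an official signature.
SOCKET_MOUNT_ERROR = re.compile(
    r'error mounting \\?"/var/vcap/data/garden/garden\.sock\\?"'
    r'.*no such file or directory'
)


def has_socket_mount_error(log_text: str) -> bool:
    """Return True if any line reports the missing garden.sock mount failure."""
    return any(SOCKET_MOUNT_ERROR.search(line) for line in log_text.splitlines())
```

For example, on the Diego Cell the function could be called with the result of `open("/var/vcap/sys/log/garden/garden-healthchecker.stderr.log").read()`.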


You may run into this issue on TAS and Isolation Segment (IST) releases that include garden-runc v1.22.6 through v1.28.0. TAS for Windows is not affected.

Affected TAS/IST versions:

  • 2.11
    • TAS 2.11.32 - 2.11.37
    • IST 2.11.26 - 2.11.31
  • 2.12
    • TAS 2.12.21 - 2.12.25
    • IST 2.12.16 - 2.12.20
  • 2.13
    • TAS 2.13.14 - 2.13.19
    • IST 2.13.11 - 2.13.16
  • 3.0
    • TAS 3.0.4 - 3.0.9
    • IST 3.0.4 - 3.0.9
  • 4.0
    • TAS 4.0.0
    • IST 4.0.0
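
The ranges above can be checked programmatically, for example when auditing several foundations. This is a rough sketch that copies the ranges from the list above; the `AFFECTED` table, `parse_version`, and `is_affected` are hypothetical helper names, and the code assumes plain `major.minor.patch` version strings with no build suffix.

```python
# Affected ranges copied from the list above, as (first, last) inclusive pairs.
AFFECTED = {
    "TAS": [
        ((2, 11, 32), (2, 11, 37)),
        ((2, 12, 21), (2, 12, 25)),
        ((2, 13, 14), (2, 13, 19)),
        ((3, 0, 4), (3, 0, 9)),
        ((4, 0, 0), (4, 0, 0)),
    ],
    "IST": [
        ((2, 11, 26), (2, 11, 31)),
        ((2, 12, 16), (2, 12, 20)),
        ((2, 13, 11), (2, 13, 16)),
        ((3, 0, 4), (3, 0, 9)),
        ((4, 0, 0), (4, 0, 0)),
    ],
}


def parse_version(version: str) -> tuple:
    """Turn 'major.minor.patch' into a tuple of ints for comparison."""
    return tuple(int(part) for part in version.split("."))


def is_affected(product: str, version: str) -> bool:
    """Return True if the product version falls inside a listed affected range."""
    parsed = parse_version(version)
    return any(first <= parsed <= last for first, last in AFFECTED[product])
```

For instance, `is_affected("TAS", "2.12.23")` falls inside the 2.12.21 - 2.12.25 range, while `is_affected("TAS", "2.12.26")` does not.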



There are two known workarounds for this issue. Both are temporary and must be applied for every update that restarts garden.

  1. Update the stemcell while updating TAS. This recreates all VMs, so garden starts with a fresh state and has nothing to clean up.

  2. Keep the same stemcell and, when deploying, select the "Recreate VMs deployed by the BOSH Director" checkbox under BOSH Tile -> Director Config. This achieves the same result as workaround 1.



Environment

Product Version: 2.11

Resolution

This issue is fixed in garden-runc v1.29.0.

Fixed TAS/IST versions:
  • 2.11
    • TAS 2.11.38
    • IST 2.11.32
  • 2.12
    • TAS 2.12.26
    • IST 2.12.21
  • 2.13
    • TAS 2.13.20
    • IST 2.13.17
  • 3.0
    • TAS 3.0.10
    • IST 3.0.10
  • 4.0
    • TAS 4.0.1
    • IST 4.0.1
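
When planning an upgrade, the first fixed patch for a given release line can be looked up from the list above. The sketch below is illustrative only: `FIXED`, `first_fixed_version`, and `has_fix` are hypothetical names, and version strings are assumed to be plain `major.minor.patch`.

```python
# First fixed patch per release line, copied from the list above.
FIXED = {
    "TAS": {"2.11": "2.11.38", "2.12": "2.12.26", "2.13": "2.13.20",
            "3.0": "3.0.10", "4.0": "4.0.1"},
    "IST": {"2.11": "2.11.32", "2.12": "2.12.21", "2.13": "2.13.17",
            "3.0": "3.0.10", "4.0": "4.0.1"},
}


def first_fixed_version(product: str, version: str):
    """Return the first fixed patch in the same release line, or None if unlisted."""
    release_line = ".".join(version.split(".")[:2])
    return FIXED[product].get(release_line)


def has_fix(product: str, version: str):
    """Return True if the version is at or above the first fixed patch.

    Returns None for release lines this article does not list. A False
    result only means the fix is absent; versions older than the affected
    range also return False even though they are not affected.
    """
    target = first_fixed_version(product, version)
    if target is None:
        return None
    as_tuple = lambda v: tuple(int(part) for part in v.split("."))
    return as_tuple(version) >= as_tuple(target)
```

For example, `first_fixed_version("TAS", "2.12.22")` returns `"2.12.26"`, the patch to upgrade to within the 2.12 line.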