A BOSH deployment fails with the following error, which indicates that the cloud_controller_ng, ccng_monit_http_healthcheck, and nginx_cc jobs could not start. The failure is caused by a race condition, so some of these jobs may report success on a given attempt. However, at least one or two of them should fail for the symptoms to match this knowledge article.
Task 184 | 04:48:43 | L starting jobs: cloud_controller/685460c3-a8c8-4c8c-ae64-70292f62b382 (0) (canary) (00:07:10)
L Error: 'cloud_controller/685460c3-a8c8-4c8c-ae64-70292f62b382 (0)' is not running after update. Review logs for failed jobs: cloud_controller_ng, ccng_monit_http_healthcheck, nginx_cc
Task 184 | 04:53:43 | Error: 'cloud_controller/685460c3-a8c8-4c8c-ae64-70292f62b382 (0)' is not running after update. Review logs for failed jobs: cloud_controller_ng, ccng_monit_http_healthcheck, nginx_cc
The log file /var/vcap/sys/log/cloud_controller_ng/ccng_monit_http_healthcheck.stdout.log will show the following pattern. Note that the line “Will restart CC over on repeated failures” is normal and is expected whenever the ccng monit healthcheck starts up. The line reporting that curl failed with exit code 7 is the symptom that matches this bug.
2024-01-05 04:48:44.677964337+00:00 Will restart CC over on repeated failures
2024-01-05 04:48:44.686089362+00:00 ccng_monit_http_healthcheck failed to curl <https://10.225.58.72:9024/healthz>: exit code 7
2024-01-05 04:48:44.687590286+00:00 :: Healthcheck failed consistently, restarting CC
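To check a healthcheck log for this symptom without reading it by eye, the pattern above can be matched with a short script. The following is a minimal Python sketch, not an official tool: the function name `find_connection_refused` is hypothetical, and the regular expression is modeled only on the sample log lines shown here, so the exact format is an assumption that may differ between releases.

```python
import re

# Assumption: the failure line looks like the sample in this article, e.g.
#   "<timestamp> ccng_monit_http_healthcheck failed to curl <URL>: exit code N"
# The angle brackets around the URL are treated as optional.
CURL_FAILURE = re.compile(
    r"ccng_monit_http_healthcheck failed to curl "
    r"<?(?P<url>[^\s>]+)>?: exit code (?P<code>\d+)"
)


def find_connection_refused(lines):
    """Return (timestamp, url) pairs for curl failures with exit code 7."""
    hits = []
    for line in lines:
        match = CURL_FAILURE.search(line)
        if match and match.group("code") == "7":
            # Everything before the job name is taken to be the timestamp.
            timestamp = line[: match.start()].strip()
            hits.append((timestamp, match.group("url")))
    return hits
```

A typical use would be to open the ccng_monit_http_healthcheck.stdout.log file named above, pass the file object to `find_connection_refused`, and treat a non-empty result as a match for this symptom. Other curl exit codes are deliberately ignored, since only exit code 7 corresponds to this bug.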
Exit code 7 means that curl could not connect to the local nginx process (the nginx_cc job): the connection to port 9024 was refused because nginx had not yet started listening on that port. Monit is responsible for starting all of the jobs. If it starts nginx_cc 3 or 4 seconds after the ccng_monit_http_healthcheck job, the healthcheck fails and restarts the cloud_controller_ng job as well as the nginx_cc job. This cycle may continue indefinitely.
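Whether nginx is accepting connections at a given moment can be confirmed with a plain TCP probe. The sketch below is illustrative only and is not part of the healthcheck job: the function name `port_is_listening` is hypothetical, and the probe checks just the TCP connection, without the TLS handshake or the /healthz request that the real healthcheck performs.

```python
import socket


def port_is_listening(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port is accepted.

    A refused connection (the condition behind curl exit code 7),
    a timeout, or any other socket error yields False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # ConnectionRefusedError and socket timeouts are OSError subclasses.
        return False
```

For example, running `port_is_listening("10.225.58.72", 9024)` on the cloud_controller VM shortly after the jobs start would be expected to return False while nginx_cc is still coming up, and True once it is listening. The address is the one from the sample log above and will differ per deployment.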