Dynatrace OneAgent injects itself into processes on each BOSH-deployed VM. In this case, the Grafana job in the Healthwatch deployment failed to start because of Dynatrace OneAgent.
The BOSH deploy failed with the following error:
[2025-01-27T20:36:09.391707 #2753507] [canary_update(grafana/<redacted>(0))] ERROR -- DirectorJobRunner: Error updating canary instance: #<Bosh::Director::AgentJobNotRunning: 'grafana/<redacted> (0)' is not running after update. Review logs for failed jobs: grafana>
No errors were observed in the Grafana logs; however, the Grafana bpm.log shows a very long delay in starting the process. In the example below, starting the process takes more than 30 seconds.
{"timestamp":"2025-01-28T20:34:33.147164391Z","level":"info","source":"bpm","message":"bpm.start.start-process.starting","data":{"job":"grafana","process":"grafana","session":"1.2"}}
...
{"timestamp":"2025-01-28T20:35:06.113521278Z","level":"info","source":"bpm","message":"bpm.start.releasing-lifecycle-lock.complete","data":{"job":"grafana","process":"grafana","session":"1.3"}}
Dynatrace OneAgent appears to have introduced significant latency in starting Grafana.
Dynatrace OneAgent needs to be excluded from the deployment that it is causing to fail.
Refer to documentation: https://docs.vmware.com/en/Dynatrace-Full-Stack-Add-on-for-VMware-Tanzu/services/dynatrace-fullstack-addon-vmware-tanzu/installing.html
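Perform the following steps:
Save the current runtime config to a file:
$ bosh runtime-config > runtime.cfg
Edit the file and add the failed deployment to the exclude section of the Dynatrace OneAgent addon:
$ vim runtime.cfg
exclude:
  deployments:
  - <failed deployment>
Upload the updated runtime config:
$ bosh update-runtime-config runtime.cfg
The change takes effect the next time the excluded deployment is deployed.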