While running Apply Changes on the BOSH Director, the health monitor fails to start.
The health_monitor logs show the process exiting at regular intervals, for example every 10 or 20 seconds.
I, [2024-07-09T20:18:21.567349 #7] INFO : HealthMonitor starting...
I, [2024-07-09T20:18:31.656031 #7] INFO : HealthMonitor exiting!
:
I, [2024-07-09T20:18:44.626529 #7] INFO : HealthMonitor starting...
I, [2024-07-09T20:18:54.813925 #7] INFO : HealthMonitor exiting!
:
I, [2024-07-09T20:19:07.837434 #6] INFO : HealthMonitor starting...
I, [2024-07-09T20:19:17.986888 #6] INFO : HealthMonitor exiting!
:
I, [2024-07-09T20:19:30.924586 #8] INFO : HealthMonitor starting...
I, [2024-07-09T20:19:41.164024 #8] INFO : HealthMonitor exiting!
:
I, [2024-07-09T20:19:51.455703 #8] INFO : HealthMonitor starting...
I, [2024-07-09T20:20:01.527452 #8] INFO : HealthMonitor exiting!
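The start/exit pattern in the log excerpt above can be checked mechanically. The following is a minimal Python sketch, not an official tool: the function names, the 30-second threshold, and the minimum cycle count are illustrative assumptions, and the regular expression assumes log lines shaped like the excerpt.

```python
import re
from datetime import datetime

# Matches the timestamp and the start/exit message in a health_monitor log line
# shaped like the excerpt above (assumed format).
LINE_RE = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+) #\d+\]\s+INFO\s.*?"
    r"HealthMonitor (?P<event>starting|exiting)"
)


def run_durations(lines):
    """Return the seconds between each 'starting...' and the next 'exiting!'."""
    durations = []
    started_at = None
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue  # skip unrelated or elided lines
        ts = datetime.strptime(match["ts"], "%Y-%m-%dT%H:%M:%S.%f")
        if match["event"] == "starting":
            started_at = ts
        elif started_at is not None:
            durations.append((ts - started_at).total_seconds())
            started_at = None
    return durations


def looks_like_restart_loop(lines, max_run_seconds=30.0, min_cycles=3):
    """Heuristic: several short-lived runs in a row suggest the symptom.

    Both thresholds are arbitrary illustrative values, not documented limits.
    """
    short_runs = [d for d in run_durations(lines) if d <= max_run_seconds]
    return len(short_runs) >= min_cycles
```

Passing the lines of the health_monitor log to `looks_like_restart_loop` returns `True` when at least `min_cycles` runs each lasted no longer than `max_run_seconds`.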
Environment
BOSH Director >= 2.10.73
BOSH Director >= 3.0.29
TAS or TKGi environment with a large number of BOSH agents.
Cause
The removal of eventmachine from the Health Monitor on the BOSH Director affects its resource usage in environments that have a large number of BOSH agents.
https://github.com/cloudfoundry/bosh/pull/2502
Resolution
Manually stopping and starting the health_monitor process works around the issue.
To resolve the issue, increase the spec (VM size) of the BOSH Director VM.
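The manual stop and start described above could be scripted roughly as follows. This is a hedged sketch under stated assumptions, not a documented procedure: it assumes the script runs as root on the BOSH Director VM, that jobs there are supervised by monit, and that the job is registered under the name health_monitor. The function names and the fixed pause are illustrative only.

```python
import subprocess
import time


def restart_commands(process="health_monitor"):
    """Build the monit stop/start commands (process name is an assumption)."""
    return [["monit", "stop", process], ["monit", "start", process]]


def restart_process(process="health_monitor", pause_seconds=10,
                    run=subprocess.run, sleep=time.sleep):
    """Stop the process, wait, then start it again.

    monit returns before the process has fully stopped, so a pause is
    inserted; the 10-second default is a guess, not a documented value.
    `run` and `sleep` are injectable so the sequence can be dry-run.
    """
    stop_cmd, start_cmd = restart_commands(process)
    run(stop_cmd, check=True)   # raises CalledProcessError on failure
    sleep(pause_seconds)
    run(start_cmd, check=True)
```

Calling `restart_process()` on the Director VM issues the two commands in order; passing a custom `run` callable allows inspecting the commands without executing them.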