Monit reports UAA running on BOSH Director when it is actually unhealthy
Article ID: 293574
Products
Operations Manager
Issue/Introduction
Symptoms: Monit reports the following:
BOSH Director has failed
CredHub has failed
UAA is running
The UAA logs show problems communicating with the database during startup. Despite this, the UAA process stays up and running even though it cannot service requests.
Environment
Cause
When Monit restarts the BOSH Director components, a race condition between Postgres and the UAA job can trigger this behavior. If the Postgres instance is not up and available by the time UAA starts, UAA can end up in this state.
Monit continues to report UAA as running even though the process is unhealthy. The Monit start scripts for UAA do curl the UAA "/healthz" endpoint to verify that UAA is up and running, but the healthz endpoint does not take the database dependency into account. As a result, Monit considers UAA healthy even though it is stuck, and never restarts it.
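The false positive can be illustrated with a toy model. The sketch below is not UAA or Monit code: `ToyService`, `shallow_health`, and `deep_health` are hypothetical names used only to contrast a health check that looks at the process alone with one that also considers the database.

```python
from dataclasses import dataclass


@dataclass
class ToyService:
    """Hypothetical stand-in for a service that depends on a database."""

    process_up: bool
    db_reachable: bool

    def shallow_health(self) -> bool:
        # Only the process itself is considered, so a service that is up
        # but cannot reach its database still reports healthy.
        return self.process_up

    def deep_health(self) -> bool:
        # A dependency-aware check also requires the database to be reachable.
        return self.process_up and self.db_reachable


# The stuck state described above: process is up, database was not reachable.
stuck = ToyService(process_up=True, db_reachable=False)
print(stuck.shallow_health())  # True: a supervisor sees nothing to restart
print(stuck.deep_health())     # False: a dependency-aware check would flag it
```

A supervisor that relies only on the shallow result has no signal that a restart is needed, which matches the behavior seen here.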
Resolution
Once Postgres is up and running, restart the UAA process with "monit restart uaa" to recover from this state. This issue is fixed in Operations Manager 2.8.
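The recovery step can be scripted so that UAA is only restarted once Postgres is reported as running. This is a minimal sketch under stated assumptions: it assumes `monit summary` prints lines of the form `Process 'name'   status`, and that the Postgres job is registered with Monit as `postgres`. The helper names (`parse_monit_summary`, `should_restart_uaa`, `recover`) are hypothetical. Only `monit restart uaa` comes from this article.

```python
import re
import subprocess
from typing import Dict

# Assumed line format of `monit summary`, e.g.:  Process 'uaa'    running
_PROCESS_LINE = re.compile(r"^Process '([^']+)'\s+(.*\S)\s*$")


def parse_monit_summary(text: str) -> Dict[str, str]:
    """Map each Monit process name to its lower-cased status string."""
    statuses = {}
    for line in text.splitlines():
        match = _PROCESS_LINE.match(line.strip())
        if match:
            statuses[match.group(1)] = match.group(2).lower()
    return statuses


def should_restart_uaa(statuses: Dict[str, str]) -> bool:
    """Restart UAA only once Postgres is running.

    Assumption: the Postgres job is named 'postgres' in Monit.
    """
    return statuses.get("postgres") == "running"


def recover(run=subprocess.run) -> bool:
    """Restart UAA if Postgres is up. Returns True if a restart was issued."""
    summary = run(
        ["monit", "summary"], capture_output=True, text=True, check=True
    ).stdout
    if not should_restart_uaa(parse_monit_summary(summary)):
        return False
    # The restart command given in the Resolution section.
    run(["monit", "restart", "uaa"], check=True)
    return True
```

Calling `recover()` on the BOSH Director VM with sufficient privileges to run Monit would issue the restart only when Postgres is reported as running. The `run` parameter exists so the logic can be exercised without a live Monit.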