Monit reports UAA running on BOSH Director when it is actually unhealthy

Article ID: 293574

Updated On:

Products

Operations Manager

Issue/Introduction

Symptoms:
Monit reports the following:
  • BOSH Director has failed
  • CredHub has failed
  • UAA is running
UAA logs show problems communicating with the database during startup. UAA nevertheless stays up and running even though it cannot service requests.

Environment


Cause

When Monit restarts BOSH Director components, a race condition between Postgres and the UAA job triggers this behavior. If the Postgres instance is not up and available before UAA starts up, UAA can end up in this state.

Monit continues to report UAA as running even though the process is unhealthy. The Monit start scripts for UAA do curl the UAA "/healthz" endpoint to verify that UAA is up and running, but the healthz endpoint is not aware of the database dependency. As a result, Monit treats UAA as healthy even though it is stuck, and never restarts it.
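
The kind of check described above can be sketched roughly as follows. This is an illustration only, not the actual Monit start script: the function name uaa_healthz_passes is hypothetical, and the default URL (host and port) is an assumption that should be adjusted for your Director.

```shell
# Sketch only. The URL below is an assumption; adjust it for your Director.
UAA_HEALTHZ_URL="${UAA_HEALTHZ_URL:-https://127.0.0.1:8443/healthz}"

# Hypothetical helper: exits 0 when the endpoint answers with an HTTP
# success status, non-zero otherwise (curl --fail turns HTTP errors into
# a non-zero exit code).
# A passing result says nothing about whether UAA can reach its database.
uaa_healthz_passes() {
  curl --silent --fail --insecure --max-time 5 \
       --output /dev/null "$UAA_HEALTHZ_URL"
}
```

Running "uaa_healthz_passes && echo healthz passes" on an affected Director would be expected to succeed even while UAA cannot service requests, which is why a check of this shape does not catch the problem.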

Resolution

Once Postgres is up and running, restart the UAA process with "monit restart uaa" to recover from this state. This issue is fixed in Operations Manager 2.8.
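
The recovery step can be scripted along these lines. This is a minimal sketch under stated assumptions: the helper names monit_status and recover_uaa are hypothetical, the Monit process name "postgres" is assumed, and the parsing assumes "monit summary" prints lines of the form Process 'name' followed by the status. Run it as root on the BOSH Director VM.

```shell
# Sketch only: process names and the summary line format are assumptions.

# Print the status Monit reports for one process, reading "monit summary"
# output on stdin. Prints nothing if the process is not listed.
monit_status() {
  awk -v target="'$1'" '$1 == "Process" && $2 == target {
    $1 = ""; $2 = ""; sub(/^ +/, ""); print
  }'
}

# Poll until Monit reports postgres as running, then restart uaa.
# Arguments: number of attempts (default 30), delay in seconds (default 10).
recover_uaa() {
  attempts="${1:-30}"
  delay="${2:-10}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if [ "$(monit summary | monit_status postgres)" = "running" ]; then
      monit restart uaa
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "postgres did not reach 'running'; uaa was not restarted" >&2
  return 1
}
```

Calling recover_uaa with no arguments waits up to roughly five minutes for Postgres before giving up. Waiting first matters because restarting UAA while Postgres is still down would recreate the same condition.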