vRealize Health Service is failing to start in 7.4 and earlier



Article ID: 319626



Products

VMware Aria Suite

Issue/Introduction

The steps described in this article clear the Health Service configuration and restart the service. In most cases this repairs the service and allows it to start.

Symptoms:
  • The Health Service page under Administration->Health shows a blank page or a server error.


Environment

VMware vRealize Automation 7.3.x
VMware vRealize Automation 7.4.x

Cause

  • This issue is caused by a problem in the Xenon / Lucene cluster on which the Health Service runs. The nodes of the service should join each other and share data, but data corruption can sometimes prevent this convergence.

Resolution

In vRealize Automation 7.5 and later, the Lucene subsystem has been moved to Postgres, which is more resilient to data corruption.

For 7.4 and earlier, the workaround to this issue is to clean up the Health Service sandbox and restart the service.

Note: Cleaning the Health Service sandbox deletes all pre-configured tests.

Workaround:
Complete each step on every node before moving on to the next step:
  1. Stop the Health Service monitor by commenting out the cron job in /etc/cron.d/monitor-vrhb-cron
  2. Kill any instances of the monitor that might be running:
     ps -A | grep monitor-vrhb.sh | awk '{print $1}' | xargs --no-run-if-empty kill -9
  3. Stop the Health Service:
     service vrhb-service stop
  4. Verify that the service is stopped. If a process is found, kill it manually:
     ps aux | grep Quorum
  5. Clean up the Health Service datastores (also known as sandboxes):
     rm -r /var/lib/vrhb/service-host/sandbox
     rm -r /var/lib/vrhb/vra-tests-host/sandbox
  6. Restart the Health Service:
     service vrhb-service start
  7. Re-enable the Health Service monitor by uncommenting the cron job in /etc/cron.d/monitor-vrhb-cron
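
The workaround above could be wrapped in small shell helpers, one per step, so that the same commands are run identically on every node. The following is a minimal sketch, not a VMware-supplied script. The paths, the service name, and the kill pipeline are taken from the steps above; the function names, the CRON_FILE and VRHB_HOME variables, and the "#vrhb-disabled# " marker are assumptions made for illustration. It also assumes GNU sed and xargs, as found on a typical Linux appliance.

```shell
#!/bin/sh
# Sketch only: one helper per workaround step. Nothing runs until a
# function is called. Names and the marker below are illustrative.

CRON_FILE="${CRON_FILE:-/etc/cron.d/monitor-vrhb-cron}"
VRHB_HOME="${VRHB_HOME:-/var/lib/vrhb}"
MARKER='#vrhb-disabled# '   # hypothetical prefix used to tag disabled lines

# Comment out every active line (skips blank lines and existing comments).
disable_monitor_cron() {
    sed -i -e '/^[[:space:]]*$/b' \
           -e '/^[[:space:]]*#/b' \
           -e "s/^/${MARKER}/" "$CRON_FILE"
}

# Uncomment only the lines that disable_monitor_cron tagged.
enable_monitor_cron() {
    sed -i -e "s/^${MARKER}//" "$CRON_FILE"
}

# Kill running monitor instances (pipeline taken from the article).
kill_monitor() {
    ps -A | grep monitor-vrhb.sh | awk '{print $1}' \
        | xargs --no-run-if-empty kill -9
}

# Succeeds when no Quorum process is listed; the [Q] bracket keeps
# grep from matching its own command line.
health_service_stopped() {
    ! ps aux | grep -q '[Q]uorum'
}

# Remove both sandbox directories if they exist.
clean_sandboxes() {
    for dir in "$VRHB_HOME/service-host/sandbox" \
               "$VRHB_HOME/vra-tests-host/sandbox"; do
        if [ -d "$dir" ]; then
            rm -r "$dir"
        fi
    done
}

# Stop and start are thin wrappers around the commands in the article.
stop_health_service()  { service vrhb-service stop; }
start_health_service() { service vrhb-service start; }
```

Sourcing the file defines the functions without running anything. Because each step has to be finished on every node before the next one starts, run one function at a time across all nodes (disable_monitor_cron everywhere, then kill_monitor everywhere, and so on) rather than chaining them on a single node. The marker lets enable_monitor_cron restore only the lines that disable_monitor_cron commented out, leaving any pre-existing comments untouched.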