VMware Identity Manager (vIDM / WSA) opensearch service will not start.
Article ID: 315176
Products
VMware Aria Suite
Issue/Introduction
Opensearch service will not start.
Remediate or other vIDM requests run through LCM may fail at the health check.
In an SSH session to the vIDM nodes, opensearch reports that it is not running while horizon-workspace is running:
/etc/init.d/opensearch status
Not running
/etc/init.d/horizon-workspace status
RUNNING as PID=_____
Environment
VMware Identity Manager 3.3.7
Cause
This may be caused by a stale Liquibase lock.
Resolution
First confirm that opensearch is Not Running but horizon-workspace is Running:
/etc/init.d/opensearch status
/etc/init.d/horizon-workspace status
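The check above can be sketched as a small shell helper. This is only an illustration: matches_symptom is a hypothetical name, and it assumes the status commands print text containing "Not running" and "RUNNING" as shown in this article.

```shell
#!/bin/sh
# Sketch: report whether a node shows the symptom described in this article.
# matches_symptom is a hypothetical helper; it only inspects status text.
matches_symptom() {
    opensearch_status="$1"
    workspace_status="$2"
    # opensearch must report "Not running" (matched case-insensitively) ...
    if ! printf '%s\n' "$opensearch_status" | grep -qi 'not running'; then
        return 1
    fi
    # ... while horizon-workspace reports RUNNING (upper case, as in the article).
    printf '%s\n' "$workspace_status" | grep -q 'RUNNING'
}

# Usage on a vIDM node:
#   matches_symptom "$(/etc/init.d/opensearch status 2>&1)" \
#                   "$(/etc/init.d/horizon-workspace status 2>&1)" \
#       && echo "symptom matches this article"
```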
Try to simply restart opensearch:
/etc/init.d/opensearch restart
If the restart spends several minutes at "Waiting for IDM", cancel it with Ctrl+C and continue with the steps below.
A common cause of this issue is that opensearch cannot acquire the Liquibase lock, for example after an unclean restart of opensearch.
Step 2 only needs to be executed once for the cluster. The remaining steps are run on every node.
1. Make sure the opensearch service is stopped on all nodes: /etc/init.d/opensearch stop
2. Release the locks (once for the cluster is enough; run on the psql primary node): /usr/sbin/hznAdminTool liquibaseOperations -forceReleaseLocks
3. Restart the main vIDM service, first on the primary, wait a minute or two, then on the other two nodes: service horizon-workspace restart
4. Start opensearch on all nodes: /etc/init.d/opensearch start
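The ordering of the steps above across a cluster can be sketched as follows. This is a planning aid only, not an official tool: recovery_plan is a hypothetical helper that prints each command next to the node it belongs on, the host names in the usage line are placeholders, and the 120-second pause is an assumed value for "a minute or two".

```shell
#!/bin/sh
# Sketch: print the recovery steps in order, one "host<TAB>command" per line.
# recovery_plan is a hypothetical helper. The first argument is the psql
# primary node; any further arguments are the other cluster nodes.
recovery_plan() {
    primary="$1"
    shift
    # Step 1: stop opensearch on every node.
    for node in "$primary" "$@"; do
        printf '%s\t%s\n' "$node" '/etc/init.d/opensearch stop'
    done
    # Step 2: release the Liquibase locks once, on the psql primary node.
    printf '%s\t%s\n' "$primary" '/usr/sbin/hznAdminTool liquibaseOperations -forceReleaseLocks'
    # Step 3: restart horizon-workspace on the primary, pause, then the rest.
    printf '%s\t%s\n' "$primary" 'service horizon-workspace restart'
    printf '%s\t%s\n' "$primary" 'sleep 120'   # assumed value for "a minute or two"
    for node in "$@"; do
        printf '%s\t%s\n' "$node" 'service horizon-workspace restart'
    done
    # Step 4: start opensearch on every node.
    for node in "$primary" "$@"; do
        printf '%s\t%s\n' "$node" '/etc/init.d/opensearch start'
    done
}

# Usage (placeholder host names):
#   recovery_plan vidm-primary vidm-node2 vidm-node3
```

Printing the plan instead of executing it keeps the sketch safe to run anywhere; each line can then be run by hand on the named node.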
Workaround: if forceReleaseLocks fails
If the hznAdminTool command above hangs and does not complete, there may be another lock which must be manually removed:
First confirm cluster health as per KB 367175. If hznAdminTool returns the error "The connection attempt failed", this can indicate that the delegateIP needs to be assigned to the psql primary node on eth0:0.
Make sure Opensearch service is stopped on all nodes: /etc/init.d/opensearch stop
Log in to the DB on psql primary node with this command: sudo -u postgres psql -h localhost -U horizon saas
Check for a lock here: select * from saas.DatabaseChangeLogLock;
If a lock is found above (LOCKED is t, with a date and an IP address), clear it: update saas.DATABASECHANGELOGLOCK SET LOCKED=false, LOCKGRANTED=null, LOCKEDBY=null where ID=1;
Log out of the database with \q, then repeat steps 2, 3 and 4 of the main procedure above: release the Liquibase locks, restart horizon-workspace, then start opensearch.
(Note: for versions of vIDM earlier than 3.3.7, replace opensearch with elasticsearch wherever mentioned. These older versions are now EOL.)
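The lock check in the workaround can also be read non-interactively. The sketch below assumes psql is called with its standard -A (unaligned) and -t (tuples only) options, so that the row comes back as one line of "|"-separated fields. describe_lock is a hypothetical helper, and the column names are taken from the update statement in the workaround.

```shell
#!/bin/sh
# Sketch: interpret one row of the Liquibase lock table.
# describe_lock is a hypothetical helper. Its argument is a single line in
# the form "locked|lockgranted|lockedby", as printed by psql -At.
describe_lock() {
    IFS='|' read -r locked granted owner <<EOF
$1
EOF
    if [ "$locked" = "t" ]; then
        printf 'LOCKED by %s since %s\n' "$owner" "$granted"
        return 0
    fi
    printf 'no lock held\n'
    return 1
}

# Usage on the psql primary node (may prompt for the horizon password):
#   row="$(sudo -u postgres psql -h localhost -U horizon saas -At \
#       -c 'select locked, lockgranted, lockedby from saas.databasechangeloglock where id=1;')"
#   describe_lock "$row"
```

The helper returns success only when a lock is held, so it can gate the manual update statement in a script.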
Additional Information
Impact/Risks:
Brief service restart. If vIDM is handling user logins at the time, users may be disconnected momentarily.
Health Status: curl 'http://localhost:9200/_cluster/health?pretty=true'
Green: everything is good; there are enough nodes in the cluster to ensure at least 2 full copies of the data spread across the cluster.
Yellow: functioning, but there are not enough nodes in the cluster to ensure HA. A single-node Elasticsearch/Opensearch deployment is always yellow because it can never hold 2 copies of the data; this is expected and is not a problem as long as there are no functional issues.
Red: broken; unable to query existing data or store new data, typically because there are not enough nodes in the cluster to function or the disk is full.
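The health status can be extracted and interpreted with a short sketch like the one below. health_status and explain_status are hypothetical helpers; the sketch assumes the response contains a single "status" field, as the cluster health API normally returns, and it uses sed rather than a JSON parser to avoid extra dependencies.

```shell
#!/bin/sh
# Sketch: pull the "status" value out of a cluster health response on stdin.
# health_status and explain_status are hypothetical helper names.
health_status() {
    sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p' | head -n 1
}

# Map a status value to the meaning given in this article.
explain_status() {
    case "$1" in
        green)  echo 'good: enough nodes for at least 2 full copies of the data' ;;
        yellow) echo 'functioning, but not enough nodes for HA (expected on a single node)' ;;
        red)    echo 'broken: cannot query or store data; check node count and disk space' ;;
        *)      echo 'unknown: no status found in the response' ;;
    esac
}

# Usage on a vIDM node:
#   status="$(curl -s 'http://localhost:9200/_cluster/health?pretty=true' | health_status)"
#   explain_status "$status"
```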