VMware Identity Manager (vIDM / WSA) opensearch service will not start.

Article ID: 315176

Updated On:

Products

VMware Aria Suite

Issue/Introduction

  • The opensearch service will not start.
    • Running remediate or other vIDM requests through LCM may fail on the health check.
    • In an SSH session on the vIDM nodes, opensearch is seen to be stopped while horizon-workspace is running:

/etc/init.d/opensearch status

Not running

/etc/init.d/horizon-workspace status

RUNNING as PID=_____

Environment

  • VMware Identity Manager 3.3.7

Cause

  • This may be caused by a stale Liquibase lock.

Resolution

  • First confirm that opensearch is Not Running but horizon-workspace is Running:
    • /etc/init.d/opensearch status
    • /etc/init.d/horizon-workspace status
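The two status checks above can be wrapped in a small offline sketch. The `precondition_ok` function here is hypothetical (not part of vIDM): it takes the two status strings as arguments so the logic can be exercised without a vIDM node, and the commented command shows how it would be fed on a real appliance.

```shell
# Hypothetical helper: returns 0 only when opensearch reports "Not running"
# while horizon-workspace reports "RUNNING", i.e. the symptom in this article.
precondition_ok() {  # $1 = opensearch status output, $2 = horizon-workspace status output
  case "$1" in *"Not running"*) ;; *) return 1 ;; esac
  case "$2" in *RUNNING*) return 0 ;; *) return 1 ;; esac
}

# On a vIDM node you would run:
#   precondition_ok "$(/etc/init.d/opensearch status)" "$(/etc/init.d/horizon-workspace status)" \
#       && echo "symptom matches this article"
precondition_ok "Not running" "RUNNING as PID=1234" && echo "symptom matches this article"
```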

 

  • Try to simply restart opensearch:
    • /etc/init.d/opensearch restart

If the restart hangs for several minutes at "Waiting for IDM", it can be interrupted with Ctrl+C.
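Instead of watching the restart and pressing Ctrl+C, the wait can be bounded with the coreutils `timeout` command (assuming it is available on the appliance; exit status 124 means the time limit was hit). The `bounded_restart` wrapper below is a sketch, not a vIDM tool:

```shell
# Hypothetical wrapper: run a command under a time limit and report the result.
bounded_restart() {  # $1 = time limit in seconds, rest = command to run
  limit="$1"; shift
  timeout "$limit" "$@" && rc=0 || rc=$?
  if [ "$rc" -eq 0 ]; then
    echo "restarted cleanly"
  elif [ "$rc" -eq 124 ]; then
    echo "hung for ${limit}s and was killed - proceed with the lock cleanup"
  else
    echo "failed (rc=$rc)"
  fi
}

# On a node: bounded_restart 180 /etc/init.d/opensearch restart
bounded_restart 5 true   # demo with a command that finishes immediately
```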

 

  • A common cause of this issue is Liquibase being unable to acquire its lock, for example after an unclean restart of opensearch.

Step 2 only needs to be executed once for the cluster; the remaining steps are run on all nodes.

    1. Make sure Opensearch service is stopped on all nodes:
      /etc/init.d/opensearch stop

    2. Release locks (once for the cluster is enough - run on psql primary node)
      /usr/sbin/hznAdminTool liquibaseOperations -forceReleaseLocks

    3. Restart the main vIDM service - first on primary, wait a minute or two, then the other two nodes:
      service horizon-workspace restart

    4. Start opensearch on all nodes:
      /etc/init.d/opensearch start
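Steps 1-4 above can be sketched as one script. The node names, the `PRIMARY` variable, and the use of ssh as root are assumptions to adjust for your deployment; with `DRY_RUN=1` (the default here) the script only prints the commands it would issue.

```shell
NODES="${NODES:-node1 node2 node3}"   # hypothetical hostnames, primary first
PRIMARY="${PRIMARY:-node1}"           # the psql primary node
DRY_RUN="${DRY_RUN:-1}"               # set to 0 to actually execute via ssh

run_on() {  # run_on <node> <command>
  if [ "$DRY_RUN" = 1 ]; then
    echo "[$1] $2"
  else
    ssh "root@$1" "$2"
  fi
}

remediate() {
  # Step 1: stop opensearch on all nodes
  for n in $NODES; do run_on "$n" "/etc/init.d/opensearch stop"; done
  # Step 2: release Liquibase locks once, on the psql primary
  run_on "$PRIMARY" "/usr/sbin/hznAdminTool liquibaseOperations -forceReleaseLocks"
  # Step 3: restart horizon-workspace on the primary first, then the rest
  run_on "$PRIMARY" "service horizon-workspace restart"
  [ "$DRY_RUN" = 1 ] || sleep 120   # give the primary a minute or two
  for n in $NODES; do
    [ "$n" = "$PRIMARY" ] || run_on "$n" "service horizon-workspace restart"
  done
  # Step 4: start opensearch on all nodes
  for n in $NODES; do run_on "$n" "/etc/init.d/opensearch start"; done
}

remediate
```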

 

Workaround: if forceReleaseLocks fails

  • If the hznAdminTool command above hangs and does not complete, there may be another lock which must be manually removed:
    1. First confirm cluster health as per KB 367175. If hznAdminTool returns the error "The connection attempt failed", this can indicate that the delegateIP needs to be assigned to the psql primary node on eth0:0.
    2. Make sure Opensearch service is stopped on all nodes:
      /etc/init.d/opensearch stop

    3. Log in to the DB on psql primary node with this command:
      sudo -u postgres psql -h localhost -U horizon saas

    4. Check for a lock here:
      select * from saas.DatabaseChangeLogLock;

    5. If a lock is found above (LOCKED = t, along with a lock-granted date and an IP address), remove it like so:
      update saas.DATABASECHANGELOGLOCK SET LOCKED=false, LOCKGRANTED=null, LOCKEDBY=null where ID=1;

    6. Log out of the database with \q, then perform steps 2-4 above: release the Liquibase locks, restart horizon-workspace, and start opensearch.
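The SQL from steps 4-5 can be collected in one place and fed to psql non-interactively rather than typed at the prompt. This is a sketch; the `AND locked = true` guard is an addition here that makes re-running the statement harmless.

```shell
# Hypothetical helper: print the inspect-and-clear SQL for the Liquibase lock.
clear_lock_sql() {
  cat <<'SQL'
SELECT * FROM saas.databasechangeloglock;
UPDATE saas.databasechangeloglock
   SET locked = false, lockgranted = NULL, lockedby = NULL
 WHERE id = 1 AND locked = true;
SQL
}

# On the psql primary node you would run:
#   clear_lock_sql | sudo -u postgres psql -h localhost -U horizon saas
clear_lock_sql
```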

 

(Note: for versions of vIDM earlier than 3.3.7, replace opensearch with elasticsearch wherever mentioned. These older versions are now EOL.)

Additional Information

  • Impact/Risks:

    Brief service restart. If vIDM is actively serving user logins, users may experience a momentary disconnect.

  • Health Status:
    curl http://localhost:9200/_cluster/health?pretty=true 

    Green:
    everything is good; there are enough nodes in the cluster to keep at least two full copies of the data spread across the cluster.

    Yellow:
    functioning, but there are not enough nodes in the cluster to ensure HA (e.g., a single-node cluster is always yellow because it can never hold two copies of the data). This is expected for a single-node deployment and is not a problem as long as functionality is unaffected.

    Red: broken; existing data cannot be queried and new data cannot be stored, typically because there are not enough nodes in the cluster to function or the cluster is out of disk space.
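The status field from the health check above can be extracted and mapped to these meanings with a small offline sketch. The `health_summary` function is hypothetical, and the sample JSON document below is illustrative, not real cluster output.

```shell
# Hypothetical helper: read _cluster/health JSON on stdin, report the status.
health_summary() {
  status=$(sed -n 's/.*"status" *: *"\([a-z]*\)".*/\1/p' | head -n 1)
  case "$status" in
    green)  echo "green: healthy, data fully replicated" ;;
    yellow) echo "yellow: functioning, replicas unassigned (expected on a single node)" ;;
    red)    echo "red: broken, cannot query or store data" ;;
    *)      echo "unrecognized status: $status" ;;
  esac
}

# On a node: curl -s 'http://localhost:9200/_cluster/health?pretty=true' | health_summary
echo '{ "cluster_name": "horizon", "status": "yellow", "number_of_nodes": 1 }' | health_summary
```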