VMware Identity Manager (vIDM / WSA) service opensearch / elasticsearch will not start.

Article ID: 315176

Products

VMware Aria Suite

Issue/Introduction

  • The opensearch service (known as elasticsearch before version 3.3.7) will not start.
    Remediation or other vIDM requests run through LCM may fail at the health check. In an SSH session on the vIDM nodes, opensearch is reported as not running while horizon-workspace is running:

 

/etc/init.d/opensearch status

Not running

 

/etc/init.d/horizon-workspace status

RUNNING as PID=_____

 

Environment

  • VMware Identity Manager 3.3.x

Cause

  • This may be caused by a stale Liquibase lock.

Resolution

  • First confirm that opensearch is Not Running but horizon-workspace is Running:
    • /etc/init.d/opensearch status
    • /etc/init.d/horizon-workspace status
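
The check above can also be scripted. The following is a minimal sketch, assuming the two status commands print text like "Not running" and "RUNNING as PID=..." as shown in this article. The function name is_lock_symptom is hypothetical and not part of vIDM.

```shell
#!/bin/sh
# Hypothetical helper (not part of vIDM): returns 0 when the status text
# matches the symptom in this article, i.e. opensearch is down while
# horizon-workspace is up. Assumes the status strings shown in this article.
is_lock_symptom() {
    os_status=$1   # output of: /etc/init.d/opensearch status
    ws_status=$2   # output of: /etc/init.d/horizon-workspace status

    # opensearch must report that it is not running.
    case $os_status in
        *[Nn]ot\ [Rr]unning*) ;;
        *) return 1 ;;
    esac

    # horizon-workspace must report a running PID.
    case $ws_status in
        *"RUNNING as PID="*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage on a vIDM node (commented out so the sketch has no side effects):
# if is_lock_symptom "$(/etc/init.d/opensearch status 2>&1)" \
#                    "$(/etc/init.d/horizon-workspace status 2>&1)"; then
#     echo "Symptom matches: opensearch down, horizon-workspace up"
# fi
```

If the function returns non-zero, the node does not show the symptom described here, and the lock-release procedure may not be the right fix.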

 

  • Try to simply restart opensearch:
    • /etc/init.d/opensearch restart

If the restart spends several minutes at "Waiting for IDM", you can interrupt it with Ctrl+C.

 

  • A common cause of this issue is a failure to acquire the Liquibase lock, for example after an unclean restart of opensearch.

Step 2 only needs to be executed once for the cluster. The remaining steps are run on every node.

    1. Make sure Opensearch service is stopped on all nodes:
      /etc/init.d/opensearch stop

    2. Release locks (once for the cluster is enough - run on psql primary node)
      /usr/sbin/hznAdminTool liquibaseOperations -forceReleaseLocks

    3. Restart the main vIDM service - first on primary, wait a minute or two, then the other two nodes:
      service horizon-workspace restart

    4. Start opensearch on all nodes:
      /etc/init.d/opensearch start
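
The ordering of the four steps across a three-node cluster can be sketched as a dry run that only prints what to run where. The host names vidm-node1 to vidm-node3 and the variables NODES, PRIMARY and PSQL_PRIMARY are placeholders, and the sketch assumes you have already identified the vIDM primary and the psql primary node, which are not necessarily the same node. Nothing is executed on any node.

```shell
#!/bin/sh
# Dry-run sketch: prints the recovery commands in the order given above.
# All host names are hypothetical placeholders; replace them with your own.
NODES="vidm-node1 vidm-node2 vidm-node3"
PRIMARY="vidm-node1"        # node whose horizon-workspace restarts first
PSQL_PRIMARY="vidm-node1"   # node currently acting as psql primary

print_plan() {
    # Step 1: stop opensearch on all nodes.
    for node in $NODES; do
        echo "$node: /etc/init.d/opensearch stop"
    done

    # Step 2: release the locks once, on the psql primary node only.
    echo "$PSQL_PRIMARY: /usr/sbin/hznAdminTool liquibaseOperations -forceReleaseLocks"

    # Step 3: restart horizon-workspace on the primary, wait, then the rest.
    echo "$PRIMARY: service horizon-workspace restart"
    echo "(wait a minute or two)"
    for node in $NODES; do
        [ "$node" = "$PRIMARY" ] && continue
        echo "$node: service horizon-workspace restart"
    done

    # Step 4: start opensearch on all nodes.
    for node in $NODES; do
        echo "$node: /etc/init.d/opensearch start"
    done
}

print_plan
```

Printing the plan first makes it easier to confirm that the lock release appears exactly once and that the primary is restarted before the other nodes.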

 

Workaround: if forceReleaseLocks fails

  • If the hznAdminTool command above hangs and does not complete, there may be another lock which must be manually removed:
    1. First confirm cluster health as per KB 367175: if hznAdminTool gives error "The connection attempt failed", this can indicate that the delegateIP needs to be assigned to the psql primary node on eth0:0. 
    2. Make sure Opensearch service is stopped on all nodes:
      /etc/init.d/opensearch stop

    3. Log in to the DB on psql primary node with this command:
      sudo -u postgres psql -h localhost -U horizon saas

    4. Check for a lock here:
      select * from saas.DatabaseChangeLogLock;

    5. If there is a lock found above (t, with some date & IP address), remove it like so:
      update saas.DATABASECHANGELOGLOCK SET LOCKED=false, LOCKGRANTED=null, LOCKEDBY=null where ID=1;

    6. Log out of the database with \q, then repeat steps 2, 3 and 4 of the Resolution procedure: release the Liquibase locks, restart horizon-workspace, and then start opensearch.
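
The lock check in steps 3 to 5 can be sketched non-interactively. This is an illustration under assumptions, not a supported procedure: the helper names run_sql and lock_state are hypothetical, the psql options -A, -t and -c are used to print a bare value, and the sketch assumes the login from step 3 works without further interaction (if psql prompts for a password, the helper will prompt as well).

```shell
#!/bin/sh
# Sketch only; run_sql and lock_state are hypothetical helper names.
# run_sql wraps the psql login from step 3. -A (unaligned) and -t (tuples
# only) make psql print a bare value; -c runs a single statement.
run_sql() {
    sudo -u postgres psql -h localhost -U horizon saas -At -c "$1"
}

# Prints "locked" if the lock row (ID=1) is held, "free" if it is not,
# and "unknown" if the query returned anything else (for example on error).
lock_state() {
    case "$(run_sql 'select locked from saas.DatabaseChangeLogLock where id=1;')" in
        t) echo "locked" ;;
        f) echo "free" ;;
        *) echo "unknown" ;;
    esac
}

# Usage on the psql primary node (commented out; the update changes data):
# if [ "$(lock_state)" = "locked" ]; then
#     run_sql "update saas.DATABASECHANGELOGLOCK SET LOCKED=false, LOCKGRANTED=null, LOCKEDBY=null where ID=1;"
# fi
```

Treating any unexpected result as "unknown" avoids clearing or skipping the lock based on a failed query.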

 

(Note: for versions of vIDM earlier than 3.3.7, replace opensearch with elasticsearch wherever mentioned. These older versions are now EOL.)

Additional Information

  • Impact/Risks:

    Brief service restart. If vIDM is actively serving user logins, users may experience a momentary disconnect.