VMware Identity Manager (vIDM / WSA) opensearch service will not start.

Article ID: 315176

Updated On:

Products

VCF Operations/Automation (formerly VMware Aria Suite)

Issue/Introduction

Possible symptoms:

  • Running a remediate or other VMware Identity Manager (vIDM) request through Aria Suite Lifecycle (LCM) may fail at the health check. When a power off or power on operation is triggered from Aria Suite Lifecycle, the workflow fails because OpenSearch does not restart. On retry, the workflow skips the OpenSearch start step and then fails with the following error:
    • Error Code: LCMVIDM73110 Unable to get the vIDM end point. Unable to get the vIDM end point on the host VIDM.example.com. Retry to wait for some more time to get the vIDM end point.
  • The LCMVIDM71063 error occurs within Aria Suite Lifecycle when deploying vIDM:
    • com.vmware.vrealize.lcm.vidm.common.exception.VidmCommandExecutionException: Failed to start the vIDM elasticsearch service Not running at com.vmware.vrealize.lcm.vidm.core.task.VidmStartElasticSearchServiceTask.execute(VidmStartElasticSearchServiceTask.java:121) at com.vmware.vrealize.lcm.automata.core.TaskThread.run(TaskThread.java:62) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at java.base/java.lang.Thread.run(Unknown Source)
  • Accessing the vIDM GUI through the load balancer (LB) FQDN may return the error:
    • 502 Bad Gateway (NSX LB)

Diagnostic condition:

  • OpenSearch reports the status Not Running:
    • /etc/init.d/opensearch status
  • OpenSearch does not come up with a simple start or restart:
    • /etc/init.d/opensearch start
    • /etc/init.d/opensearch restart

Environment

VMware Identity Manager 3.3.7

Cause

This may be caused by a stale Liquibase lock in the vIDM database.

Resolution

First, confirm that the opensearch service is Not Running but horizon-workspace is Running by executing the following commands:

/etc/init.d/opensearch status
/etc/init.d/horizon-workspace status
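
The two checks above can be combined into one preflight test. This is a minimal sketch that assumes a stopped service prints the text "Not Running" in its status output, as described under the diagnostic condition; the exact wording and exit codes of the vIDM init scripts are not documented here, so compare against your own output first. The INIT_DIR variable and the function names are hypothetical and exist only so the check can be pointed at another directory.

```bash
# Hypothetical preflight check for the symptom pattern in this article:
# opensearch Not Running while horizon-workspace is Running.
# ASSUMPTION: a stopped service prints "Not Running" in its status output.
INIT_DIR="${INIT_DIR:-/etc/init.d}"

# Succeeds (returns 0) if the named service does NOT report "Not Running".
service_running() {
  ! "$INIT_DIR/$1" status 2>&1 | grep -qi "not running"
}

# Succeeds only when the symptom pattern from this article is present.
matches_symptom() {
  if service_running opensearch; then
    echo "opensearch is running; this article may not apply"
    return 1
  fi
  if ! service_running horizon-workspace; then
    echo "horizon-workspace is not running; investigate that first"
    return 1
  fi
  echo "opensearch Not Running, horizon-workspace Running: continue"
}
```

Run `matches_symptom` on each node; a non-zero return means the node does not match the pattern this article addresses.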

Try to restart the opensearch service:

/etc/init.d/opensearch restart

If the restart remains at "Waiting for IDM" for several minutes, press Ctrl+C to terminate it.
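
Instead of watching the restart and pressing Ctrl+C by hand, the wait can be bounded with GNU coreutils `timeout`, which is assumed to be available on the appliance. This sketch sends SIGINT, the same signal Ctrl+C delivers, once a limit you choose has passed, and falls back to SIGKILL 10 seconds later. The `restart_with_limit` name and the 300-second example limit are illustrative, not values from this article.

```bash
# Hypothetical helper: run a command, but interrupt it with SIGINT (as Ctrl+C
# would) if it is still running after LIMIT seconds. If it ignores SIGINT,
# timeout sends SIGKILL 10 seconds later.
restart_with_limit() {
  local limit="$1"
  shift
  timeout -s INT -k 10 "$limit" "$@"
  local rc=$?
  # GNU timeout exits 124 on timeout, or 137 if SIGKILL was needed.
  if [ "$rc" -eq 124 ] || [ "$rc" -eq 137 ]; then
    echo "gave up after ${limit}s (still 'Waiting for IDM'?)" >&2
  fi
  return "$rc"
}
```

For example: `restart_with_limit 300 /etc/init.d/opensearch restart`.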

A common cause of this issue is that the Liquibase lock cannot be acquired, often because opensearch was restarted uncleanly and a stale lock remained. The lock release step (step 2 below) needs to be executed only once for the cluster.

  1. Ensure the opensearch service is stopped on all nodes:
    /etc/init.d/opensearch stop
  2. Release locks (execute once for the cluster on the psql primary node):
    /usr/sbin/hznAdminTool liquibaseOperations -forceReleaseLocks
  3. Restart the main VMware Identity Manager (vIDM) service, horizon-workspace. Restart it on the primary node first, wait one to two minutes, and then restart it on the remaining nodes:
    /etc/init.d/horizon-workspace restart
  4. Start opensearch on all nodes:
    /etc/init.d/opensearch start

If opensearch is still not running, reboot the affected node:

reboot -f
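
Because the order of steps 1 to 4 across nodes matters, it can help to drive them from one machine. This is a hedged illustration only: it assumes key-based SSH as root to every node, which may be disabled on your appliances (if so, run each printed command manually on the node shown). `PRIMARY`, `SECONDARIES`, `DRY_RUN`, and the example FQDNs are placeholders. With `DRY_RUN` set, the script only prints the commands in order, which is a safe way to review the plan.

```bash
# Placeholders: PRIMARY must be the psql primary node; SECONDARIES are the rest.
PRIMARY="${PRIMARY:-vidm1.example.com}"
SECONDARIES="${SECONDARIES:-vidm2.example.com vidm3.example.com}"

# Print the command when DRY_RUN is set; otherwise execute it.
run() {
  if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# Run a command on a node. ASSUMPTION: root key-based SSH is permitted.
on_node() {
  local host="$1"
  shift
  run ssh "root@$host" "$@"
}

release_locks_and_restart() {
  local n
  # 1. Stop opensearch on all nodes.
  for n in $PRIMARY $SECONDARIES; do on_node "$n" /etc/init.d/opensearch stop; done
  # 2. Release Liquibase locks once, on the psql primary.
  on_node "$PRIMARY" /usr/sbin/hznAdminTool liquibaseOperations -forceReleaseLocks
  # 3. Restart horizon-workspace on the primary, wait, then on the others.
  on_node "$PRIMARY" /etc/init.d/horizon-workspace restart
  run sleep 120
  for n in $SECONDARIES; do on_node "$n" /etc/init.d/horizon-workspace restart; done
  # 4. Start opensearch on all nodes.
  for n in $PRIMARY $SECONDARIES; do on_node "$n" /etc/init.d/opensearch start; done
}
```

Review the plan with `DRY_RUN=1 release_locks_and_restart` before running it for real.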


Workaround: If forceReleaseLocks fails

If the hznAdminTool command hangs and does not complete, you may need to remove the lock manually. First, confirm cluster health. If hznAdminTool returns the error "The connection attempt failed", the delegateIP may need to be assigned to the psql primary node on eth0:0.
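
To check whether the delegateIP is present on the psql primary, you can look for it under the eth0:0 label. This sketch assumes the iproute2 `ip` utility is available on the node; `has_delegate_ip` is a hypothetical name, and the address in the usage line is only a placeholder for your environment's delegateIP.

```bash
# Hypothetical check: succeeds if the given IPv4 address is assigned under the
# eth0:0 label on this node. ASSUMPTION: iproute2 `ip` is installed.
# -o prints one line per address; -F matches the address literally.
has_delegate_ip() {
  local ip_addr="$1"
  ip -4 -o addr show label eth0:0 | grep -qF "inet ${ip_addr}/"
}
```

For example: `has_delegate_ip 10.0.0.50 && echo present || echo missing`.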

  1. Ensure the opensearch service is stopped on all nodes:
    /etc/init.d/opensearch stop
  2. Retrieve the database password:
    cat /usr/local/horizon/conf/db.pwd
  3. Log in to the database on the psql primary node:
    psql -h localhost -U horizon saas
  4. Check for a lock:
    select * from saas.DatabaseChangeLogLock;
  5. If a lock is found (the LOCKED column shows t, and the other columns contain a date and an IP address), remove it using the following command:
    update saas.DATABASECHANGELOGLOCK SET LOCKED=false, LOCKGRANTED=null, LOCKEDBY=null where ID=1;
  6. Log out of the database using \q, then repeat steps 2, 3, and 4 of the Resolution procedure above to release Liquibase locks, restart horizon-workspace, and start opensearch.

Note: For versions of vIDM earlier than 3.3.7, replace opensearch with elasticsearch wherever mentioned. These older versions are now End of Life (EOL).
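
Steps 2 to 5 can also be run non-interactively. This is a hedged sketch, not a supported procedure: it assumes db.pwd contains only the plain-text password for the horizon user and that psql is on the PATH, and it passes the password through the standard PGPASSWORD environment variable instead of the interactive prompt. `DB_PWD_FILE`, `saas_sql`, `lock_is_held`, and `release_lock` are hypothetical names. Stop opensearch on all nodes first, as in step 1.

```bash
# ASSUMPTION: this file holds only the plain-text password for user horizon.
DB_PWD_FILE="${DB_PWD_FILE:-/usr/local/horizon/conf/db.pwd}"

# Run one SQL statement against the saas database as the horizon user.
# -t (tuples only) and -A (unaligned) keep the output easy to compare.
saas_sql() {
  PGPASSWORD="$(cat "$DB_PWD_FILE")" psql -h localhost -U horizon -d saas -tA -c "$1"
}

# Succeeds if the Liquibase lock row reports locked = t.
lock_is_held() {
  [ "$(saas_sql 'select locked from saas.DatabaseChangeLogLock where ID=1;')" = "t" ]
}

# Clear the lock only when it is actually held (same UPDATE as step 5).
release_lock() {
  if lock_is_held; then
    saas_sql 'update saas.DATABASECHANGELOGLOCK SET LOCKED=false, LOCKGRANTED=null, LOCKEDBY=null where ID=1;'
  else
    echo "no Liquibase lock held"
  fi
}
```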

Additional Information

Impact/Risks

There is a brief service restart associated with these steps. If VMware Identity Manager (vIDM) is currently serving users for login, there may be a momentary disconnect.

Health Status

To check the cluster health, run the following command:

curl 'http://localhost:9200/_cluster/health?pretty=true'
  • Green: Everything is functioning correctly. There are enough nodes in the cluster to ensure at least two full copies of the data are spread across the cluster.
  • Yellow: The cluster is functioning, but there are not enough nodes to ensure High Availability (HA). For example, a single-node cluster remains in a yellow state by default because it cannot maintain two copies of the data. This is expected for single-node deployments and is not an issue if functionality is otherwise normal.
  • Red: The cluster is broken. It is unable to query existing data or store new data, typically due to an insufficient number of nodes or a lack of disk space.
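
For scripted monitoring, the status field of that response can be mapped to the three states above. This is a minimal sketch that extracts the field with sed rather than a JSON parser, because jq is not assumed to be installed on the appliance; `health_status` and `describe_health` are hypothetical names.

```bash
# Print green/yellow/red from a _cluster/health JSON document on stdin.
# Newlines are removed first so pretty and compact output both work.
health_status() {
  tr -d '\n' | sed -n 's/.*"status" *: *"\([a-z]*\)".*/\1/p'
}

# Translate the status into the meanings described above.
describe_health() {
  case "$1" in
    green)  echo "OK: enough nodes for two full copies of the data" ;;
    yellow) echo "WARN: functioning but not HA (expected on a single node)" ;;
    red)    echo "FAIL: cluster cannot query or store data" ;;
    *)      echo "UNKNOWN: no status returned (is opensearch running?)" ;;
  esac
}
```

For example: `describe_health "$(curl -s 'http://localhost:9200/_cluster/health?pretty=true' | health_status)"`.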