vIDM 3.3.x Upgrade Stuck at "Waiting for SaaS to Come Up" or Certificate Replacement Fails Due to Liquibase Lock Error "LCMVIDM73101"

Article ID: 319309


Products

VMware Aria Suite

Issue/Introduction

While performing a VMware Identity Manager (vIDM) 3.3.x upgrade or certificate replacement using VMware Aria Suite Lifecycle, the process fails or becomes stuck at "Waiting for SaaS to come up on the VMware Identity Manager Host."
Further investigation reveals a failure of the "ACS Health - Application Deployment Status" service on the vIDM diagnostic page, and the logs point to a database lock preventing proper initialization of services.

The following error is found in the VMware Aria Suite Lifecycle log (/var/log/vrlcm/vmware_vrlcm.log):

Error Code: LCMVIDM73101
Failed to retrieve vIDM health status after maximum retries. For more information, refer to the VMware Aria Suite Lifecycle log. Appliance health check(s) - [ACSHealth] failed on the host <hostname>.

The following error appears in the /opt/vmware/horizon/workspace/logs/accesscontrol-service.log file:

Exception encountered during context initialization - canceling refresh attempt:  
org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'liquibase' defined in class path resource [com/vmware/vidm/accesscontrol/db/DbDataStoreAutoConfiguration.class]:  
Invocation of init method failed; nested exception is liquibase.exception.LockException: Could not acquire change log lock.  

Additionally, one of the vIDM nodes shows a blank login page with a red error message box.

Environment

VMware Identity Manager 3.3.x

Cause

The cause of the failure is a database lock preventing proper initialization of vIDM services. Specifically, a Liquibase change log lock in the database blocks the startup of the vIDM SaaS services. The ACSHealth appliance health check fails due to this database lock.
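
Liquibase records this lock as a row in a changelog lock table in the saas database; if a service is interrupted mid-update, the row can be left marked as locked, blocking every subsequent startup. A minimal sketch for inspecting the lock state (the table is saas.DATABASECHANGELOGLOCK or saas.ACS_DATABASECHANGELOGLOCK, depending on which service holds the lock, as covered under Resolution):

    psql -U horizon saas

    -- A row with LOCKED set to true indicates a held (possibly stale) lock.
    SELECT ID, LOCKED, LOCKGRANTED, LOCKEDBY FROM saas.DATABASECHANGELOGLOCK;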

Resolution

To resolve this issue:

  1. Open an SSH session (for example, using PuTTY) to the vIDM node as root.
  2. Release the DB lock on all nodes, one at a time, by running this command (see the sketch after this list for applying it across a cluster):

    /usr/sbin/hznAdminTool liquibaseOperations -forceReleaseLocks
  3. Restart the horizon-workspace service on all nodes by running this command:

    service horizon-workspace restart

  4. Retry the upgrade from the VMware Aria Suite Lifecycle console.
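
In a clustered deployment, steps 2 and 3 must be run on every node. A minimal shell sketch of applying them over SSH, where vidm-node1 through vidm-node3 are placeholders for your actual vIDM node FQDNs:

    # Placeholder hostnames; substitute your actual vIDM node FQDNs.
    for node in vidm-node1 vidm-node2 vidm-node3; do
      ssh root@"$node" '/usr/sbin/hznAdminTool liquibaseOperations -forceReleaseLocks'
    done
    # Then restart the horizon-workspace service on each node.
    for node in vidm-node1 vidm-node2 vidm-node3; do
      ssh root@"$node" 'service horizon-workspace restart'
    done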

To resolve this issue in vIDM 3.3.7:

Note: There are two possible DB locks; which one applies depends on the log file in which the error appears, as covered by the two procedures below.

  1. First, identify which node is the postgres primary, as it holds the delegate IP.
  2. Run this command on any node:
    Note: The password is found in /usr/local/etc/pgpool.pwd. In a single-node deployment, the password is found in /usr/local/horizon/conf/db.pwd.

    su root -c "echo -e 'password'|/opt/vmware/vpostgres/current/bin/psql -h localhost -p 9999 -U pgpool postgres -c \"show pool_nodes\""
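
In the resulting listing, the primary is the row whose role column reads primary. A minimal sketch that filters for it, reusing the same pgpool connection details (and password note) as above:

    su root -c "echo -e 'password'|/opt/vmware/vpostgres/current/bin/psql -h localhost -p 9999 -U pgpool postgres -c \"show pool_nodes\"" | grep primary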

If the lock error is found in /opt/vmware/horizon/workspace/logs/accesscontrol-service.log:

  1. Stop the horizon-workspace service on all nodes by running this command:

    /etc/init.d/horizon-workspace stop

  2. On the primary node, run the following (a verification sketch follows this procedure):

    psql -U horizon saas

    SELECT * FROM saas.ACS_DATABASECHANGELOGLOCK;

    UPDATE saas.ACS_DATABASECHANGELOGLOCK SET LOCKED=false, LOCKGRANTED=null, LOCKEDBY=null WHERE ID=1;

  3. Start the horizon-workspace service on all nodes with the postgres primary node first:

    /etc/init.d/horizon-workspace start
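
To verify the lock was released (ideally before restarting in step 3), re-run the SELECT from step 2 in the same psql session; the row with ID=1 should now show LOCKED as false:

    -- Expect LOCKED = f, with LOCKGRANTED and LOCKEDBY empty, for the row with ID=1.
    SELECT * FROM saas.ACS_DATABASECHANGELOGLOCK;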

If the lock error is found in /opt/vmware/horizon/workspace/logs/horizon.log:

  1. Stop the horizon-workspace service on all nodes by running this command:

    /etc/init.d/horizon-workspace stop

  2. On each node, run:

    psql -U horizon saas

    SELECT * FROM saas.DATABASECHANGELOGLOCK;

    UPDATE saas.DATABASECHANGELOGLOCK SET LOCKED=false, LOCKGRANTED=null, LOCKEDBY=null WHERE ID=1;

  3. Start the horizon-workspace service on all nodes, with the postgres primary node first (a health-check sketch follows this procedure):

    /etc/init.d/horizon-workspace start
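
Once the services are up, you can check that SaaS is responding before retrying the operation in VMware Aria Suite Lifecycle. A minimal sketch, where vidm.example.com is a placeholder for your vIDM FQDN and the URL is the commonly used vIDM system health endpoint:

    # Placeholder FQDN; -k skips certificate validation for a quick check.
    curl -k https://vidm.example.com/SAAS/API/1.0/REST/system/health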

Additional Information

If the DB release command fails with 'ERROR: cannot execute UPDATE in a read-only transaction', the delegateIP may not be correctly assigned.

Refer to step 3 in KB Troubleshooting VMware Identity Manager postgres cluster deployed through vRealize Suite Lifecycle Manager to identify the postgres primary node. Make sure that the delegateIP is assigned to the postgres primary node (a sketch for checking this follows), then execute the release command again.
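
To check which node currently holds the delegateIP, list the interface addresses on each node and look for it. A minimal sketch, where 10.0.0.100 is a placeholder for your cluster's delegateIP:

    # Placeholder address; substitute your configured delegateIP.
    ip addr show | grep -F '10.0.0.100'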

Impact/Risks

Before making any changes, take snapshots of all nodes in vCenter.