vIDM postgres cluster health status is critical. Postgres status of node(s) some node, marked as down in the cluster

Article ID: 395382

Updated On: 04-24-2025

Products

VMware Aria Suite

Issue/Introduction

The vIDM Postgres cluster health status is reported as critical, and the Postgres status of one or more nodes is marked as down in the cluster, causing service outages.

Log excerpts: 

workspace.log:
10-Feb-2025 23:13:23.143 WARNING [localhost-startStop-8] org.apache.catalina.loader.WebappClassLoaderBase.clearReferencesThreads The web application [SAAS] appears to have started a thread named [Replication Thread] but has failed to stop it. This is very likely to create a memory leak. Stack trace of thread:
com.vmware.horizon.datastore.IdmRMIAsynchronousCacheReplicator.replicationThreadMain(IdmRMIAsynchronousCacheReplicator.java:89)
com.vmware.horizon.datastore.IdmRMIAsynchronousCacheReplicator$ReplicationThread.run(IdmRMIAsynchronousCacheReplicator.java:369)

horizon.log:
10-Feb-2025 23:13:23.146 WARNING [localhost-startStop-8] org.apache.catalina.loader.WebappClassLoaderBase.clearReferencesThreads The web application [SAAS] appears to have started a thread named [Replication Thread] but has failed to stop it. This is very likely to create a memory leak. Stack trace of thread:
com.vmware.horizon.datastore.IdmRMIAsynchronousCacheReplicator.replicationThreadMain(IdmRMIAsynchronousCacheReplicator.java:89)
com.vmware.horizon.datastore.IdmRMIAsynchronousCacheReplicator$ReplicationThread.run(IdmRMIAsynchronousCacheReplicator.java:369)

accesscontrol.service.log:
2025-02-10 17:37:11,396 GMT WARN VIDM-FQDN:accesscontrol (localhost-startStop-1) [;;;] org.springframework.context.annotation.AnnotationConfigApplicationContext - Exception encountered during context initialization - cancelling refresh attempt: org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'liquibase' defined in class path resource [com/vmware/vidm/accesscontrol/db/DbDataStoreAutoConfiguration.class]: Invocation of init method failed; nested exception is liquibase.exception.DatabaseException: org.postgresql.util.PSQLException: The connection attempt failed.
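
To confirm that a node shows the same symptom, the accesscontrol log can be searched for the connection failure quoted above. The following is a minimal sketch: the function name is hypothetical, and the log file path is passed in as an argument because its location is not stated in this article.

```shell
#!/bin/sh
# count_pg_connection_failures FILE
# Prints the number of lines in FILE that contain the PSQLException
# message from the accesscontrol.service.log excerpt above.
# The function name is illustrative only; it is not part of vIDM.
count_pg_connection_failures() {
    # grep -c prints 0 and exits non-zero when nothing matches,
    # so "|| true" keeps the function's exit status at 0.
    grep -c 'PSQLException: The connection attempt failed' "$1" || true
}
```

Call it with the path to the log on the node, for example `count_pg_connection_failures ./accesscontrol.service.log`. A non-zero count indicates that the service could not reach the database at least once.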

Environment

VMware Identity Manager 3.3.x

Resolution

NOTE: Take snapshots of the vRSLCM and vIDM nodes before applying the steps below.

  1. Apply the steps in the following KB article and increase the replication delay threshold to 100000. This value is based on the replication delays recorded in the logs.
    1. https://knowledge.broadcom.com/external/article?articleNumber=322680
  2. From Lifecycle Manager, open the Swagger API and update the following property under 'Property interface Controller' in the Private API:
         API : /lcm/automata/api/engine/configproperty/{key} updatePropertyValue
         Key : lcm.postgres.replication.delay.threshold.bytes
         Value : 100000

After completing the steps above, monitor the environment for 3-4 days. If the issue persists, perform the following step:

  1. Run the following commands on all vIDM nodes:
        a. sed -i 's/attemptPCPRecoveryOfNodesWithReplicationDelay$/#attemptPCPRecoveryOfNodesWithReplicationDelay/g' /usr/local/etc/auto-recovery.sh
        b. systemctl restart NetworkService
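
Before editing /usr/local/etc/auto-recovery.sh in place, it may help to preview what the substitution does. The sketch below runs the same sed expression without -i against a scratch file. The scratch file's contents are hypothetical and only stand in for the real script, which is not reproduced in this article.

```shell
#!/bin/sh
# Build a scratch file that mimics a script defining and then calling
# the function. This content is an assumption for illustration only.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
attemptPCPRecoveryOfNodesWithReplicationDelay() {
    :
}
attemptPCPRecoveryOfNodesWithReplicationDelay
EOF

# Same expression as step a, but without -i, so nothing is modified
# and the result is only captured and printed.
preview="$(sed 's/attemptPCPRecoveryOfNodesWithReplicationDelay$/#attemptPCPRecoveryOfNodesWithReplicationDelay/g' "$sample")"
printf '%s\n' "$preview"

rm -f "$sample"
```

Because the pattern ends with `$`, only lines that end with the function name are prefixed with `#`. In this sample the definition line is left unchanged and the call on the last line is commented out.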
