Symptoms:
Errors in logs:
com.tricipher.saas.persist.PersistenceRuntimeException: could not prepare statement
Caused by: java.sql.SQLException: An attempt by a client to checkout a Connection has timed out.
Caused by: com.mchange.v2.resourcepool.TimeoutException: A client timed out while waiting to acquire a resource from com.mchange.v2.resourcepool.BasicResourcePool@7152381b -- timeout at awaitAvailable()
com.vmware.horizon.directory.ldap.dc.commons.LdapPingChecker - Communication Error connecting to dc domcontrol.example.com for domain example.com
Product: VMware Identity Manager (Workspace ONE Access) 3.3.7
Restarting the nodes killed all active threads, so the system came up successfully immediately afterwards.
To avoid this issue in the future, increase the database connection pool setting datastore.poolConfig.maxPoolSize to 200 in the following file: /usr/local/horizon/conf/runtime-config.properties
In context, the setting looks like this.
Leave the other settings unchanged and change only maxPoolSize to 200:
#DB connection pooling
datastore.poolConfig.numHelperThreads=10
datastore.poolConfig.maxPoolSize=150 -> datastore.poolConfig.maxPoolSize=200
datastore.poolConfig.minPoolSize=30
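
If you prefer to script the change rather than edit the file by hand, the edit above can be sketched in Python as follows. This is a minimal sketch, not a vendor-supplied tool: the function names set_property and update_file and the ".bak" backup suffix are hypothetical choices made for illustration. It assumes the key appears as an uncommented key=value line, as in the excerpt above, and it deliberately fails if the key is missing rather than appending a new line.

```python
import re
import shutil
from pathlib import Path

# Property name taken from this article; the new value of 200 is the
# value the article recommends.
POOL_KEY = "datastore.poolConfig.maxPoolSize"


def set_property(text: str, key: str, value: str) -> str:
    """Return `text` with the uncommented `key=` line set to `value`.

    All other lines are left untouched. Raises KeyError if the key is
    not found, so a typo never silently adds a new setting.
    """
    pattern = re.compile(
        r"^([ \t]*" + re.escape(key) + r"[ \t]*=)[^\r\n]*",
        re.MULTILINE,
    )
    new_text, count = pattern.subn(lambda m: m.group(1) + value, text)
    if count == 0:
        raise KeyError(key)
    return new_text


def update_file(path: str, value: str = "200") -> None:
    """Back up the properties file, then rewrite it with the new pool size.

    The ".bak" suffix is an assumption of this sketch, not a product
    convention.
    """
    p = Path(path)
    shutil.copy2(p, p.with_name(p.name + ".bak"))
    p.write_text(set_property(p.read_text(), POOL_KEY, value))
```

Calling update_file("/usr/local/horizon/conf/runtime-config.properties") would apply the change to the file named in this article. Keeping a backup copy makes it easy to restore the previous value of 150 if needed.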
Once the file is saved, restart all nodes in the cluster.
Expect a few minutes of downtime while the nodes restart.