In a two-server cluster, sub processes lost the connection to their main process. The sub processes have completed, but they remain in the run state "Running" and do not change to "Completed".
Therefore the main process does not continue working, and the node's Management Queue fills up.
The process cannot be reset; attempting a reset results in a server error message. Restarting the CA services does not solve the issue.
Release : 4.3
Component : Process Automation
The problem is caused by the two Orchestrator nodes being out of sync. To get them back in sync with the master information in the database, perform the following steps:
1) Stop the CA Process Automation service (i.e. the Orchestrator) on both nodes of the cluster.
2) Empty the contents of the following subdirectories under CA\PAM\server\c2o by moving their contents to a temporary place outside the PAM structure: data, logs, scripts, tmp, temp, work, wrappers.
3) Start the first node again, and confirm that you can log in to it directly (not through the load balancer).
4) Once the UI is back running on the first node, start the second node.
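Step 2 above could be scripted rather than done by hand. The following is a minimal Python sketch, not a CA-supplied tool: the function name `empty_c2o_subdirs` and both path arguments are assumptions made for illustration. It assumes the Orchestrator service is already stopped on the node (step 1) and that the backup directory is new or empty.

```python
import shutil
from pathlib import Path

# Subdirectory names taken from step 2 of the procedure.
SUBDIRS = ["data", "logs", "scripts", "tmp", "temp", "work", "wrappers"]


def empty_c2o_subdirs(c2o_dir, backup_dir):
    """Move the contents of each listed subdirectory to backup_dir/<name>.

    The subdirectories themselves stay in place, empty. Anything else under
    c2o_dir is left untouched. Returns the list of paths that were moved.
    (Hypothetical helper; not part of CA Process Automation.)
    """
    c2o = Path(c2o_dir)
    backup = Path(backup_dir)
    moved = []
    for name in SUBDIRS:
        src = c2o / name
        if not src.is_dir():
            continue  # this subdirectory does not exist on this node
        dest = backup / name
        dest.mkdir(parents=True, exist_ok=True)
        # sorted() builds the full list before anything is moved.
        for entry in sorted(src.iterdir()):
            shutil.move(str(entry), str(dest / entry.name))
            moved.append(entry)
    return moved


# Example call (placeholder paths; adjust to the actual installation):
# empty_c2o_subdirs(r"<install root>\CA\PAM\server\c2o", r"D:\pam_backup")
```

Run it once per node while the service is stopped, and keep the backup until both nodes are confirmed healthy.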
To minimise the chance of this happening again, the following steps are recommended:
1) Increase the logging retention, in case any problems occur in the future.
2) Add timeout/retry/abort handling to the SOAP call logic. Currently the call waits forever for a response that will never come.
3) If these old instances do need to be run, re-request them, as they cannot be restarted through the PAM UI's reset option.
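The timeout/retry/abort pattern from recommendation 2 can be illustrated in general terms. The sketch below is plain Python using only the standard library; it does not show how the SOAP call is configured inside a PAM process, and the names `post_soap` and `call_with_retry`, the default timeout, and the retry counts are all assumptions chosen for illustration.

```python
import time
import urllib.request


def post_soap(url, envelope, timeout_seconds=30.0):
    """POST a SOAP envelope and give up if no response arrives in time."""
    request = urllib.request.Request(
        url,
        data=envelope.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )
    # The timeout makes a dead endpoint raise an error instead of blocking.
    with urllib.request.urlopen(request, timeout=timeout_seconds) as response:
        return response.read()


def call_with_retry(send, retries=3, delay_seconds=2.0, sleep=time.sleep):
    """Call send() up to `retries` times, then abort with RuntimeError.

    Network failures and timeouts from urllib and sockets are subclasses
    of OSError, so that is the only exception type retried here.
    """
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return send()
        except OSError as exc:
            last_error = exc
            if attempt < retries:
                sleep(delay_seconds)  # wait before the next attempt
    raise RuntimeError(f"SOAP call aborted after {retries} attempts") from last_error


# Example call (hypothetical endpoint and envelope):
# call_with_retry(lambda: post_soap("http://host.example/ws", "<soap:Envelope/>"))
```

The point of the abort step is that the caller gets a definite failure it can handle, rather than a sub process that never reports back.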