Processes hung in distributed environment


Article ID: 189106




CA Process Automation Base


In a two-server cluster, the sub processes lost the connection to their main process. The sub processes complete their work, but they remain in the run state "Running" and do not change to "Completed".

As a result, the main process does not continue, and the node's Management Queue fills up.

The process cannot be reset; attempting a reset results in a server error message. Restarting the services does not resolve the issue.


Release : 4.3

Component : Process Automation


The problem is caused by the two Orchestrator nodes being out of sync. To get them back in sync with the master information in the database, perform the following steps:

1) Stop the CA Process Automation service (i.e. the Orchestrator) on both nodes of the cluster.
2) Empty the contents of the following subdirectories under CA\PAM\server\c2o by moving their contents to a temporary place outside the PAM structure:
3) Start the first node again, and confirm that you can log in to it directly (not through the load balancer).
4) Once the UI is back running on the first node, start the second node.
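
Step 2 can be scripted so that it runs the same way on both nodes. The following Python sketch is only an illustration: the function name `move_contents` and its parameters are hypothetical, and the list of subdirectories must be supplied by the caller because it is not given here. It moves the contents of each named subdirectory to a backup location and leaves the now-empty subdirectories in place. Run it only while the service is stopped on both nodes.

```python
import shutil
from pathlib import Path


def move_contents(c2o_dir, subdirs, backup_dir):
    """Move everything inside each named subdirectory to backup_dir.

    c2o_dir    -- path to the c2o directory (for example CA\\PAM\\server\\c2o)
    subdirs    -- subdirectory names to empty; supplied by the caller
    backup_dir -- temporary location outside the PAM structure
    Returns the list of new paths of the moved entries.
    """
    c2o = Path(c2o_dir)
    backup = Path(backup_dir)
    moved = []
    for name in subdirs:
        source = c2o / name
        if not source.is_dir():
            continue  # skip names that do not exist on this node
        target = backup / name
        target.mkdir(parents=True, exist_ok=True)
        # Take a snapshot of the listing before moving entries out of it.
        for entry in list(source.iterdir()):
            destination = target / entry.name
            shutil.move(str(entry), str(destination))
            moved.append(destination)
    return moved
```

Keeping the moved files in a backup location, rather than deleting them, means they can be restored if the nodes do not come back as expected.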

To minimize the chance of this happening again, the following steps are recommended:

1) Increase the logging retention so that diagnostic data is available if the problem occurs again.
2) Add timeout, retry, and abort handling to the SOAP call logic. Currently the call waits indefinitely for a response that will never come.
3) If the old instances still need to be run, re-request them, because they cannot be restarted through the reset option in the PAM UI.
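
The timeout/retry/abort pattern from recommendation 2 can be sketched generically. The Python example below is not PAM operator code and does not use any PAM API. `call_with_retry`, `CallAborted`, and the `call` parameter are hypothetical names, and `call` stands for whatever function performs the SOAP request, on the assumption that it honours the timeout it is given and raises `TimeoutError` when that timeout expires.

```python
import time


class CallAborted(Exception):
    """Raised when every attempt has failed and the call is given up."""


def call_with_retry(call, timeout_s=30.0, retries=3, backoff_s=2.0,
                    sleep=time.sleep):
    """Run call(timeout_s); retry on timeout or connection errors, then abort.

    call      -- hypothetical callable that performs the SOAP request
    timeout_s -- upper bound, in seconds, passed to each attempt
    retries   -- total number of attempts before aborting
    backoff_s -- base wait between attempts (multiplied by attempt number)
    """
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return call(timeout_s)
        except (TimeoutError, ConnectionError) as exc:
            last_error = exc
            if attempt < retries:
                sleep(backoff_s * attempt)  # linear backoff between attempts
    # Abort explicitly instead of waiting forever for a response.
    raise CallAborted(f"gave up after {retries} attempts") from last_error
```

With a bound on every attempt and an explicit abort at the end, a lost connection surfaces as a failure that the process can handle, instead of leaving an instance in the "Running" state.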