
Processes hung in distributed environment


Article ID: 189106


Updated On:

Products

CA Process Automation Base

Issue/Introduction

In a two-server cluster, sub processes lost the connection to their main process. The sub processes complete, but remain in the run state "Running" and do not change to "Completed".

Therefore the main process does not continue, and the node's Management Queue fills up.

The process cannot be reset; attempting a reset results in a server error message. Restarting the CA services does not solve the issue.

 

Environment

Release : 4.3

Component : Process Automation

Resolution

The problem is caused by the two Orchestrator nodes being out of sync. To get them back in sync with the master information in the database, perform the following steps:

1) Stop the CA Process Automation service (i.e. the Orchestrator) on both nodes of the cluster.
2) Empty the following subdirectories under CA\PAM\server\c2o by moving their contents to a temporary location outside the PAM directory structure:
data
logs
scripts
tmp
temp
work
wrappers
3) Start the first node again, and confirm that you can log in to it directly (not through the load balancer).
4) Once the UI is back running on the first node, start the second node.
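
Step 2 can be scripted if it has to be repeated on both nodes. The following is a minimal Python sketch, not a vendor-supplied tool: the function name `move_c2o_contents` and both path arguments are illustrative assumptions, and it should only be run after the service has been stopped as in step 1. It moves the contents of each listed subdirectory into a backup folder and leaves the (now empty) subdirectories in place. The backup folder is assumed to be new or empty.

```python
import shutil
from pathlib import Path

# Subdirectory names taken from step 2 above.
SUBDIRS = ["data", "logs", "scripts", "tmp", "temp", "work", "wrappers"]


def move_c2o_contents(c2o_dir, backup_dir):
    """Move the contents of each listed subdirectory into backup_dir.

    The subdirectories themselves stay in place, empty.
    Returns the list of destination paths that were created.
    Both arguments are assumptions: pass the c2o path of your own
    installation and a backup location outside the PAM structure.
    """
    c2o = Path(c2o_dir)
    backup = Path(backup_dir)
    moved = []
    for name in SUBDIRS:
        src = c2o / name
        if not src.is_dir():
            continue  # skip subdirectories that do not exist on this node
        dest = backup / name
        dest.mkdir(parents=True, exist_ok=True)
        for entry in sorted(src.iterdir()):
            target = dest / entry.name
            shutil.move(str(entry), str(target))
            moved.append(target)
    return moved
```

A call might look like `move_c2o_contents(r"C:\Program Files\CA\PAM\server\c2o", r"D:\pam_c2o_backup")`, where both paths are hypothetical and must be replaced with the actual install and backup locations.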


To minimise the chance of this happening again, the following steps are recommended:

1) Increase the log retention, so that diagnostic information is available if the problem recurs.
2) Add timeout, retry, and abort handling to the SOAP call logic. Currently the call waits indefinitely for a response that will never arrive.
3) If the old instances still need to be run, re-request them, because they cannot be restarted through the PAM UI's reset option.
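
The timeout/retry/abort pattern from recommendation 2 is sketched below in generic Python to illustrate the control flow only. In CA Process Automation the equivalent logic would be built into the process design itself; nothing here is a PAM API. The names `call_soap` and `SoapCallAborted`, the default values, and the `opener` parameter (present so the function can be exercised without a network) are all assumptions made for illustration.

```python
import time
import urllib.request


class SoapCallAborted(Exception):
    """Raised when every attempt has failed and the call is abandoned."""


def call_soap(url, envelope, timeout=30.0, retries=3, delay=5.0,
              opener=urllib.request.urlopen):
    """POST a SOAP envelope with a per-attempt timeout and bounded retries.

    timeout: seconds to wait for each attempt before giving up on it.
    retries: total number of attempts before aborting.
    delay:   seconds to wait between attempts.
    """
    request = urllib.request.Request(
        url,
        data=envelope.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            with opener(request, timeout=timeout) as response:
                return response.read()
        except OSError as exc:
            # Covers timeouts, refused connections, and urllib's URLError
            # and HTTPError, which are all OSError subclasses.
            last_error = exc
            if attempt < retries:
                time.sleep(delay)
    # Abort instead of waiting forever for a response that will never come.
    raise SoapCallAborted(f"gave up after {retries} attempts: {last_error}")
```

The caller can catch `SoapCallAborted` and fail the process explicitly, so that a lost connection produces a visible error rather than an instance that stays in "Running".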