All of our workflows are stuck in a running or waiting state. Even a test process with a single simple operator (for example, a Run Script operator that executes hostname) sits there spinning at that one and only operator.
Release : 4.3.04+
ITPAM was experiencing a performance problem related to the number of ActiveMQ messages (ACTIVE_MSGS rows) that built up. To help confirm this, we used the following two queries, one per node, because it is a clustered Domain Orchestrator environment:
SELECT COUNT(*), CONTAINER
FROM [PAM_RT].[dbo].[Node0ACTIVE_MSGS]
GROUP BY CONTAINER;

SELECT COUNT(*), CONTAINER
FROM [PAM_RT].[dbo].[Node1ACTIVE_MSGS]
GROUP BY CONTAINER;
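If a single result set covering both nodes is more convenient, the two per-node queries can be combined. This is a minimal sketch assuming the same [PAM_RT] database and Node0/Node1 table names shown above:

-- Combined per-node backlog counts (assumes the table names used above)
SELECT 'Node0' AS NODE, CONTAINER, COUNT(*) AS MSG_COUNT
FROM [PAM_RT].[dbo].[Node0ACTIVE_MSGS]
GROUP BY CONTAINER
UNION ALL
SELECT 'Node1' AS NODE, CONTAINER, COUNT(*) AS MSG_COUNT
FROM [PAM_RT].[dbo].[Node1ACTIVE_MSGS]
GROUP BY CONTAINER
ORDER BY NODE, MSG_COUNT DESC;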
Note:
The count for the DLQ (dead letter queue) container can be ignored. The requestqueue/responsequeue containers are the important ones to keep an eye on. Request/response queue counts greater than 100 are eligible for the noOfConsumers property key mentioned below.
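To surface only the containers that cross that threshold, a HAVING clause can be added to the query above. This sketch runs against the Node0 table; the '%DLQ%' pattern is an assumption about how the dead letter queue container is named, so adjust the filter if your container names differ:

-- Containers exceeding the 100-message threshold, ignoring the DLQ
-- (the '%DLQ%' pattern is an assumed container naming convention)
SELECT CONTAINER, COUNT(*) AS MSG_COUNT
FROM [PAM_RT].[dbo].[Node0ACTIVE_MSGS]
WHERE CONTAINER NOT LIKE '%DLQ%'
GROUP BY CONTAINER
HAVING COUNT(*) > 100;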
Reasons why the ACTIVE_MSGS tables may show counts that continually increase include:
1. Scheduled jobs executing before previously scheduled jobs complete.
2. Agents that are not able to communicate properly with the Orchestrators.
The second scenario can be identified by messages in the Orchestrator's c2o.log. Example:
INFO [com.optinuity.c2o.transport.Resolver] [_autoTPRecovery] Transport properties [TransportID=agentNode, Hostname=<hostname>, IPAddress=<ip_address>, Port=7003, IsSecure=false] of node a999b6e3-8862-4d0e-811c-e91876d7e501 not reachable
Make the necessary adjustments depending on the cause of the bottleneck. For the reasons described in the "Cause" section, the solutions are as follows:
Scheduled jobs executing before previously scheduled jobs complete:
If you know that your scheduled jobs take longer to complete than the interval at which they are started, lengthen that interval (that is, schedule them to run less frequently) so that each new process starts only after the previously scheduled process has completed. For example, if a job takes roughly 15 minutes to finish but is scheduled every 10 minutes, a backlog builds continuously; scheduling it every 20 minutes allows each run to finish before the next begins.
Agents that are not able to communicate properly with the Orchestrators:
Address the communication problem between the agent and the Orchestrator. Examples of problems and solutions: