Why shouldn't one run more than two process engines for a Clarity environment's BG services / cluster when there are lots of processes running at the same time?
Is there any whitepaper on why this is not recommended and on the impact of having multiple process engines?
Release: All
Component: Clarity Processes
This KB explains the potential pitfalls of having too many Process Engines. This is definitely a case where more is not better. Here's why:
2. In addition, there are 2 other thread pools that are not shown in the Process Engine statistics.
3. Next, here is some additional information about how the process engines choose which processes to run:
The process engine's load balancing capabilities are essentially a foot race. When an update event is generated (e.g., a user updates a process-enabled object instance such as a project), a row containing the details of the event is inserted into the NMS_MESSAGES table. A multicast message is then sent to notify the process engines that an event has occurred.
Each process engine has a thread called the NMS Message Receiver that receives this multicast message and acts on it. The first thing it does is execute a SELECT against NMS_MESSAGES to retrieve any undelivered messages. It "delivers" them to itself by inserting rows into NMS_MESSAGE_DELIVERY keyed to that process engine.
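For illustration, here is a minimal JDBC sketch of that "deliver to self" step. Only the table names NMS_MESSAGES and NMS_MESSAGE_DELIVERY come from the description above; the column names (MSG_ID, PROCESS_ENGINE_ID) and the exact SQL are assumptions made for the sketch, not Clarity's actual schema or implementation.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Sketch of the "deliver to self" step performed by the NMS Message Receiver.
// MSG_ID and PROCESS_ENGINE_ID are assumed column names for illustration only.
public class NmsMessageReceiver {

    private final Connection conn;
    private final String engineId; // identifies this process engine instance

    NmsMessageReceiver(Connection conn, String engineId) {
        this.conn = conn;
        this.engineId = engineId;
    }

    /** Called when the UDP multicast "new event" notification arrives. */
    void onMulticastNotification() throws SQLException {
        // 1. Find messages in NMS_MESSAGES not yet delivered to THIS engine.
        String select =
            "SELECT m.MSG_ID FROM NMS_MESSAGES m " +
            "WHERE NOT EXISTS (SELECT 1 FROM NMS_MESSAGE_DELIVERY d " +
            "                  WHERE d.MSG_ID = m.MSG_ID AND d.PROCESS_ENGINE_ID = ?)";
        // 2. "Deliver" each message by inserting a delivery row keyed to this engine.
        String insert =
            "INSERT INTO NMS_MESSAGE_DELIVERY (MSG_ID, PROCESS_ENGINE_ID) VALUES (?, ?)";

        try (PreparedStatement sel = conn.prepareStatement(select)) {
            sel.setString(1, engineId);
            try (ResultSet rs = sel.executeQuery();
                 PreparedStatement ins = conn.prepareStatement(insert)) {
                while (rs.next()) {
                    ins.setLong(1, rs.getLong("MSG_ID"));
                    ins.setString(2, engineId);
                    ins.executeUpdate();
                    // The delivered message is then queued for evaluation (next step).
                }
            }
        }
    }
}
```

Note that every process engine in the cluster runs this same delivery logic against the same tables, which is where the contention described below begins.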
Once the messages are delivered, each process engine iterates over the new messages it received and determines whether any action should be taken. In each case it must evaluate the event, so in a cluster with N process engines, all N engines evaluate every new message simultaneously. For INSERT or UPDATE events, the process engine iterates through the list of eligible Process Definitions for that type of event and evaluates each one's Start Condition expression (which also performs a rights check via an SQL query). This evaluation is dispatched to the Condition Evaluation thread pool, which can grow to 15 threads. If the condition evaluates to true, the process engine attempts to start the process by inserting a row into the BPM_EM_EVENT_PROCESS_LOCKS table. Whichever process engine succeeds in inserting the lock is the one that starts the process.
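The race can be pictured with a short Java sketch. The ProcessDefinition type, the column names, and the fixed 15-thread pool are assumptions made for illustration (the real Condition Evaluation pool grows on demand up to 15 threads); the point it shows is that every engine dispatches the same evaluations, and a unique-key insert into BPM_EM_EVENT_PROCESS_LOCKS decides the single winner.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch of the start-condition evaluation and the lock "foot race".
// A real implementation would use a connection pool rather than one shared Connection.
public class ConditionEvaluator {

    // At most 15 start conditions are evaluated concurrently per process engine.
    private final ExecutorService conditionPool = Executors.newFixedThreadPool(15);
    private final Connection conn;

    ConditionEvaluator(Connection conn) {
        this.conn = conn;
    }

    /** Every engine in the cluster runs this for every new message it delivered to itself. */
    void onEvent(long eventId, List<ProcessDefinition> eligibleDefinitions) {
        for (ProcessDefinition def : eligibleDefinitions) {
            conditionPool.submit(() -> {
                // Evaluating the start condition runs SQL, including a rights check.
                if (def.startConditionIsTrue(eventId)) {
                    tryToWinStartLock(eventId, def);
                }
            });
        }
    }

    private void tryToWinStartLock(long eventId, ProcessDefinition def) {
        // All engines race to insert the same lock row; a unique constraint means
        // only one INSERT succeeds, and that engine starts the process.
        String sql = "INSERT INTO BPM_EM_EVENT_PROCESS_LOCKS (EVENT_ID, PROCESS_DEF_ID) VALUES (?, ?)";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setLong(1, eventId);
            ps.setLong(2, def.id());
            ps.executeUpdate();
            startProcess(eventId, def);      // this engine won the race
        } catch (SQLException duplicateKey) {
            // Another process engine inserted the lock first; do nothing.
        }
    }

    private void startProcess(long eventId, ProcessDefinition def) { /* start the process here */ }

    interface ProcessDefinition {
        long id();
        boolean startConditionIsTrue(long eventId);
    }
}
```

The engines that lose the race have still paid the full cost of delivering the message and evaluating the start conditions; only the final insert is cheap to lose, which is why adding engines multiplies work without adding throughput.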
NOTE: The more process engines are present, the greater the database contention on the NMS_MESSAGES, NMS_MESSAGE_DELIVERY, and BPM_EM_EVENT_PROCESS_LOCKS tables. This affects the database as a whole as well, because all of the process engines are simultaneously evaluating the start conditions of the process definitions tied to the object type receiving the event.
Now we have enough information to explain why we tell customers to run a maximum of two Process Engines. Two process engines provide redundancy in case one of them goes down, while keeping the contention described above far lower than a larger cluster would.
Let's walk through the math that shows why you should not run more than two process engines, using an example scenario.
EXAMPLE SCENARIO:
For this scenario, suppose we have an interface into Clarity that imports Requisitions, and that the environment is running 8 process engines. Each hour, a batch of requisitions is XOGged into Clarity, and each batch contains 1000 requisition instances for update.
In the Clarity design, there are 25 active process definitions defined on the Requisition object for "update", each with its own start condition. Of the 25 definitions, only 1 will ever evaluate to true at any given time, so only a single process is started.
When the XOG run happens, 1000 requisition instances are XOGged in for update. The requisition XOG is very quick because the object is relatively lightweight. Each instance update fires a corresponding BPM "update" event, which amounts to one row inserted into NMS_MESSAGES per instance updated plus a multicast event message sent by the app instance that handled the XOG request.
This results in 1000 rows inserted into NMS_MESSAGES and 1000 multicast messages received by all 8 process engines simultaneously. The NMS Message Receiver thread on each process engine picks up these lightweight UDP multicast messages almost immediately and begins retrieving rows from NMS_MESSAGES to deliver to itself by inserting rows into NMS_MESSAGE_DELIVERY. It also begins dispatching the event messages pulled from the database to the Condition Evaluation thread pool. On each process engine, the queue for the Condition Evaluation thread pool quickly grows to handle the backlog of 1000 new messages, resulting in all 15 possible Condition Evaluation threads being active on every engine.
Let's look at the numbers now:
8 process engines * 15 Condition Evaluation threads = 120 concurrent threads evaluating the 1000 new messages
The load climbs to 120 concurrent evaluation threads and stays there for some time, because there are not just 1000 new messages to process. There are 25 process definitions to evaluate for each of the 1000 messages:
8 process engines * 25 process start conditions * 1000 messages = 200,000 total queries executed.
Assuming the database server has 24 CPUs and that each condition expression query takes 200 ms (CPU bound on the database), we are looking at roughly 667 minutes of total execution time. 667 minutes / 24 CPUs ≈ 28 minutes of solid CPU activity on the database server just to handle the incoming load of messages. The load average on the database server can rise above 120 and stay there for a long period of time. Other parts of Clarity then suffer CPU starvation on the database server, and all activities are affected and slowed down.
As you can see, this is not a good situation.
Let's look at the same scenario with only 2 process engines:
2 PEs * 15 eval threads = 30 concurrent threads
2 PEs * 25 start conds * 1000 messages = 50,000 queries
50,000 queries * 200 ms ≈ 167 minutes of CPU; 167 minutes / 24 CPUs ≈ 6.9 minutes of solid CPU activity on the database server (whose load average could still sit at 30 or higher during this time, due to the 30 concurrent eval threads)
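If you want to re-run this comparison with your own numbers, here is a small sketch that reproduces the arithmetic above; the 25 definitions, 1000 messages, 200 ms per query, 24 database CPUs, and 15 evaluation threads per engine are the same assumptions used in the text.

```java
// Reproduces the back-of-the-envelope numbers above for any engine count.
public class ProcessEngineLoadEstimate {

    static void estimate(int engines, int defsPerEvent, int messages,
                         double secondsPerQuery, int dbCpus, int evalThreadsPerEngine) {
        int concurrentThreads = engines * evalThreadsPerEngine;
        long totalQueries = (long) engines * defsPerEvent * messages;
        double totalCpuMinutes = totalQueries * secondsPerQuery / 60.0;
        double wallClockMinutes = totalCpuMinutes / dbCpus;

        System.out.printf(
            "%d engines: %d concurrent eval threads, %,d queries, %.0f CPU-minutes, ~%.1f minutes on %d CPUs%n",
            engines, concurrentThreads, totalQueries, totalCpuMinutes, wallClockMinutes, dbCpus);
    }

    public static void main(String[] args) {
        estimate(8, 25, 1000, 0.2, 24, 15); // -> 120 threads, 200,000 queries, ~667 CPU-min, ~27.8 min
        estimate(2, 25, 1000, 0.2, 24, 15); // ->  30 threads,  50,000 queries, ~167 CPU-min,  ~6.9 min
    }
}
```

Every term in the formula except the CPU count scales with the number of engines, so the database load grows linearly with every engine you add.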
6.9 minutes of CPU at a load of 30 is still bad. Here is what you can do to reduce this load:
It turns out the 2 biggest variables at play are: