Clarity Processes that use GEL are getting stuck in sleep

Article ID: 106346

Products

Clarity PPM On Premise, Clarity PPM SaaS

Issue/Introduction

Some processes that use GEL scripts do not complete in Clarity. A process with only a start and a finish step completes successfully, but processes that use GEL do not.

Short-term workaround: Restart the bg services. One restart usually suffices, but sometimes multiple restarts are required.

Restarting the bg services is only a temporary fix: the problem is likely to recur until the underlying cause is addressed.

Environment

Any Clarity environment that uses processes with GEL scripts

Cause

  • This can happen when GEL scripts use the util:sleep tag and the sleeping scripts occupy the GEL threads (this can be confirmed with thread stacks).
  • Use util:sleep with caution in GEL scripts.
  • This tag puts the thread executing the GEL script to sleep.
  • There are 15 GEL threads allocated per Process Engine instance.
  • If all 15 are sleeping, other GEL steps cannot execute and processes appear to hang.
  • Even if only some of the GEL threads are sleeping, performance on the process engines can slow down.
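
The thread starvation described above can be illustrated with a small model. The sketch below is not Clarity code: it uses a Python thread pool of 15 workers as a stand-in for the 15 GEL threads, and the names `GEL_THREADS`, `sleeping_step`, `quick_step`, and `wait_for_free_thread` are hypothetical. It assumes a sleeping script holds its thread for the entire sleep.

```python
import time
from concurrent.futures import ThreadPoolExecutor

GEL_THREADS = 15  # pool size taken from the article; the pool itself is only a model


def sleeping_step(seconds):
    """Stand-in for a GEL step that sleeps: the worker thread is blocked the whole time."""
    time.sleep(seconds)


def quick_step(submitted_at):
    """Stand-in for a GEL step with no sleep; returns how long it waited for a free thread."""
    return time.monotonic() - submitted_at


def wait_for_free_thread(sleepers, sleep_seconds):
    """Queue `sleepers` sleeping steps, then one quick step, and return the quick step's wait."""
    with ThreadPoolExecutor(max_workers=GEL_THREADS) as pool:
        for _ in range(sleepers):
            pool.submit(sleeping_step, sleep_seconds)
        return pool.submit(quick_step, time.monotonic()).result()


if __name__ == "__main__":
    print(f"14 sleepers: quick step waited {wait_for_free_thread(14, 0.5):.2f}s")
    print(f"15 sleepers: quick step waited {wait_for_free_thread(15, 0.5):.2f}s")
```

With 14 sleepers one worker is still free, so the quick step starts almost immediately. With 15 sleepers it has to wait until the first sleeper finishes, which is the "appears to hang" behavior when the sleeps last minutes or hours.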

Resolution

  • If a bottleneck is confirmed, through thread dumps and stacks, to be caused by sleep tags, Broadcom Support recommends redesigning the offending process and/or pausing it until the redesign is done.
    • Rewrite the process GEL scripts causing the issue so that they no longer create a bottleneck (this may require removing util:sleep; see the suggestions below).

Possible ways to rewrite your processes: 

  • Instead of using util:sleep, in some cases it is possible to use a post condition check. Assuming your GEL script is monitoring step completion in another process instance, that process could set a flag/value on an object, which triggers an event that the post condition in your monitoring process can detect before it moves on.
    • Because the post condition pipeline can iterate through many post conditions without exhausting its bandwidth, changing your processes to use post conditions instead of util:sleep helps ensure that this problem does not recur.
    • You could have 300+ process instances waiting on a post condition without affecting process engine performance, whereas even a few GEL scripts stuck sleeping or polling can have an adverse effect on system throughput and behavior.
    • This is recommended when the sleep tag is used for longer durations (e.g. minutes or hours).
  • You could break your existing process and GEL script into several smaller pieces rather than one large process.
  • You may need to rethink your process logic so that it uses a custom object, or a custom attribute on some other object, that can be used in a post condition in your process.
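
To make the contrast with sleeping threads concrete, here is a conceptual sketch of waiting on a post condition. It is a toy model, not how the Clarity process engine is implemented: the function `evaluate_post_conditions`, the instance ids, and the flag names are all hypothetical. The point it illustrates is that waiting instances are held as data and checked in a single pass, so none of them occupies a thread while it waits.

```python
def evaluate_post_conditions(waiting, flags):
    """One pass over all waiting instances; return (released, still_waiting).

    waiting: dict mapping instance id -> name of the flag its post condition checks
    flags:   set of flag names that other processes have set so far
    """
    released = [inst for inst, flag in waiting.items() if flag in flags]
    still_waiting = {inst: flag for inst, flag in waiting.items() if flag not in flags}
    return released, still_waiting


if __name__ == "__main__":
    # 300 hypothetical instances, each waiting on its own flag; no thread is blocked meanwhile.
    waiting = {f"instance-{i}": f"step_done_{i}" for i in range(300)}
    # Another process sets one flag; only the matching instance moves on.
    released, waiting = evaluate_post_conditions(waiting, flags={"step_done_7"})
    print(released, len(waiting))
```

In this model the cost of 300 waiting instances is one short loop per evaluation pass, whereas 300 sleeping scripts would need far more than the 15 available GEL threads.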

Best Practices on util:sleep:

  • util:sleep is allowed in Clarity, but use it with caution so that it does not take up all the GEL threads.
  • It is best to set it for a short amount of time, e.g. under 1 second.
  • Possible uses: avoiding throttling, creating a delay of a few hundred ms when communicating with an API resource, or creating a time buffer for a file operation.
  • Some partner integration processes, such as Rego Data Extractor, are recognized as following Broadcom best practices and do not require any updates or review.
  • You can use util:sleep in your custom processes for short delays (examples below), with caution. Broadcom Support recommends testing in a lower environment with sufficient process load before moving the processes to Production.
  • Examples of scenarios in which util:sleep may be used:

    • An outbound interface is called and repeating the operation too quickly on it creates throttling, disconnects, or account locks
    • An API resource needs a few hundred milliseconds to recover before being called again
    • In the example of Rego Data Extractor, the file system reports a file copy or file delete is done when it really isn't
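
A rough back-of-the-envelope calculation shows why short delays are tolerable while long sleeps are not. The sketch below assumes that each GEL script run holds one of the 15 threads for a fixed time (sleep plus work) and ignores everything else the engine does, so the result is only an upper bound. The function name `max_runs_per_minute` is hypothetical.

```python
GEL_THREADS = 15  # per Process Engine instance, as stated in the article


def max_runs_per_minute(hold_seconds, threads=GEL_THREADS):
    """Upper bound on GEL script runs per minute if each run holds a thread for hold_seconds."""
    if hold_seconds <= 0:
        raise ValueError("hold_seconds must be positive")
    return threads * 60 / hold_seconds


if __name__ == "__main__":
    for label, hold in [("0.3 s delay", 0.3), ("10 min sleep", 600)]:
        print(f"{label}: at most {max_runs_per_minute(hold):.1f} runs per minute")
```

Under these assumptions, a 0.3 second delay still leaves room for about 3000 runs per minute, while a 10 minute sleep caps the engine at 1.5 runs per minute.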

Additional Information

For root cause analysis of processes getting stuck, refer to: Collecting thread dumps in Clarity (On Premise Only)