AE: Troubleshooting OWP having constantly high Utilization at 100%

Article ID: 198905

Products

CA Automic Workload Automation - Automation Engine

Issue/Introduction

After some Oracle Database Deadlock issues which led to multiple DB sessions having to be killed, the OWP process seems to be always busy, displaying 100% utilization all the time (B01, B10 and B60).

A cold start was performed when starting the Automation Engine some time ago, on a version earlier than 12.3.4.

Nothing appears in the OWP log that could explain why it's continuously busy.

There are just a few records in the MQOWP table (less than 10) and they seem to be processed quickly enough.

The load seems to come from a specific client, in this case Client 150, according to the Chart/Table view in Processes and Utilization.

How can you find out what is causing this high utilization of the OWP process?

Environment

Release : 12.x

Component : AUTOMATION ENGINE

Cause

This issue is caused by the cold start of the system, which removed the JPEND message from the MQWP table. As a result, the Workflow could not be deactivated, and the OWP loops continuously trying to process it.

Resolution

Workaround:

To troubleshoot this kind of issue, do the following:

1. Enable the traces tcpip=2 and database=4 on the impacted WP process via AWI for two to three minutes, then set them back to 0.

2. Analyze the trace log file associated with the OWP, WPsrv_trc_XXX_00.txt, with a text viewer such as RS File Viewer (rsview), which is available in the tools/no_supp folder of the AE image.

a. Search for the string RCV, count the matching lines, and see which message appears most often.

In this case, out of 7567 RCV lines, 7215 looked like the following one:

20200901/125229.256 - JPEXEC_R              RCV  DEACT    frm UC4D#WP003                                MQWP     MsgID: 1092364862 c-acv: 00000000

b. Run a search that also prints the fifth line after each match, because that line contains the RunID that the OWP is trying to deactivate.

Click Edit - Select Lines, paste the exact string "RCV  DEACT" from the line above, and set Successor=5 (to also select the fifth line that appears after each DEACT message).

c. Then click Display selection. The duplicated RunIDs (EH_AH_Idnr) that keep appearing should show up, in this case 816263749 and 816310224.

3. To remove these RunIDs and break the loop, there are two options:

  1. Use the function 'Change Status Manually' in Process Monitoring to set the status of these RunIDs to "cancelled". Any other status also works, as long as it is set through 'Change Status Manually'.
  2. Run the Process Monitoring delete cleanup SQL statements on the database to remove these RunIDs from EH.

The second option is explained in the following article; contact Broadcom Technical Support to obtain the necessary statements.
https://knowledge.broadcom.com/external/article?articleId=106299

Once one of these options is applied, the OWP process will work through what is left in the OWP queue and its utilization will eventually drop to 0.
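If RS File Viewer is not at hand, the trace analysis from step 2 can be approximated with a short script. The following Python sketch is not part of the product; the function names are hypothetical, and it assumes that every trace line starts with a timestamp in the format of the sample line above (for example 20200901/125229.256 followed by " - "). It counts the RCV lines per message and, for each "RCV  DEACT" line, tallies the fifth following line with the timestamp removed, so that entries repeating for the same RunID stand out.

```python
import re
from collections import Counter

DEACT_MARKER = "RCV  DEACT"  # exact string from the trace, with two spaces
SUCCESSOR = 5                # same offset as Successor=5 in RS File Viewer

# Assumption: lines start like "20200901/125229.256 - " as in the sample line.
TIMESTAMP = re.compile(r"^\d{8}/\d{6}\.\d{3}\s+-\s+")


def summarize_trace(lines):
    """Return (rcv_counts, successor_counts) for a list of trace lines."""
    rcv_counts = Counter()
    successor_counts = Counter()
    for i, line in enumerate(lines):
        fields = line.split()
        if "RCV" in fields:
            pos = fields.index("RCV")
            # Key on the token before RCV (message name) and the token after it.
            name = fields[pos - 1] if pos > 0 else ""
            action = fields[pos + 1] if pos + 1 < len(fields) else ""
            rcv_counts[(name, action)] += 1
        if DEACT_MARKER in line and i + SUCCESSOR < len(lines):
            # Drop the timestamp so repeated entries compare equal.
            successor = TIMESTAMP.sub("", lines[i + SUCCESSOR]).strip()
            successor_counts[successor] += 1
    return rcv_counts, successor_counts


def summarize_file(path):
    """Read a trace file (for example WPsrv_trc_XXX_00.txt) and summarize it."""
    # Assumption: the file can be decoded as UTF-8; undecodable bytes are replaced.
    with open(path, encoding="utf-8", errors="replace") as handle:
        return summarize_trace(handle.read().splitlines())
```

Calling summarize_file on the trace file and then most_common(5) on each returned counter lists the dominant RCV message and the most frequent successor lines. In the case described above, 7215 of the 7567 RCV lines (roughly 95%) were the JPEXEC_R DEACT message, and the RunIDs 816263749 and 816310224 kept recurring in the successor lines.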

 

Solution:

Update to a fix version listed below or a newer version if available.

Fix version:
Component(s): Automation.Engine 

Automation.Engine 12.1.9 - Available
Automation.Engine 12.2.7 - Available
Automation.Engine 12.3.4 - Available
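To check an installed version against this list, a small comparison can help. This is a minimal Python sketch, not a Broadcom tool: the function name has_fix is hypothetical, and it assumes a plain "major.minor.patch" version string. Release lines that are not listed above return None rather than a guess.

```python
# Fix versions as listed in this article: (major, minor) -> first patch with the fix.
FIX_VERSIONS = {(12, 1): 9, (12, 2): 7, (12, 3): 4}


def has_fix(version):
    """Return True/False for listed release lines, None for unlisted ones."""
    major, minor, patch = (int(part) for part in version.split(".")[:3])
    fix_patch = FIX_VERSIONS.get((major, minor))
    if fix_patch is None:
        return None  # release line not listed in this article
    return patch >= fix_patch
```

For example, has_fix("12.3.3") returns False, while has_fix("12.3.4") returns True.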



Additional Information

Not all cases of a looping OWP are solved by the upgrade.