JCP Rest OutOfMemoryError with Arrays.copyOf causes login impossible due to ErtEstimationResource

Article ID: 237242

Products

CA Automic Workload Automation - Automation Engine

Issue/Introduction

Under some circumstances, while a Workflow Monitor is being displayed, the JCP Rest process hangs and eventually dies because it reaches the maximum memory limit.

The same behavior can also be seen when running a Search in AWI.

Symptoms on versions prior to 12.3.9/21.0.3:

In the JCP log, we observe ERT queries for run ID 0, as below:

20220628/165520.170 - 68     U00045098 Method 'GET', URL: 'http://AEHOSTNAME:8088/ae/api/v1/XXXX/executions/0/ert', received from IP: 'X.X.X.X'
20220628/165520.438 - 68     U00045105 Log on of 'USERNAME/DEPARTMENT' successful.
20220628/165525.100 - 68     U00003524 UCUDB: ===> Time critical DB call!       OPC: 'SLCT' time: '4643ms'
20220628/165525.100 - 68     U00003525 UCUDB: ===> 'select AH.AH_Idnr, null as AH_ERTEnd, AJPP.AJPP_Object, AJPP.AJPP_OType, AJPP.AJPP_Lnr, AJPPA.AJPPA_PreLnr, AH.AH_Ert, AH.AH_Runtime, AH.AH_TimeStamp2, AH.AH_TimeStamp4, AJPP.AJPP_JobStatus, AJPP.AJPP_AH_Idnr, 0 as AH_LoopIteration, AJPPF.AJPPF_LoopCount, AJPPF.AJPPF_LoopIterator, AJPP.AJPP_SubType, AH.AH_TimeStamp1, AJPP.AJPP_ErlstStTime from ah left join ajpp on ah.ah_parentprc = ajpp.ajpp_ah_idnr and ah.ah_ajpp_lnr = ajpp.ajpp_lnr left join ajppa on ajpp.ajpp_ah_idnr = ajppa.ajppa_ah_idnr and ajpp.ajpp_lnr = ajppa.ajppa_ajpp_lnr left join ajppf on ah.ah_parentprc = ajppf.ajppf_ah_idnr where ah.ah_parentprc = ? and ah.ah_client = ? union all select AH.AH_Idnr, null as AH_ERTEnd, AH.AH_Name, AH.AH_OType, AH.AH_AJPP_Lnr, AJPPA.AJPPA_PreLnr, AH.AH_Ert, AH.AH_Runtime, AH.AH_TimeStamp2, AH.AH_TimeStamp4, AH.AH_Status, 0 as AH_ParentHir, 0 as AH_LoopIteration, AJPPF.AJPPF_LoopCount, AJPPF.AJPPF_LoopIterator, AH.AH_SubType, AH.AH_TimeStamp1, null as AJPP_ErlstStTime from ah left join ajpp on ah.ah_parentprc = ajpp.ajpp_ah_idnr and ah.ah_ajpp_lnr = ajpp.ajpp_lnr left join ajppa on ajpp.ajpp_ah_idnr = ajppa.ajppa_ah_idnr and ajpp.ajpp_lnr = ajppa.ajppa_ajpp_lnr left join ajppf on ah.ah_parentprc = ajppf.ajppf_ah_idnr where ah.ah_idnr = ? and ah.ah_client = ? '
20220628/172725.283 - 68     U00003434 Server routine  'ErtEstimationResource$$Lambda$293/0x000000080071d440/rest-transaction' required '32' minutes and '4' seconds for processing.
20220628/172725.286 - 68     U00045014 Exception 'java.lang.NullPointerException: "null"' at 'com.automic.persistence.impl.AdaptiveErtComponent.processWorkflowItems():229'.
20220628/172725.290 - 68     U00045099 The server replied with following status: '500'
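
To check whether a JCP log contains this pattern, the request lines can be filtered for ERT calls with run ID 0. The following is a minimal sketch, assuming the log lines follow the U00045098 format shown above; the function name `find_zero_runid_ert_requests` is hypothetical and not part of any Automic tooling.

```python
import re

# Matches the U00045098 request line format shown in the log excerpt above
# and captures the run ID that appears between /executions/ and /ert.
ERT_REQUEST = re.compile(
    r"U00045098 Method 'GET', URL: '[^']*/executions/(?P<runid>\d+)/ert'"
)

def find_zero_runid_ert_requests(lines):
    """Return the log lines that record an ERT request for run ID 0."""
    hits = []
    for line in lines:
        match = ERT_REQUEST.search(line)
        if match and int(match.group("runid")) == 0:
            hits.append(line.rstrip("\n"))
    return hits
```

The function accepts any iterable of strings, so an open log file can be passed to it directly.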

And in the JCP traces, we sometimes find:


20220503/095521.811 - 47               ----------------------- Stack Trace -----------------------
20220503/095521.812 - 47               java.lang.OutOfMemoryError: Java heap space

 

Symptoms on version 12.3.9/21.0.3:

The JCP Rest process hangs and eventually dies because it reaches the maximum memory limit when displaying a Workflow Monitor.
The errors in the JCP Rest log are:

20220708/091201.406 - 36   U00003434 Server routine 'ErtEstimationResource$$Lambda$301/204072346/rest-transaction' required '0' minutes and '37' seconds for processing.
20220708/091201.412 - 36   U00045014 Exception 'java.lang.OutOfMemoryError: "Java heap space"' at 'java.util.Arrays.copyOf():3181'.

And in the JCP force traces:

20220708/091123.229 - 37        java.lang.OutOfMemoryError: Java heap space
20220708/091123.229 - 37        at java.util.Arrays.copyOf(Arrays.java:3181)
20220708/091123.229 - 37        at java.util.ArrayList.grow(ArrayList.java:267)
20220708/091123.229 - 37        at java.util.ArrayList.ensureExplicitCapacity(ArrayList.java:241)
20220708/091123.229 - 37        at java.util.ArrayList.ensureCapacityInternal(ArrayList.java:233)
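
Both symptom sets report the failure in a U00045014 line, with a NullPointerException before 12.3.9/21.0.3 and an OutOfMemoryError on 12.3.9/21.0.3. The sketch below extracts the exception text and location from such lines so the two cases can be told apart. It assumes the U00045014 format shown in the excerpts above; the name `parse_exception_line` is hypothetical.

```python
import re

# Matches: U00045014 Exception '<exception text>' at '<location>'.
EXCEPTION_LINE = re.compile(
    r"U00045014 Exception '(?P<exception>.+)' at '(?P<location>[^']+)'"
)

def parse_exception_line(line):
    """Return (exception, location) for a U00045014 line, or None otherwise."""
    match = EXCEPTION_LINE.search(line)
    if match is None:
        return None
    return match.group("exception"), match.group("location")
```

A result whose exception text starts with `java.lang.OutOfMemoryError` and whose location is in `java.util.Arrays.copyOf()` corresponds to the 12.3.9/21.0.3 symptom described above.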

Environment

Release : 12.3.x and 21.0.x

Component : AUTOMATION ENGINE

Cause

Several defects are involved. Before 12.3.9/21.0.3, there was a defect in AWI (it sent a wrong parameter for the ERT query, resulting in run ID 0) and another in the backend (JCP/JWP) when processing this query.

On 12.3.9 and 21.0.3+, a particular situation in the Workflow (EH_ParentHir=NULL on the top entry) leads to a loop in processWorkflowItems when special nodes have a run ID of NULL. The loop uses all available memory, which causes the JCP to crash.
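
The general failure mode, a traversal that keeps revisiting the same entries and appending to a list until the heap is exhausted, can be illustrated with a small sketch. This is not the Automic implementation of processWorkflowItems, which is not shown in this article. The function name `walk_workflow_items`, the dictionary layout, and the use of `None` to stand in for a NULL ID are all assumptions made for illustration only.

```python
def walk_workflow_items(children, root):
    """Depth-first walk that records each node once, even if the data loops.

    `children` maps a node ID to the list of its child IDs (hypothetical layout).
    """
    seen = set()
    order = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node in seen:
            # Without this guard, a reference back to an earlier node would
            # make `order` grow without bound until memory runs out.
            continue
        seen.add(node)
        order.append(node)
        # Push children in reverse so they are visited in their listed order.
        stack.extend(reversed(children.get(node, [])))
    return order
```

With looping data such as `{None: ["A"], "A": ["B"], "B": [None]}`, the walk ends after three nodes, whereas the same walk without the `seen` check would never terminate.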

Resolution

Workaround:

To prevent the JCP from crashing, increase the memory of the JCP (REST) to 2 GB at startup (-Xmx2048m), or higher (-Xmx4096m) if the problem still occurs.

Additionally, increase the parameter parallelDbConnections in ucsrv.ini to a higher value (20 or 30 instead of 5) as shown below, and restart the JCPs.

For 12.3:

[REST]
parallelDbConnections=30

For 21.x:

[JDBC]
parallelDbConnections=30
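
Before restarting the JCPs, the value in effect can be read back from the file. The following is a minimal sketch using Python's standard `configparser`. It assumes the relevant part of ucsrv.ini parses as a regular INI file, and the helper name `parallel_db_connections` is hypothetical. The section to query is `REST` for 12.3 and `JDBC` for 21.x, as shown above.

```python
import configparser

def parallel_db_connections(ini_text, section):
    """Return parallelDbConnections from the given section, or None if unset."""
    parser = configparser.ConfigParser(
        strict=False,          # tolerate duplicate sections or keys
        allow_no_value=True,   # tolerate keys without "=value"
        interpolation=None,    # treat "%" in values literally
    )
    parser.read_string(ini_text)
    # Option names are matched case-insensitively by configparser.
    if not parser.has_option(section, "parallelDbConnections"):
        return None
    return parser.getint(section, "parallelDbConnections")
```

For example, calling the function with the contents of ucsrv.ini and the section name `"JDBC"` on a 21.x system returns the configured integer, or `None` when the parameter is absent.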

If the problem still occurs too often and the AE cannot be upgraded yet, a temporary workaround is to upgrade AWI to 12.3.9 while keeping the AE on its current 12.3.8 version.

Solution:

Update to one of the fix versions listed below, or to a newer version if available.

Fix version:

Component(s): Automation Engine and Automic Web Interface


Fix Version/s:

Automation.Engine 12.3.9 HF1 - Available
Automation.Engine 21.0.3 HF3 - Available
Automation.Engine 21.0.4 - Available