The DB Reorg Utility (ucybdbre) is slow when removing records for AH due to high number of records in EH

Article ID: 205730

Products

CA Automic Workload Automation - Automation Engine
CA Automic One Automation
CA Automic Operations Manager
CA Automic Oracle

Issue/Introduction

Although database performance is good, the DB Reorg Utility flags records in the AH table very slowly.

 

The Reorg Utility removes records in chunks of 200,000 records by default; the chunk size is controlled by the Alloc_Size parameter in ucybdbre.ini.

Even if Alloc_Size=10000 is set, each step remains unexpectedly slow.
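For reference, the chunk size is set in ucybdbre.ini. A minimal sketch of the relevant entry; the section name [GLOBAL] is an assumption and may differ per version, so check the installed ini file:

```ini
; ucybdbre.ini - chunk size for the DB Reorg Utility
; (section name [GLOBAL] is assumed; verify against your installed file)
[GLOBAL]
; number of records processed per chunk (default: 200000)
Alloc_Size=10000
```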

 

Depending on the size of the AH table, the SELECT statement itself may take only a few minutes when run directly against the database:

U00003525 UCUDB: ===> 'SELECT AH_Client, AH_OType, AH_SType, AH_Status, AH_Idnr, AH_OH_Idnr, AH_TimeStamp4, AH_ArchiveFlag FROM AH WHERE AH_Client = ?  and AH_TimeStamp4 IS NOT NULL and AH_TimeStamp4 < ? and AH_DeleteFlag = 0  Order by AH_OH_Idnr, AH_TimeStamp4 DESC'

However, the DB Reorg Utility is not proceeding to the next chunk and seems to be hung.

Ultimately, the log will show a Time Critical DB call whose duration is actually much shorter than the time it took to proceed to the next chunk.

The Time Critical DB call is not an issue in itself: after all, a large number of records is selected to be flagged for deletion.

However, after the SELECT statement it takes tens of minutes or even hours to process the information (depending on Alloc_Size).

 

Investigation

Run the DB Reorg Utility with trace setting db=2 in the [TRACE] section of ucybdbre.ini. The processing can be stopped after 10 minutes to start the investigation.
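The trace setting described above looks like this in ucybdbre.ini (only the db key is relevant here; any other keys already present in the [TRACE] section should be left unchanged):

```ini
; ucybdbre.ini - enable database tracing for the investigation
[TRACE]
; database trace level 2, as used in this article
db=2
```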

Open the trace file in a text editor that indicates line numbers and find the following lines:

U00032223 Collecting all running tasks for current client.
Select EH_AH_Idnr from EH where EH_Client = ? order by EH_AH_Idnr ASC
U00003524 UCUDB: ===> Time critical DB call!       OPC: 'SLUC' time: 'x:xxx.xxx.xxx'
U00003525 UCUDB: ===> 'Select EH_AH_Idnr from EH where EH_Client = ? order by EH_AH_Idnr ASC'

Now, find the lines shown below, immediately after the Time Critical DB call, and count the number of lines that start with UCUDB32 READ between the SLUC and CLST/CMIT statements:

UCUDB32 SLUC RET 0000 HSTMT: 0000000127d30590 VALUE: 0000000000000032 ALL:  8.74515 DB:  8.74509 ODBC:  0.00001 UDB:  0.00005
UCUDB32 READ RET 0000 HSTMT: 0000000127d30590 VALUE: 0000000000000032 ALL:  0.00003 DB:  0.00003 ODBC:  0.00000 UDB:  0.00001

<...> hundreds of thousands or even millions of lines starting with UCUDB32 READ RET

UCUDB32 READ RET 0001 HSTMT: 0000000127d30590 VALUE: 0000000000000000 ALL:  0.00000 DB:  0.00000 ODBC:  0.00000 UDB:  0.00000
UCUDB32 CLST RET 0000 HSTMT: 0000000127d30590 VALUE: 0000000000000000 ALL:  0.00004 DB:  0.00000 ODBC:  0.00003 UDB:  0.00001
UCUDB32 CMIT RET 0000 HSTMT: 0000000000000000 VALUE: 0000000000000000 ALL:  0.00016 DB:  0.00014 ODBC:  0.00000 UDB:  0.00002

The number of READ lines found represents the number of entries in the EH table for the client that is being reorganized.
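Counting hundreds of thousands of lines by hand is impractical. A minimal sketch of how the READ lines could be counted programmatically; the line prefix is taken from the trace excerpt above, while the trace file name is hypothetical:

```python
def count_eh_reads(trace_path: str) -> int:
    """Count trace lines recording a READ on the EH result set."""
    with open(trace_path, encoding="utf-8", errors="replace") as trace:
        # Each fetched EH row produces one "UCUDB32 READ RET" line.
        return sum(
            1 for line in trace if line.lstrip().startswith("UCUDB32 READ RET")
        )

# Example usage (the trace file name is hypothetical):
# print(count_eh_reads("UCYBDBRE_TRACE.TXT"))
```

Alternatively, a text editor's "count occurrences" feature or a grep-style search for "UCUDB32 READ RET" gives the same figure.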

Cause

When entries to be flagged for deletion are read from the AH table, their RunIDs have to be compared one-by-one with all RunIDs in the EH table (for that client) because records in AH cannot be deleted if they are still active in EH.

This means that if the EH table contains many entries (500,000 or more), it will slow down processing by the Reorg Utility.

In conclusion, the size of the EH table impacts the performance of the DB Reorg Utility when deleting records in AH, due to the necessary comparison of RunIDs.
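The effect of this one-by-one comparison can be illustrated with a small schematic sketch; the RunID values are invented, and this is not the utility's actual code:

```python
# Schematic illustration of why a large EH table slows down AH reorg.
# RunID values below are invented example data.
ah_candidates = [101, 205, 333, 480]   # AH records eligible for deletion
eh_runids = [205, 999, 480]            # RunIDs still active in EH

# Each AH candidate is checked against every EH entry, so the cost grows
# with len(ah_candidates) * len(eh_runids). Keeping EH small (the
# resolution in this article) shrinks the inner list and therefore
# every comparison pass.
deletable = [runid for runid in ah_candidates if runid not in eh_runids]

print(deletable)  # → [101, 333]  (records with no active EH counterpart)
```

With 500,000+ EH rows, this inner scan is repeated for every one of the Alloc_Size candidates in each chunk, which accounts for the long pause after each SELECT.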

Environment

Release : 12.x

Component : AUTOMATION ENGINE

Resolution

Clean up the EH table. Make sure that there are no entries in the EH table that don't need to be there. The EH table should be kept as small as possible.

Use the following query to analyze the content of the EH table:

select count(*), eh_client, eh_name, eh_status from eh group by eh_client, eh_name, eh_status order by 1 desc;

All records with status 1900 and 1930 should be removed. All other records with statuses above 1900 should probably also be removed, but may need more consideration.

 

This investigation will likely reveal that some job or workflow is not being deactivated automatically.

If jobs or workflows are not deactivated automatically, the task should be modified so that deactivation happens automatically.