Performance slow down due Time critical DB calls on MQ Tables

Products

CA Automic Workload Automation - Automation Engine

Issue/Introduction

Error Message :
U0003524 ===> Time critical DB call! OPC: 'SLCU' time: '1:088.705.710'
U0003525 ===> 'SELECT TOP 1 * FROM MQWP with (ROWLOCK, READPAST, UPDLOCK, INDEX(NK_MQWP_Prio)) WHERE MQWP_SchedTime
The Automation Engine (AE) becomes very slow and messages similar to those above can be found in the WP log files.

Investigation

The following data is needed to further analyse the issue:

WP Log files (by default located in the AutomationEngine/temp folder)

Information found in the WP log files:

Around the time AE started slowing down, there are time critical database calls similar to these:

U0003524 ===> Time critical DB call! OPC: 'SLCU' time: '1:088.705.710'
U0003525 ===> 'SELECT TOP 1 * FROM MQWP with (ROWLOCK, READPAST, UPDLOCK, INDEX(NK_MQWP_Prio)) WHERE MQWP_SchedTime<=? ORDER BY MQWP_Priority, MQWP_SchedTime, MQWP_PK'

This is the time the database needs to respond to Automic:

U0003524 ===> Time critical DB call! OPC: 'SLCU' time: '1:088.705.710'

In this case you can see that they are simple select statements getting information from the MQ tables:

SELECT TOP 1 * FROM MQWP with (ROWLOCK, READPAST, UPDLOCK, INDEX(NK_MQWP_Prio)) WHERE MQWP_SchedTime<=? ORDER BY MQWP_Priority, MQWP_SchedTime, MQWP_PK'

A simple select statement should never take over a second, in this case it takes '1:088.705.710' seconds for the database to respond.

To have a look at the MQ tables, we execute the following statement against the database: (only for sql database)

SELECT TableName, NumRows, reservedpages *8192/1024 as TotalSpace, pages * 8192/1024 as DataSpace, (usedpages-pages)*8192/1024 as IndexSpace, (reservedpages-usedpages)*8192/1024 as UnusedSpace
FROM (SELECT t.[name] as tablename,avg([rows]) as NumRows,sum(total_pages) as reservedpages,sum(used_pages) as usedpages,
sum(
CASE
When it.internal_type IN (202,204) Then 0
When a.type < 1 Then a.used_pages
When p.index_id < 2 Then a.data_pages
Else 0
END) as pages
FROM sys.allocation_units as a Join sys.partitions as p on p.partition_id = a.container_id
left join sys.internal_tables it on p.object_id = it.object_id
JOIN sys.tables as t on p.object_id=t.object_id WHERE t.name like 'MQ%'
GROUP BY t.[name]) as subselect

The above statement returns the following information about the MQ tables:

TableName      NumRows     TotalSpace       DataSpace       IndexSpace      UnusedSpace
______________________________________________________________________________________
MQCP001          13               1020888            388200              625688                7000
MQCP002          25               2231600            289544            1937600                4456
MQCP003          10               1626608            779336              845368                1904
MQCP004          48                 821224            676640            5142832                1752
MQCP005           0                           0                    0                      0                      0
MQDWP             3                  725112           551904                20632             152576
MQLS             12304                11376               6600                  4344                  432
MQMEM            1324            1559072           990464               528368              40240
MQOWP            0                    14784             13264                   1488                   32
MQPWP            0                    18704             16800                   1840                   64
MQQWP            0                           0                    0                        0                     0
MQRWP            0                    16368             12992                   3320                    56
MQSRV            125                       40                  16                      24                      0
MQWP             1054             1079416           727640               166392             185384

The problem is that the tables have grown to a very large size with a larger unused space.

As you can see the unused allocated space for the tables is very large, best seen in the MQDWP table:

TableName      NumRows TotalSpace       DataSpace IndexSpace UnusedSpace
___________________________________________________________________________________________________
MQDWP               3               725112            551904              20632              152576

AE runs a full table scan on these tables, if the unused allocated space is too large it will take a longer time for the database to respond.

Environment

OS Version: N/A

Cause

Cause type:
Other
Root Cause: Caused when the AE Database tables have grown very large in size with an even larger unused space.

Resolution

Contact your Database Administrator to perform a shrink on the tables with large unused space.

Fix Status: No Fix

Fix Version(s):
N/A

Additional Information

Workaround :
N/A