Slowness and stuck WP processes after network outage (OWP/RWP specifically)

Article ID: 276597

Products

CA Automic Workload Automation - Automation Engine
CA Automic One Automation

Issue/Introduction

The system becomes generally slow, and OWP or RWP counts are very high. In some cases this leads to WPs hanging.

In the Administration perspective, under Automation Engine Management -> Database, the OWP or RWP count is noticeably higher than usual.

In the WP logs, the following errors are shown:

U00029108 UCUDB: SQL_ERROR    Database handles  DB-HENV: xxxxxxxx  DB-HDBC: xxxxxxxx
U00003591 UCUDB - DB error info: OPC: 'SQLExecDirect' Return code: 'ERROR'
U00003592 UCUDB - Status: '42S02' Native error: '208' Msg: 'Invalid object name 'MQ'.'
U00003594 UCUDB Ret: '3590' opcode: 'INPK' SQL Stmnt: 'INSERT INTO MQ (MQCP_System, MQCP_CAddr, MQCP_CSRName, MQCP_CAcv, MQCP_BAddr, MQCP_BSRName, MQCP_BAcv, MQCP_FAddr, MQCP_LogAddr, MQCP_PhysAddr, MQCP_BTable, MQCP_SchedTime, MQCP_Status, MQCP_Priority, MQCP_DRole, MQCP_LAddr, MQCP_Len, MQCP_Msg) OUTPUT INSERTED.MQCP_PK VALUES (?, ?, ?, ?, NULL, NULL, ?, ?, NULL, NULL, ?, convert(datetime,convert(varchar(max),getutcdate(),20),20), ?, ?, NULL, NULL, ?, ?)'
U00003590 UCUDB - DB error: 'SQLExecDirect', 'ERROR   ', '42S02', 'Invalid object name 'MQ'.'

Or in German:

U00029108 UCUDB: SQL_FEHLER Database-Handles DB-HENV: xxxxxxxx  DB-HDBC: xxxxxxxx
U00003591 UCUDB - DB-Fehler-Info: OPC: 'SQLExecDirect' Rückgabewert: 'ERROR'
U00003592 UCUDB - Status: '42S02' NativeError: '208' Msg: 'Ungültiger Objektname "MQ".'
U00003594 UCUDB-Ret: '3590' Opcode: 'INPK' SQL-Anweisung: 'INSERT INTO MQ (MQCP_System, MQCP_CAddr, MQCP_CSRName, MQCP_CAcv, MQCP_BAddr, MQCP_BSRName, MQCP_BAcv, MQCP_FAddr, MQCP_LogAddr, MQCP_PhysAddr, MQCP_BTable, MQCP_SchedTime, MQCP_Status, MQCP_Priority, MQCP_DRole, MQCP_LAddr, MQCP_Len, MQCP_Msg) OUTPUT INSERTED.MQCP_PK VALUES (?, ?, ?, ?, ?, ?, ?, ?, NULL, NULL, ?, convert(datetime,convert(varchar(max),getutcdate(),20),20), ?, ?, NULL, NULL, ?, ?)'
U00003590 UCUDB - DB-Fehler: 'SQLExecDirect', 'ERROR ', '42S02', 'Ungültiger Objektname "MQ".'
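
To check whether a WP log contains this error signature, a small script can scan for it. The following is a minimal sketch, not an official tool: the function name find_mq_errors is made up for illustration, and the pattern only relies on the message number U00003592, the SQL state '42S02', and the object name MQ as they appear in the excerpts above.

```python
import re
from typing import Iterable, List

# Signature built from the log excerpts in this article:
# message U00003592 with SQL state '42S02' that mentions the object name MQ.
# It matches both the English and the German variant of the message.
SIGNATURE = re.compile(r"U00003592 .*'42S02'.*\bMQ\b")


def find_mq_errors(lines: Iterable[str]) -> List[str]:
    """Return the log lines that match the 'invalid object name MQ' signature.

    find_mq_errors is a hypothetical helper name, not part of the product.
    """
    return [line.rstrip("\n") for line in lines if SIGNATURE.search(line)]
```

The function accepts any iterable of lines, so an open log file can be passed directly, for example `find_mq_errors(open(path, encoding="utf-8", errors="replace"))`, where path points to a WP log file in your installation.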

Environment

Version: any 21.0 system below 21.0.8 HF4

Cause

This is a product defect. A network outage or another disruption to the system (such as processes stopping suddenly) could cause an OWP or RWP to stop, and nothing checked that all processes holding a role were still running. Normally that check is only performed during a PWP switch, restart, or start.

Resolution

This is fixed in the Automation Engine component as of 21.0.8 HF4. Note that the initialdata must also be on 21.0.8 HF4, and AWI should be on 21.0.8 or, ideally, 21.0.8 HF3.
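
The version requirements above can be sketched as a simple comparison. This is an illustrative example only: the helper names parse_version and meets_fix_level are hypothetical, the version strings are assumed to be written as in this article (for example "21.0.8 HF4"), and treating any later version as also containing the fix is an assumption, not something this article states.

```python
import re
from typing import Tuple


def parse_version(text: str) -> Tuple[int, int, int, int]:
    """Parse a string such as '21.0.8 HF4' into (21, 0, 8, 4).

    Assumes the 'major.minor.patch HFn' notation used in this article;
    a missing hotfix part is treated as HF0.
    """
    m = re.fullmatch(r"\s*(\d+)\.(\d+)\.(\d+)(?:\s*HF\s*(\d+))?\s*", text, re.IGNORECASE)
    if m is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    major, minor, patch, hotfix = m.groups()
    return int(major), int(minor), int(patch), int(hotfix or 0)


def meets_fix_level(engine: str, initialdata: str, awi: str) -> bool:
    """Check the three components against the levels named in the Resolution.

    Assumption: versions later than the named levels also contain the fix.
    """
    fix = parse_version("21.0.8 HF4")   # Automation Engine and initialdata
    awi_min = parse_version("21.0.8")   # minimum AWI level
    return (
        parse_version(engine) >= fix
        and parse_version(initialdata) >= fix
        and parse_version(awi) >= awi_min
    )
```

For example, `meets_fix_level("21.0.8 HF4", "21.0.8 HF4", "21.0.8 HF3")` evaluates to True, while an Automation Engine on "21.0.8 HF3" fails the check.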

There are two possible workarounds:

  • Perform a PWP switch if a role is no longer being processed, OR
  • Do not use DWPs; without them the issue does not occur

In some cases the issue can also be avoided during maintenance by bringing a node down gracefully instead of stopping processes abruptly or restarting the server.