Unix Agent floods MQ tables when the file system holding its logs is full and it attempts a log change

Article ID: 202385

Updated On: 10-11-2023

Products

CA Automic Workload Automation - Automation Engine
CA Automic One Automation

Issue/Introduction

When the agent's logging folder has no free space left (file system full), the agent begins to flood the MQ process tables.

It floods one or another of the MQ tables with repetitive messages.

Sometimes the message in the MQ table contains the name of the agent and:
CHGLOGR 01

This can be checked in table MQ*PWP by reading the MQPWP_MSG column with a query similar to the following (use MQ1PWP or MQ2PWP, depending on your system):

SELECT MQPWP_PK,
MQPWP_System,
MQPWP_CAddr,
MQPWP_CSRName,
MQPWP_CAcv,
MQPWP_BAddr,
MQPWP_BSRName,
MQPWP_BAcv,
MQPWP_FAddr,
MQPWP_LogAddr,
MQPWP_PhysAddr,
MQPWP_BTable,
MQPWP_SchedTime,
MQPWP_Status,
MQPWP_Priority,
MQPWP_DRole,
MQPWP_LAddr,
MQPWP_Len,
UTL_RAW.CAST_TO_VARCHAR2(DBMS_LOB.SUBSTR(MQPWP_MSG,2000,1)) AS MSG_TEXT
FROM MQ2PWP WHERE MQPWP_FAddr='NAME_OF_THE_IMPACTED_AGENT';

If the MQPWP table is flooded, the system can become unresponsive (logins are no longer possible and jobs are not processed).
In some cases the table that is flooded is instead the MQOWP table or an MQ*CP00* table.
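
To see which queue table is growing, its row count can be sampled a few times in a row. The following is a minimal sketch, assuming the system uses the MQ2 table set and that MQ2PWP and MQ2CP006 (the example table names given in the Resolution section) are the ones affected; substitute the tables that apply to your system.

```sql
-- Row counts of the queue tables suspected of being flooded.
-- The table names are assumptions; adjust them (e.g. MQ1PWP) to your system.
SELECT 'MQ2PWP' AS queue_table, COUNT(*) AS row_count FROM MQ2PWP
UNION ALL
SELECT 'MQ2CP006' AS queue_table, COUNT(*) AS row_count FROM MQ2CP006;
```

A count that keeps rising between runs points to the table being flooded.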

In the case investigated here, the problem was detected, traces were activated on the WPs, and the agent was stopped.

The agent process traces (ucxjlx6_t00.txt) are filled with messages such as:

=====================================================
logging =../15 - Logging int Logging interru logging
=====================================================

In the PWP trace file (WPsrv_trc_001_00.txt), suspicious messages of the following kind were found:

=====================================================
20200923/094059.546 - process_message_queue(uc4_error_t *) <-- (deadlock or nodata, do it later)
=====================================================

If the agent is stopped by any means, the problem disappears: the impacted queue is emptied within a few minutes and the AE system then works normally again.

Environment

Component: Unix/Linux Agent

Versions affected: 12.2.2 and later, and 12.3.3 and later

Cause

This is caused by a defect, since fixed, in which the Automation Engine becomes slow and unresponsive if a Unix/Linux agent has no free space left to write its agent logs.

Resolution

Workaround:

Identify the rogue agent by grouping the records in table MQ*PWP (MQ1PWP or MQ2PWP) by MQPWP_FAddr; the majority of the records should come from that agent.
Stop or kill the rogue agent that is causing the records to accumulate in table MQ*PWP and possibly in another MQ CP table (e.g. MQ2PWP and MQ2CP006). If the system hosting the agent cannot be accessed, this can be done using the Service Manager command line or the Service Manager Dialog.
Run a DELETE statement against the impacted MQ*PWP table to remove the rows where MQPWP_FAddr equals the agent name, followed by a COMMIT (please open a case with Technical Support to validate the correct queries).
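
The identification step can be sketched as an aggregate query. This is only an illustration, assuming MQ2PWP is the table in use (use MQ1PWP otherwise); the alias record_count is introduced here for readability and is not part of the schema.

```sql
-- Count queued records per sending agent; the flooding agent should
-- stand out at the top. MQ2PWP is an assumption: use MQ1PWP if applicable.
SELECT MQPWP_FAddr, COUNT(*) AS record_count
FROM MQ2PWP
GROUP BY MQPWP_FAddr
ORDER BY record_count DESC;
```

An agent whose count is far above all others is the likely source of the flood.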

Solution:

Update to a fix version listed below or a newer version if available.

Fix version:
Component(s): Unix/Linux Agent

Automation.Engine 12.2.8 - Available
Automation.Engine 12.3.4HF1 - Available
Automation.Engine 12.3.5 - Available

Additional Information

The bug is in the agent; upgrading the agent is sufficient to fix the problem.

Details of the bug fix:

To avoid flooding the PWP queue with CHGLOGR messages, before sending a change log request the Unix agent now checks the log file descriptor to verify that a log change is possible.
