Agents go to Service Down status after running SYSTEM Process Flow

Article ID: 113391

Updated On:

Products

CA Automic Applications Manager (AM)

Issue/Introduction

All Agents, including the Master's local Agent, go to a Service Down status daily or weekly at around the same time of day. Only the Automation Engine remains in a Running status.

Further investigation shows that the issue occurs while the SYSTEM Process Flow is running, and that restarting the RMI process or Windows Service returns all Agents to a Running status.

Environment

Release:
Component: APPMGR

Cause

This can occur if the Applications Manager database tables that are managed by the SYSTEM Process Flow contain an unusually large number of records.

The SO_PRINT_LOG and SO_JOB_HISTORY tables are generally the most likely to accumulate excess data if they are not properly maintained.

SYSTEM scripts query a number of tables using one of the 7 available Master Socket Manager (MSM) threads within the RMI Java process.

The MSM threads are responsible for processing requests from the local Agent and remote Agents, such as Subvar resolution, Condition evaluation, etc.

Because of the large amount of data being queried, the MSM threads cannot process any Agent requests while they are waiting for a query to return. The Agent requests time out, and the Agents go to a Service Down status.
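The effect described above can be illustrated with a small simulation. This is a simplified model and not the actual RMI implementation: it assumes a fixed pool of 7 worker threads (the number given above), and the query duration, the Agent timeout, and all function names are hypothetical values chosen for illustration. When every worker is waiting on a slow query, a new Agent request stays queued until it times out.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError

POOL_SIZE = 7           # number of MSM threads stated in this article
SLOW_QUERY_SECS = 0.5   # hypothetical duration of a query on an oversized table
AGENT_TIMEOUT = 0.1     # hypothetical time an Agent waits for a reply


def slow_system_query():
    """Stand-in for a SYSTEM script query that takes a long time to return."""
    time.sleep(SLOW_QUERY_SECS)
    return "query done"


def agent_request():
    """Stand-in for a quick Agent request such as Subvar resolution."""
    return "agent served"


def simulate(busy_threads):
    """Occupy `busy_threads` workers with slow queries, then send one Agent request."""
    with ThreadPoolExecutor(max_workers=POOL_SIZE) as pool:
        for _ in range(busy_threads):
            pool.submit(slow_system_query)
        future = pool.submit(agent_request)
        try:
            return future.result(timeout=AGENT_TIMEOUT)
        except TimeoutError:
            # No free worker picked up the request in time.
            return "Service Down (timeout)"


if __name__ == "__main__":
    print(simulate(6))  # one worker is still free
    print(simulate(7))  # every worker is blocked on a slow query
```

In this model the Agent request is answered while at least one worker is free and times out once all 7 are blocked, which is why shortening the SYSTEM queries (by reducing table size) addresses the root cause, while a restart only clears the blocked threads temporarily.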

 

Resolution

To temporarily resolve the issue, restart the RMI process or Windows Service.

To permanently resolve the issue:

Ask your Database Administrator to review the SO_PRINT_LOG and SO_JOB_HISTORY tables.

For the SO_PRINT_LOG table, delete all records older than 7 days, which is the default retention value for Jobs.

For the SO_JOB_HISTORY table, delete all records older than 60 days, or older than the value set for HISTORY_PURGE on the Job.
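The retention rules above could be sketched as follows. This is a minimal illustration under stated assumptions, not a supported purge script: it runs against an in-memory SQLite database rather than the real Applications Manager database, and the timestamp column name `created_at` is hypothetical. A Database Administrator would need to substitute the actual date column of each table and the appropriate database driver, and take a backup before deleting anything.

```python
import sqlite3
from datetime import datetime, timedelta

# Retention in days comes from this article; the column name is HYPOTHETICAL.
RETENTION = {
    "SO_PRINT_LOG": (7, "created_at"),
    "SO_JOB_HISTORY": (60, "created_at"),
}


def purge_old_rows(conn, now):
    """Delete rows older than each table's retention period; return counts per table."""
    deleted = {}
    for table, (days, column) in RETENTION.items():
        cutoff = (now - timedelta(days=days)).isoformat()
        cur = conn.execute(f"DELETE FROM {table} WHERE {column} < ?", (cutoff,))
        deleted[table] = cur.rowcount
    conn.commit()
    return deleted


def build_demo_db():
    """Create a throwaway in-memory database with one old and one recent row per table."""
    conn = sqlite3.connect(":memory:")
    for table in RETENTION:
        conn.execute(f"CREATE TABLE {table} (id INTEGER, created_at TEXT)")
    rows = {
        "SO_PRINT_LOG": [datetime(2024, 1, 30), datetime(2024, 1, 20)],
        "SO_JOB_HISTORY": [datetime(2024, 1, 1), datetime(2023, 11, 1)],
    }
    for table, stamps in rows.items():
        for i, stamp in enumerate(stamps):
            conn.execute(f"INSERT INTO {table} VALUES (?, ?)", (i, stamp.isoformat()))
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    print(purge_old_rows(conn, now=datetime(2024, 1, 31)))
```

Passing `now` explicitly keeps the cutoff calculation reproducible. If a Job uses a HISTORY_PURGE value other than 60, the SO_JOB_HISTORY retention in `RETENTION` would need to be adjusted to match.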