Multiple Robot Inactive alarm messages


Article ID: 207380



Products

  • DX Unified Infrastructure Management (Nimsoft / UIM)
  • Unified Infrastructure Management for Mainframe
  • CA Unified Infrastructure Management SaaS (Nimsoft / UIM)

Issue/Introduction

  • Multiple "Robot inactive" alarms are created at the same time, causing alert flooding.

  • Is there a way to have the robot inactive condition send only one alert, to avoid flooding the alarm server?

  • We also see that we have some false Robot Inactive alarms.

Environment

  • Release: DX UIM 23.4.x or higher
  • Component: UIM - ROBOT

Cause

Potential causes of a flurry or storm of robot inactive alarms:

  • robot's parent hub is unavailable
  • network latency
  • network down (between hub and robot)
  • hub port (48002) or robot port (48000) is blocked
  • robot service is not up/running
  • robot service is hung
  • robot has lost its network route/path to the hub
  • robot has been decommissioned on purpose
  • robot/controller cfg is corrupted, e.g., missing <sections>, no "<hdb>" start section, etc.
  • a robot with the cdm probe installed and configured is being rebooted: a Critical "robot is inactive" hub alarm is generated once the robot has been inactive for 1.5 x the "Hub update interval"
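
Several of the causes above (blocked ports, network down, robot service not running) show up as a failed TCP connection, so a quick reachability test can help narrow things down. The following is a minimal sketch, not a UIM tool: it uses only the Python standard library, the port numbers are the ones listed above, and the function names and the host name in the comment are hypothetical.

```python
import socket

HUB_PORT = 48002    # hub port, as listed in the causes above
ROBOT_PORT = 48000  # robot port, as listed in the causes above


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def describe(host: str, port: int, timeout: float = 3.0) -> str:
    """Return a one-line, human-readable result for host:port."""
    if port_is_open(host, port, timeout):
        return f"{host}:{port} is reachable"
    return f"{host}:{port} is blocked or down"


# Example (hypothetical host name), run from the hub toward a robot:
#   print(describe("robot01.example.com", ROBOT_PORT))
```

A successful connection only shows that something is listening on the port; it does not prove that the robot service is healthy, so a hung robot service still needs to be checked on the machine itself.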

Resolution

First, to avoid unexpected issues, make sure that each hub and its robots run the same or compatible versions, and that you are running the latest hub and robot versions available for your UIM release.

Second, check and understand the hub configuration settings in the following KB article:

How to minimize or stop Robot inactive alarms when they are too frequent

robot_missed_update_count specifies how many consecutive updates a robot must miss before the hub begins issuing "robot is inactive" alarms. It does not affect any other alarms.
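
To illustrate the effect of this setting, here is a small model of the "consecutive missed updates" rule. It is a sketch under stated assumptions, not the hub's actual implementation: the function name and the list-of-booleans history are hypothetical, and the comparison (alarm once the count of misses reaches the configured value) is one reading of the description above.

```python
from typing import Sequence


def alarm_due(updates_received: Sequence[bool],
              robot_missed_update_count: int) -> bool:
    """Return True when the most recent run of missed updates reaches the threshold.

    updates_received is a hypothetical history, oldest first:
    True means the update arrived, False means it was missed.
    """
    missed = 0
    # Walk backwards from the newest update and count the trailing misses.
    for received in reversed(updates_received):
        if received:
            break
        missed += 1
    return missed >= robot_missed_update_count


# Two misses in a row with a threshold of 2: an alarm would be due.
print(alarm_due([True, False, False], 2))
# A single miss after a good update: no alarm yet.
print(alarm_due([False, True, False], 2))
```

In this model a higher value tolerates brief interruptions such as a reboot or a short network glitch, at the cost of a later alarm for a robot that is genuinely down.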
 
Have the robots possibly been decommissioned? If so, we recommend cleaning up the decommissioned robots by following the instructions in Cleanup Inactive Robots.

Is the Robot service actually running on these machines?

Third, if you need to suppress the alarms temporarily, you have a few options:

1. Suppress the alarms via the nas Auto Operator. Create a profile with a message filter of /.*Robot <hostname> is inactive.*/ for a single robot, or for multiple robots use:

    /(.*Robot.*)(.*is inactive.*)/

and list the robots in the Robot field of the AO profile separated by a pipe symbol.

   mxxxxxxxxxx0x|xxxxxxxxxx|etc

Use an Action of CLOSE.
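
Before saving the profile, the multi-robot message filter can be tried against sample alarm text. The sketch below uses Python's re module as a stand-in for the nas matcher, which is an assumption: the two engines may differ in details. The sample messages, robot names, and helper names are hypothetical.

```python
import re

# The multi-robot filter from above, without the surrounding slashes.
INACTIVE_FILTER = re.compile(r"(.*Robot.*)(.*is inactive.*)")


def matches_filter(message: str) -> bool:
    """Return True if the alarm message would be caught by the filter."""
    return INACTIVE_FILTER.search(message) is not None


def build_robot_field(robots: list) -> str:
    """Join robot names with a pipe symbol for the Robot field of the AO profile."""
    return "|".join(robots)


# Hypothetical sample messages.
print(matches_filter("Robot host01 is inactive"))   # caught by the filter
print(matches_filter("Robot host01 is active"))     # not caught
print(build_robot_field(["host01", "host02"]))
```

Note that the filter is broad: any message containing "Robot" followed later by "is inactive" matches, so the Robot field is what limits the profile to the intended machines.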


2. Exclude (delete) the alarms via a nas preprocessing rule, using similar filtering on the message and the sources of the alarms.