How to minimize or stop Robot inactive alarms when they are too frequent

Article ID: 139793

Products

  • DX Unified Infrastructure Management (Nimsoft / UIM)
  • Unified Infrastructure Management for Mainframe
  • CA Unified Infrastructure Management SaaS (Nimsoft / UIM)

Issue/Introduction

  • Customers may need to adjust the timing settings that control when Robot inactive alarms are raised, for example when these alarms are too frequent.
  • What is the time interval, or delay, before a Robot inactive alarm is raised?

Environment

All robot and hub versions

Cause

  • Too many Robot Inactive alarms
  • Receiving robot inactive alerts on a server too frequently
  • Need to delay or stop Robot inactive alerts

Resolution

"Robot <hostname> is inactive" alarms are generated by the hub probe. When they are raised depends on the hub_update_interval setting in the robot (controller) probe and on the robot_missed_update_count setting in the hub probe.

Example

The default hub_update_interval is 15 minutes (900 seconds).

The hub raises the first Robot inactive alarm after the following delay:

delay = robot_missed_update_count * hub_update_interval + 120 secs

The default value for robot_missed_update_count (set in the hub.cfg) is 2.

The default value for hub_update_interval (set in the robot.cfg) is 15 minutes or 900 secs.

If these values are not set explicitly, the defaults apply and the delay is:

    2 * 900 secs + 120 secs = 1920 secs (32 minutes)
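The delay calculation above can be sketched in Python. This is a minimal sketch: the function name inactive_alarm_delay is hypothetical, and the formula and the 120-second offset are taken from this article, not from any UIM API.

```python
# Hypothetical helper reproducing this article's formula:
#   robot_missed_update_count * hub_update_interval + 120 secs
GRACE_SECONDS = 120  # fixed offset quoted in this article


def inactive_alarm_delay(robot_missed_update_count: int = 2,
                         hub_update_interval: int = 900) -> int:
    """Return the seconds between the robot going down and the first Robot inactive alarm."""
    return robot_missed_update_count * hub_update_interval + GRACE_SECONDS


if __name__ == "__main__":
    delay = inactive_alarm_delay()  # defaults: count 2, interval 900 secs
    print(f"{delay} secs = {delay // 60} minutes")  # prints "1920 secs = 32 minutes"
```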

To raise the alarm earlier, or less frequently (later), one option is to change robot_missed_update_count.

robot_missed_update_count specifies how many consecutive updates from the robot MUST be missed before the hub begins issuing Robot inactive alarms.

Setting robot_missed_update_count in hub.cfg lower, e.g., to 1, reduces the Robot inactive alarm delay to 1 * 900 secs + 120 secs = 1020 secs (17 minutes).

Test any change against one or more robots that currently send Robot inactive alarms frequently.

As a starting point, set robot_missed_update_count to 3 (3 * 900 secs + 120 secs = 2820 secs, or 47 minutes) and check whether the alarm frequency drops or the alarms stop.
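Before changing hub.cfg, it can help to tabulate the first-alarm delay for a few candidate values. A minimal sketch, assuming the default hub_update_interval of 900 secs and the formula quoted in this article; delay_table is a hypothetical name.

```python
def delay_table(counts, hub_update_interval=900, grace=120):
    """Map each candidate robot_missed_update_count to its first-alarm delay in minutes."""
    return {n: (n * hub_update_interval + grace) / 60 for n in counts}


for count, minutes in delay_table([1, 2, 3]).items():
    print(f"robot_missed_update_count = {count}: first alarm after {minutes:g} minutes")
```

With the default interval this gives 17, 32 and 47 minutes for counts 1, 2 and 3.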

The parameter robot_missed_update_count was introduced in hub v7.91 GA, so it applies only to hub v7.91 or higher.

In large environments, to update the hub_update_interval for a large number of robots, a robot_update configuration package can be created and distributed, for example:


1. Drag and drop the controller probe into the local archive on the Primary hub and rename the configuration, e.g., 'robot_update_hub_update_interval'.
2. Right-click the package and select Edit file..., then remove every entry in robot.cfx except the setting you want to change, e.g., hub_update_interval = 300.


Note: For step 1, choose a robot that reports to the primary hub, even if you intend to configure robots under secondary hubs. The robot.cfx package is only built correctly if the "source" robot reports to the primary hub. The package can then be distributed to any robot under any other hub, but it must be created from a robot under the primary hub.


Example:

<controller> overwrite
   hub_update_interval = 300
</controller>
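Step 2 asks you to strip robot.cfx down to the one setting being changed. A minimal sketch for checking an edited fragment, assuming the section layout shown in the example; controller_keys is a hypothetical helper that only reads text (it does not build or distribute a UIM package) and accepts "overwrite" either on the <controller> line or on a line of its own. The last line applies this article's formula to show that, with the default robot_missed_update_count of 2, an interval of 300 secs moves the first alarm to 720 secs (12 minutes).

```python
import re

# Hypothetical checker: list the key = value pairs inside <controller> ... </controller>
SECTION_OPEN = re.compile(r"^<controller>(\s+\w+)?$")  # optional "overwrite" on the same line
KEY_VALUE = re.compile(r"^([A-Za-z_][\w.]*)\s*=\s*(.*)$")


def controller_keys(cfx_text: str) -> dict:
    keys, inside = {}, False
    for raw in cfx_text.splitlines():
        line = raw.strip()
        if SECTION_OPEN.match(line):
            inside = True
        elif line == "</controller>":
            inside = False
        elif inside:
            match = KEY_VALUE.match(line)
            if match:  # bare words such as "overwrite" are skipped
                keys[match.group(1)] = match.group(2)
    return keys


package = """<controller> overwrite
   hub_update_interval = 300
</controller>"""

keys = controller_keys(package)
print(keys)  # {'hub_update_interval': '300'}
# Article formula with the default robot_missed_update_count of 2:
print(2 * int(keys["hub_update_interval"]) + 120, "secs until the first alarm")  # 720
```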


IMPORTANT:

  • If you want the same hub_update_interval on every robot in your UIM environment, create a simple configuration package and drag and drop it once onto the UIM domain level to deploy it.
  • Note, however, that this operation ties up the distsrv probe, and the Infrastructure Manager (IM) session being used, for a significant amount of time.
  • Alternatively, deploy the package via a distribution task scheduled for non-business hours.

Additional Information

Important Note: The settings above only affect the first instance of a Robot inactive alarm, i.e., the time between the robot going down and the hub sending the alarm.

While the robot is down, additional instances of the alarm (increasing the alarm's count) are sent every minute until the robot is back up and running. This interval is hard-coded and cannot be configured.
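To estimate the alarm count shown during an outage, the timeline can be modeled as below. This is a sketch under stated assumptions: the first alarm fires exactly at the configured delay and each repeat arrives exactly 60 secs later, so real counts may differ slightly. expected_alarm_count is a hypothetical name.

```python
# Hypothetical model of the alarm timeline described in this article:
# the first alarm after the configured delay, then one repeat every 60 secs.
REPEAT_SECONDS = 60   # hard-coded repeat interval stated in this article
GRACE_SECONDS = 120   # fixed offset from the article's delay formula


def expected_alarm_count(outage_seconds: int,
                         robot_missed_update_count: int = 2,
                         hub_update_interval: int = 900) -> int:
    """Approximate alarm count after the robot has been down for outage_seconds."""
    first_alarm = robot_missed_update_count * hub_update_interval + GRACE_SECONDS
    if outage_seconds < first_alarm:
        return 0  # still inside the delay window: no alarm yet
    return 1 + (outage_seconds - first_alarm) // REPEAT_SECONDS


print(expected_alarm_count(45 * 60))  # 45-minute outage, default settings: prints 14
```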

Note: robot_missed_update_count is NOT present in hub.cfg by default.

To add the parameter:

  1. Open the hub probe in Raw Configure.
  2. Select the 'hub' section.
  3. Click New Key and enter:
      name: robot_missed_update_count
      value: <enter the desired value>
  4. Click Apply, then click OK.
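After these steps, the key sits in the hub section of hub.cfg. A minimal sketch for checking a copy of hub.cfg, assuming the usual <section> ... </section> and key = value layout; effective_missed_update_count and the sample excerpts are hypothetical. Because the key is absent by default, the helper falls back to 2. It only reads text and does not replace the Raw Configure steps.

```python
import re

# Default applied when the key is absent (it is not in hub.cfg by default).
DEFAULT_MISSED_UPDATE_COUNT = 2
KEY = re.compile(r"^robot_missed_update_count\s*=\s*(\d+)$")


def effective_missed_update_count(hub_cfg_text: str) -> int:
    """Hypothetical checker: read robot_missed_update_count from the top-level <hub> section."""
    sections = []  # stack of currently open <section> names
    for raw in hub_cfg_text.splitlines():
        line = raw.strip()
        if line.startswith("</"):
            if sections:
                sections.pop()
        elif line.startswith("<") and line.endswith(">"):
            sections.append(line[1:-1].strip())
        elif sections == ["hub"]:  # only keys directly inside <hub>
            match = KEY.match(line)
            if match:
                return int(match.group(1))
    return DEFAULT_MISSED_UPDATE_COUNT


before = "<hub>\n   hubname = hub01\n</hub>"  # hypothetical excerpt, key not yet added
after = "<hub>\n   hubname = hub01\n   robot_missed_update_count = 3\n</hub>"

for label, text in (("before", before), ("after", after)):
    count = effective_missed_update_count(text)
    # Assumes the default hub_update_interval of 900 secs on the robots.
    print(label, count, "->", count * 900 + 120, "secs until the first alarm")
```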

Other possible causes of Robot inactive alarms:

  • network latency
  • network down (between hub and robot)
  • hub port (48002) or robot port (48000) is blocked
  • robot service is not up/running
  • robot service is hung
  • robot has lost its network route/path to the hub
  • robot has been decommissioned on purpose
  • robot/controller cfg is corrupted, e.g., missing <sections>, no "<hdb>" start section, etc.
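For the blocked-port case, a quick TCP reachability check can be run from the hub server toward the robot (port 48000) and from the robot toward the hub (port 48002). A minimal sketch, assuming the default ports listed above; port_open is a hypothetical helper built on Python's standard socket.create_connection. A successful connection only shows the port is reachable, not that the probe behind it is healthy.

```python
import socket

HUB_PORT = 48002    # hub port listed in this article
ROBOT_PORT = 48000  # robot (controller) port listed in this article


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or name not resolvable
        return False
```

For example, port_open("robot01.example.com", ROBOT_PORT) run on the hub server (the host name is a placeholder) returns False if the robot port is blocked or the robot service is down.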