
Monitor a server's (robot) power off or unreachability and send a Nimbus email alert


Article ID: 241721


Products

DX Unified Infrastructure Management (Nimsoft / UIM)
CA Unified Infrastructure Management SaaS (Nimsoft / UIM)
Unified Infrastructure Management for Mainframe

Issue/Introduction

Our customer has asked us to monitor servers (robots) for power-off or unreachability and to automatically send a Nimbus email alert to the relevant team. What is the best practice for creating this monitor through Infrastructure Manager? The customer's list also includes several hundred servers. Would this monitoring add overhead to Nimbus or to network/database traffic?

Cause

- Guidance

Environment

Release : 20.3 or higher

Component : UIM - ROBOT

Resolution

Normally, Availability measures system 'uptime' and Reachability measures device connectivity.

Availability is the percentage of time that the device is powered on and also capable of processing data. A device that is 'Available' might still be unreachable because of a network or communications failure by another device.

Reachability refers to whether a device is reachable from the source. Typically, data sources use ICMP (ping testing) to communicate regularly with the target device. Any communication failure, including the loss of the network path or routing, affects the reachability statistics. If ICMP is blocked and you cannot use the net_connect probe to ping the device, you can use the snmpcollector probe to determine reachability.

Reachability data comes from regular ping testing of all devices that support ICMP. A reachability value can be the percentage of ping responses received from the device during each reporting interval. You can use the net_connect or icmp probe for ping monitoring.
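As an illustration of the percentage described above, the following minimal Python sketch computes a reachability value for one reporting interval. The function name and the sample data are hypothetical and are not part of any UIM probe; the actual probes may calculate and report the value differently.

```python
def reachability_percent(ping_results):
    """Return the percentage of pings answered in one reporting interval.

    ping_results is a list of booleans (True = reply received).
    Returns None when the interval contains no samples.
    """
    if not ping_results:
        return None
    # True counts as 1 and False as 0, so sum() gives the number of replies.
    return 100.0 * sum(ping_results) / len(ping_results)


# Hypothetical interval: 10 pings sent, 8 replies received.
interval = [True] * 8 + [False] * 2
print(reachability_percent(interval))  # 80.0
```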

UIM Availability (via QOS_POWER_STATE)

Note that QOS_POWER_STATE data is collected for hubs/robots by default and should not be changed/disabled.

QOS_POWER_STATE is a QoS metric sent by the robot. It is used for the availability calculation and helps you generate reports.
It is collected every 5 minutes, and values are recorded as 0s and 1s.

The default values are 0 for down and 1 for up.
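To show how 0/1 samples of this kind can translate into an availability percentage, here is a minimal Python sketch. It assumes one sample per 5-minute interval and treats availability as the share of samples equal to 1. The function name and sample data are hypothetical, and this is not necessarily the formula that the UIM availability reports use.

```python
def availability_percent(power_state_samples):
    """Return the percentage of samples that report the 'up' state (1).

    power_state_samples is a list of 0/1 values, assumed to be one
    sample per 5-minute collection interval.
    Returns None when there are no samples.
    """
    if not power_state_samples:
        return None
    up_count = sum(1 for sample in power_state_samples if sample == 1)
    return 100.0 * up_count / len(power_state_samples)


# Hypothetical hour of data: 12 samples, 9 up and 3 down.
samples = [1] * 9 + [0] * 3
print(availability_percent(samples))  # 75.0
```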

How are Reachability and Availability in snmpcollector calculated?
https://knowledge.broadcom.com/external/article/13270/

Availability Reports (out-of-the-box reports; requires cabi_bundled)
https://techdocs.broadcom.com/us/en/ca-enterprise-software/it-operations-management/ca-unified-infrastructure-management-probes/GA/dashboards/ca-business-intelligence-dashboards/library-of-ca-business-intelligence-reports-for-ca-uim/availability-reports.html

Community Post: (unsupported, custom scripts/callbacks)

robots_checker
UIM robots_checker (checks probes and runs callbacks on them). This probe was created for self-monitoring of UIM hubs and robots.
 
net_connect and icmp can send alarms based on the monitoring results. snmpcollector can also be configured to send alarms.

In a properly sized environment, this will not have any adverse effect on Nimbus overhead or on network/database traffic. For more information on UIM sizing requirements, refer to:

UIM Sizing Requirements