Alert sent for device in maintenance when UIM server was rebooted

Article ID: 200351

Products

  • DX Unified Infrastructure Management (Nimsoft / UIM)
  • CA Unified Infrastructure Management SaaS (Nimsoft / UIM)
  • Unified Infrastructure Management for Mainframe

Issue/Introduction

When the UIM servers were restarted for maintenance, alarms were sent for other servers that were in maintenance as soon as the primary hub came back up. It appears that the alarms arrived while the maintenance_mode probe was still querying the database for the configured maintenance windows. Is there any way to prevent or resolve this issue?

Environment

  • Release: 20.1
  • Component: UIM MAINTENANCE MODE
  • maintenance_mode probe

Resolution

Please review the KB Article:

When the UIM service in the Primary HUB Robot is restarted, we saw alarms from maintenance mode device.
https://knowledge.broadcom.com/external/article?articleId=6989

You can also try increasing the timeouts in the ems and nas probes as described below:

1) ems

In the <setup> section, add the key

maintenance_mode_cmd_timeout = 300000

if it does not exist. The value is in milliseconds (300000 ms = 300 seconds).


2) nas

In raw configuration or the cfg file, in the <setup> section, add the key

maint_max_resp_time = 300

if it does not exist. The value is in seconds.


3) Also in the nas <setup> section, add the key

registrationIntervalLookAheadMinutes = 60

if it does not exist. The value is in minutes.
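
Taken together, the three keys above would look roughly like the fragments below. This is only a sketch of the added lines: it assumes the usual <setup> ... </setup> section layout of a UIM probe cfg file, and any keys already present in each <setup> section are omitted here and should be left as they are.

ems probe:

```
<setup>
   maintenance_mode_cmd_timeout = 300000
</setup>
```

nas probe:

```
<setup>
   maint_max_resp_time = 300
   registrationIntervalLookAheadMinutes = 60
</setup>
```

Note that the two timeouts use different units: 300000 in ems is in milliseconds and 300 in nas is in seconds, so both correspond to the same 300-second (5-minute) limit.
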

Additional Information

registrationIntervalLookAheadMinutes
This key determines how often the nas attempts to re-register with the maintenance_mode probe.

The default is 30 minutes. If you have a large number of active schedules that do not change frequently, you can increase the re-registration interval to 60 or even 90 minutes.
 
maint_max_resp_time
This key can be added to override the default timeout for re-registration with the maintenance_mode probe, which is 20 seconds. It addresses an issue where the nas intermittently lets alarms through for devices that are in maintenance because the maintenance_mode probe fails to respond to the re-registration request from the nas within the default timeout period.