UIM - unexpected alarm behaviour from probe profiles or templates

Article ID: 5148

Products

- DX Unified Infrastructure Management (Nimsoft / UIM)
- CA Unified Infrastructure Management SaaS (Nimsoft / UIM)
- Unified Infrastructure Management for Mainframe

Issue/Introduction

1) I added a robot and I am getting alarms that are not configured for this probe.

or

2) I have alarms enabled in my template, but they are not being activated; only QOS is.

Environment

- Any supported version of UIM.

Cause

Each instance of baseline_engine keeps its own database in <baseline_engine>/cache_dir/threshold.cache.zip.

baseline_engine is the component that actually decides whether an alarm should be sent.

This database can get out of sync, and when it does it will not update from the template.

There is currently no fix in baseline_engine that prevents this from happening again.

Here is a scenario of how this can happen:

You have a robot with cdm installed and you configure time to threshold (TTT).
You then decide you no longer want a robot on this system and delete it.
Some time later, someone installs a robot on this system again and deploys cdm without time to threshold.
You will start getting TTT alarms because baseline_engine still holds the original settings.

There is a manual way to update the baseline_engine database to correct this.

For the example scenario above, use Option 3 in the Resolution section.

Resolution

To resolve the issue where alarms are not being activated from the snmpcollector template, you have three options:

Option 1
The easy way (if you are not using time to threshold)
1) Turn off snmpcollector/pollagent and baseline_engine.
2) Delete <baseline_engine>/cache_dir/threshold.cache.zip.
3) Turn on baseline_engine, pollagent and snmpcollector.
4) Force rediscovery on the profile(s).
5) Restart the robot so each probe can send its configuration to baseline_engine.
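
Step 2 of this option can be scripted. The following is a minimal sketch, not a supported tool: it assumes the probes are already stopped, and it moves the cache file to a backup directory instead of deleting it outright so that it can be restored if needed. The function name `retire_threshold_cache` and both directory arguments are hypothetical; only the `cache_dir/threshold.cache.zip` location comes from this article.

```python
import shutil
from pathlib import Path


def retire_threshold_cache(probe_dir, backup_dir):
    """Move threshold.cache.zip out of cache_dir.

    probe_dir  -- the baseline_engine probe directory (assumption: caller supplies it)
    backup_dir -- any directory outside cache_dir (hypothetical, created if missing)

    Returns the backup path, or None if there was no cache file to move.
    """
    cache = Path(probe_dir) / "cache_dir" / "threshold.cache.zip"
    if not cache.exists():
        return None

    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)

    # Moving (rather than deleting) leaves cache_dir without the file,
    # which is what step 2 requires, while keeping a copy for rollback.
    target = backup_dir / cache.name
    shutil.move(str(cache), str(target))
    return target
```

Run it only while snmpcollector, pollagent and baseline_engine are stopped, then continue with step 3.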


Option 2
If you are using time to threshold
(Use this if only a few devices use time to threshold and you do not mind recreating the device profiles.)
1) Turn off snmpcollector/pollagent and baseline_engine.
2) Delete <baseline_engine>/cache_dir/threshold.cache.zip.
3) Turn on baseline_engine, pollagent and snmpcollector.
4) Force rediscovery on each profile.
5) Set up time to threshold for any device that needs it using Admin Console.
6) Restart the robot so each probe can resend its configuration to baseline_engine.

Option 3
The hard way (if you are using time to threshold and there are too many devices to recreate)
1) Turn off snmpcollector/pollagent and baseline_engine.
2) Copy <baseline_engine>/cache_dir/threshold.cache.zip to back it up.
3) Use a program such as 7-Zip to extract <baseline_engine>/cache_dir/threshold.cache.zip. Despite the .zip extension, the file is actually in .GZ format, so the native Windows zip support cannot decompress it. On Windows, use a .GZ-compatible archiver such as 7-Zip.
4) Query the database for the following using your preferred tool.
    Note: This is a generic query that works on any version and any supported brand of database. Feel free to write your own query for the type of database you use (MS SQL, MySQL or Oracle).
       select cs_id from cm_computer_system where ip = 'the device ip'; 
       select ci_metric_id from cm_configuration_item_metric where cs_id = 'cs_id from above'; 
5) Find the metric ID in the extracted file and delete that line.
6) Repeat steps 4 and 5 for each device not showing alarms.
7) Compress the file with 7-Zip back into threshold.cache.zip, keeping the .GZ format.
8) Replace the old file at <baseline_engine>/cache_dir/threshold.cache.zip with the new file.
9) Restart baseline_engine, pollagent and snmpcollector.
10) Force rediscovery for each device.
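
Steps 2, 3, 5, 7 and 8 can also be done with a short script instead of 7-Zip. The sketch below rests on two assumptions taken from the steps above: the cache is a plain gzip stream, and each metric occupies one line of the extracted file. The function name `strip_metric_ids` and the backup path are hypothetical. It matches metric IDs as plain substrings, so pass the full IDs returned by the query in step 4.

```python
import gzip
import shutil
from pathlib import Path


def strip_metric_ids(cache_file, metric_ids, backup_file):
    """Remove every line that contains one of metric_ids from a gzip cache.

    Returns the number of lines removed. Assumes the probes are stopped.
    """
    cache_file = Path(cache_file)

    # Step 2: back up the original before touching it.
    shutil.copy2(cache_file, backup_file)

    # Step 3: the file has a .zip name but is read here as gzip data.
    # Binary mode avoids guessing the text encoding of the cache.
    with gzip.open(cache_file, "rb") as fh:
        lines = fh.read().splitlines(keepends=True)

    # Step 5: drop each line that mentions one of the metric ids.
    # Substring match: a shortened id would also match longer ids.
    needles = [m.encode("ascii") for m in metric_ids if m]
    kept = [ln for ln in lines if not any(n in ln for n in needles)]

    # Steps 7-8: recompress as gzip, then swap the new file into place.
    tmp = cache_file.with_name(cache_file.name + ".tmp")
    with gzip.open(tmp, "wb") as fh:
        fh.writelines(kept)
    tmp.replace(cache_file)

    return len(lines) - len(kept)
```

A return value of 0 means none of the IDs were found, which is worth checking before restarting the probes in step 9. If something goes wrong, copy the backup file back over threshold.cache.zip.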