UIM - deleted MCS profiles are still creating alarms


Article ID: 144709


Products

NIMSOFT PROBES
DX Infrastructure Management

Issue/Introduction

We previously implemented MCS profiles for our server groups using baselines and Time Over Threshold (TOT). We then switched to Alarm Policies in the Operator Console, removed all of the old MCS profiles, ran niscache_clean and reset_id through the probe utility (PU), and restarted. We started fresh with new MCS profiles, using a CDM TOT of 45 over 60 minutes (the old TOT was 30 over 60 minutes). We still see the old TOT of 30 over 60 in the plugin_metric directory, even after moving the directory away completely and letting the system re-create it.
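To illustrate why the two settings behave differently, the following is a minimal sketch of a sliding-window Time Over Threshold check. It is not the UIM implementation: the function name `time_over_threshold`, the 5-minute sample interval, and the 90% threshold are assumptions chosen for the example.

```python
from collections import deque


def time_over_threshold(samples, threshold, required_minutes,
                        window_minutes, interval_minutes):
    """Return one boolean per sample: True when the metric has been over
    `threshold` for at least `required_minutes` within the trailing
    `window_minutes`. Hypothetical helper, not part of UIM."""
    # Number of samples that fit in the sliding window.
    window = deque(maxlen=window_minutes // interval_minutes)
    results = []
    for value in samples:
        window.append(value > threshold)
        # Each breaching sample counts for one full sample interval.
        minutes_over = sum(window) * interval_minutes
        results.append(minutes_over >= required_minutes)
    return results


# Assumed data: CPU usage sampled every 5 minutes for one hour,
# over 90% for the first 35 minutes.
cpu_usage = [95] * 7 + [10] * 5

old_tot = time_over_threshold(cpu_usage, 90, 30, 60, 5)  # 30 over 60
new_tot = time_over_threshold(cpu_usage, 90, 45, 60, 5)  # 45 over 60

print("30 over 60 would alarm:", any(old_tot))
print("45 over 60 would alarm:", any(new_tot))
```

With this sample data, 35 minutes over the threshold satisfies the old 30-over-60 setting but not the new 45-over-60 setting, which is the kind of alarm that points to stale thresholds on the robot.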

Cause

The robot's plugin_metric.cfg is not updated with the Alarm Policy thresholds.

Environment

Release : 9.2.0

Component : UIM - MON_CONFIG_SERVICE 9.20hf2

Resolution

1. Delete the plugin_metric.cfg file on the suspect robot(s).
Note: This step is not always required. You can skip it at first and check whether the remaining steps resolve the issue without it; see the note in step 7 for the case where it cannot be skipped.

2. Update the robot from 9.20 to 9.20 HF7 or later.

3. (Optional) Confirm that a default plugin_metric.cfg is created.

4. Delete the suspect robot from USM Inventory

5. Wait for the suspect robot to be repopulated in USM Inventory. The monitoring profiles redeploy after the robot is displayed in the group again.

6. (Optional) Run the following query in SQL Studio to check that the profile status for the suspect robot changes from New to OK:
select
(select top(1) objectvalue from SSRV2AuditTrail where objecttype = 'profile' and objectid = dp.profileId order by timestamp desc) as reason, 
d.name as device_name, 
d.status as device_status,
r.address as robot_address, 
r.robot_active,
r.status as robot_status, 
dp.status,
dp.*
from SSRV2Profile dp
join ssrv2device d on d.cs_id = dp.cs_id
join cm_device cd on cd.cs_id = dp.cs_id
join cm_nimbus_robot r on r.dev_id = cd.dev_id
and d.name like  '%suspectrobotname%';

7. (Optional) Check the suspect robot's plugin_metric.cfg and verify that it has been updated with the Alarm Policy threshold settings (the profile status shows 'OK' in the query above).

Open plugin_metric.cfg. The file should contain sections named <policy_#>.

Example:
<policy_6>
      <cdm>
         <metric_8>
            metric_type_id = 1.5:1
            metric_precedence = 100
            qos_target = %suspectrobotname%
            qos_source = ~.*
            qos_name = QOS_CPU_USAGE
            policy_id = 6
            alarm = true
            <alarms>......

Note: If plugin_metric.cfg contains "mcs_profileid=" entries, the old (non-policy) server group profiles are still present and step 1 cannot be skipped.
   
8. Wait for several poll cycles and confirm that alarms are generated based on the Alarm Policy TOT threshold settings.
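The manual check in step 7 can be scripted when several robots need to be verified. The following is a minimal sketch, assuming the file layout shown in the example above (one `<policy_#>` opening tag per line and `mcs_profileid = ...` key/value lines). The function name `summarize_plugin_metric` is hypothetical, and the path to plugin_metric.cfg is supplied by the caller.

```python
import re
import sys
from pathlib import Path

# Opening tags such as "<policy_6>"; closing tags ("</policy_6>") do not match.
POLICY_SECTION = re.compile(r"^\s*<policy_(\d+)>\s*$")
# Legacy (non-policy) entries such as "mcs_profileid = 12".
LEGACY_KEY = re.compile(r"^\s*mcs_profileid\s*=")


def summarize_plugin_metric(text):
    """Return the policy IDs found in the cfg text and the line numbers
    of any legacy mcs_profileid entries. Hypothetical helper."""
    policy_ids = []
    legacy_lines = []
    for number, line in enumerate(text.splitlines(), start=1):
        match = POLICY_SECTION.match(line)
        if match:
            policy_ids.append(int(match.group(1)))
        elif LEGACY_KEY.match(line):
            legacy_lines.append(number)
    return {"policy_ids": policy_ids, "legacy_lines": legacy_lines}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_plugin_metric.py <path to plugin_metric.cfg>
    summary = summarize_plugin_metric(
        Path(sys.argv[1]).read_text(errors="replace"))
    print("Alarm Policy sections:", summary["policy_ids"] or "none")
    if summary["legacy_lines"]:
        print("Legacy mcs_profileid entries on lines:", summary["legacy_lines"])
        print("Old profiles are still present; step 1 cannot be skipped.")
```

An empty list of policy IDs suggests that the Alarm Policy thresholds have not yet been written to the robot. Any reported mcs_profileid lines correspond to the case described in the note under step 7.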