We have a vmware probe deployed with a template that sets the vmcountactive threshold to 30. However, the probe is still alerting on the older value of 22.
The alarm received is:
The counter for <device_name>.vmcountactive is outside expected limits (24.0 gt 22.0)
Release : 20.3+
Component : UIM - BASELINE_ENGINE
When a probe's template (for example vmware or snmpcollector) is updated with a new alarm value, whether static or dynamic, the <baseline engine>/cache_dir/threshold.cache.zip file is updated on every baseline_engine in the upstream path. Sometimes the change does not reach every baseline_engine, which can cause alarms that are inconsistent with the change and/or duplicate alarms.
The <baseline engine>/cache_dir/threshold.cache.zip file can be extracted on each of the hubs in the QOS path and reviewed for differences in the specific alarm definition. The file is in .GZ format, and if changes are made, it needs to be recompressed using the same method.
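As an alternative to a GUI archiver, the extraction can be scripted. The following is a minimal sketch in Python, assuming the archive is a single gzip stream as described above. The helper names `is_gzip` and `extract_cache` are hypothetical and are not part of UIM.

```python
import gzip
import shutil
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def is_gzip(path: Path) -> bool:
    """Return True if the file starts with the gzip magic bytes."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC


def extract_cache(archive: Path, dest: Path) -> Path:
    """Decompress a gzip-format archive to dest, ignoring its extension."""
    if not is_gzip(archive):
        raise ValueError(f"{archive} is not gzip data")
    with gzip.open(archive, "rb") as src, open(dest, "wb") as out:
        shutil.copyfileobj(src, out)
    return dest
```

Calling `extract_cache(Path("threshold.cache.zip"), Path("threshold.cache"))` on a copy taken from each hub gives plain-text files that can be compared side by side.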
Example of the same threshold definition carrying two different values (90.0 and 70.0 in "levels") on the two baseline_engines in the path:
{"id":"M0ac7424d5378c994201cxxxxxxxcc644","threshold_id":0,"thresholdType":"static","calcType":"scalar","subsysId":"2.7.3","operator":["undefined","undefined","greaterthan","undefined","undefined"],"levels":[null,null,90.0,null,null],"customAlarmMessage":"","customClearAlarmMessage":"","customAlarmStringSubstitutions":{"descr":"This monitor indicates the memory usage as percentage of total configured or available memory.","unit":"pct","resource":"<MY_DEVICE>","port":"443","host":"<MY_DEVICE>","monitor":"machine.Memory.Memory Usage","key":"\"ESC:VM_MEMORY:<MY_DEVICE>:500d640a-xxxx-252f-26d5-636490017e64:Memory\".\"usage\""},"enabled":true}
{"id":"M0ac7424d5378c994201cxxxxxxxcc644","threshold_id":0,"thresholdType":"static","calcType":"scalar","subsysId":"2.7.3","operator":["undefined","undefined","greaterthan","undefined","undefined"],"levels":[null,null,70.0,null,null],"customAlarmMessage":"","customClearAlarmMessage":"","customAlarmStringSubstitutions":{"descr":"This monitor indicates the memory usage as percentage of total configured or available memory.","unit":"pct","resource":"<MY_DEVICE>","port":"443","host":"<MY_DEVICE>","monitor":"machine.Memory.Memory Usage","key":"\"ESC:VM_MEMORY:<MY_DEVICE>:500d640a-xxxx-252f-26d5-636490017e64:Memory\".\"usage\""},"enabled":true}
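Spotting the one differing field in rows this long is error-prone by eye. A small sketch of a field-by-field comparison follows. The records are trimmed copies of the two rows above, and `diff_records` is a hypothetical helper rather than a UIM utility.

```python
import json


def diff_records(a: dict, b: dict) -> dict:
    """Return {key: (a_value, b_value)} for top-level keys whose values differ."""
    keys = a.keys() | b.keys()
    return {k: (a.get(k), b.get(k)) for k in sorted(keys) if a.get(k) != b.get(k)}


# Trimmed versions of the two rows shown above, one from each hub.
hub_a = json.loads(
    '{"id":"M0ac7424d5378c994201cxxxxxxxcc644","thresholdType":"static",'
    '"levels":[null,null,90.0,null,null],"enabled":true}'
)
hub_b = json.loads(
    '{"id":"M0ac7424d5378c994201cxxxxxxxcc644","thresholdType":"static",'
    '"levels":[null,null,70.0,null,null],"enabled":true}'
)

print(diff_records(hub_a, hub_b))
# {'levels': ([None, None, 90.0, None, None], [None, None, 70.0, None, None])}
```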
From this we can determine which one should be fixed. Perform the following to fix manually:
1. Deactivate the baseline_engine
2. Make a backup copy of <baseline engine>/cache_dir/threshold.cache.zip
3. Use a program such as 7-zip to extract <baseline engine>/cache_dir/threshold.cache.zip. Despite the .zip extension, the file is actually in .GZ format, so the native Windows zip tool cannot decompress it. On Windows, use a .GZ-compatible archiver (such as 7-zip).
4. Update the desired rows in the problem threshold.cache file
5. Compress the file using 7-zip, in the same .GZ format, back into threshold.cache.zip
6. Replace the old file in <baseline engine>/cache_dir/threshold.cache.zip with the new file.
7. Activate the baseline_engine
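Steps 2 through 6 can also be scripted. The sketch below is illustrative only and rests on two assumptions that should be checked against your own file first: that the archive is a single gzip stream, and that the extracted threshold.cache holds one JSON object per row, as in the example rows above. The function `set_level` and its parameters are hypothetical. Run it only while the baseline_engine is deactivated (steps 1 and 7 are still done manually).

```python
import gzip
import json
import shutil
from pathlib import Path


def set_level(cache_zip: Path, record_id: str, index: int, value: float) -> int:
    """Set levels[index] on rows whose id matches; return the number of rows changed."""
    # Step 2: keep a backup copy next to the original.
    shutil.copy2(cache_zip, cache_zip.with_name(cache_zip.name + ".bak"))

    # Step 3: decompress (gzip data despite the .zip extension).
    with gzip.open(cache_zip, "rt", encoding="utf-8", newline="") as f:
        lines = f.read().splitlines(keepends=True)

    # Step 4: update the matching rows, leaving every other row byte-for-byte intact.
    changed = 0
    out = []
    for line in lines:
        body = line.rstrip("\r\n")
        ending = line[len(body):]
        try:
            rec = json.loads(body)
        except json.JSONDecodeError:
            out.append(line)  # not a JSON row: pass through untouched
            continue
        levels = rec.get("levels") if isinstance(rec, dict) else None
        if (rec_id_matches := isinstance(rec, dict) and rec.get("id") == record_id) \
                and isinstance(levels, list) and index < len(levels) \
                and levels[index] != value:
            levels[index] = value
            line = json.dumps(rec, separators=(",", ":"), ensure_ascii=False) + ending
            changed += 1
        out.append(line)

    # Steps 5 and 6: recompress as gzip, then replace the old file.
    tmp = cache_zip.with_name(cache_zip.name + ".new")
    with gzip.open(tmp, "wt", encoding="utf-8", newline="") as f:
        f.write("".join(out))
    tmp.replace(cache_zip)
    return changed
```

For the example rows above, `set_level(Path("threshold.cache.zip"), "M0ac7424d5378c994201cxxxxxxxcc644", 2, 90.0)` would raise the 70.0 entry to 90.0. A return value of 0 means no row matched or the value was already correct.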