nas sync issues due to very high alarm counts

Article ID: 203532

Updated On:

Products

DX Unified Infrastructure Management (Nimsoft / UIM)

Issue/Introduction

Alarm counts are extremely high. How do I reduce the alarm counts?

Alarm counts are higher than expected for individual alarms.

Example counts observed on individual alarms: 252k, 104k, 52k

DXIM9.20

NAS 9.20

DXIM robot 9.20HF15

Environment

Release : 9.2.0

Component : UIM NAS

Cause

- high alarm counts

Resolution

In general, UIM Administrators must do their best to reduce alarm counts/alarm noise/unnecessary alarms.

Potential symptoms:

- Unexpected alarm behaviour, delays or timestamps
- nas GUI sync takes several seconds and may get worse over time as the number of alarms in database.db grows
- nas rules not processing in time or as expected
- Primary hub nas overloaded
- nas Status window doesn't show the alarms at first; it remains empty until you click the refresh button

database.db is large, for example ~1.5 GB
nas transactionlog.db, for example ~600 KB

Monitoring Governance/Alarm Reduction

It is tempting to enable a lot of monitoring when you first deploy UIM, but over time this can cause havoc. Best practice is to enable alarm thresholds only for Key Performance Indicators (KPIs) that are associated with an upstream effect on the business in some way. Try starting with a maximum of 5 KPIs per application/technology. Ask Support whether there are any suggested KPIs for VMware, Citrix, NetApp, Nutanix, Exchange, etc. That is the base starting point: a small number of KPIs (key metrics).

Always ask why it is important to collect the data (QOS or alarms), how often it needs to be collected and why, and how long it should be stored. Keep all of these monitoring aspects to a minimum. Note that some probes have monitoring enabled 'right out of the box' for many metrics, but customers must always decide which ones MUST be kept enabled versus the nice-to-haves.
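As an illustration of the "maximum of 5 KPIs per application/technology" starting point, the following is a minimal Python sketch that checks a monitoring plan against that cap. The plan structure, the function name over_budget, and the metric names are all hypothetical and are not suggested KPIs; UIM does not provide this check.

```python
# Starting cap per application/technology, taken from the guidance above.
MAX_KPIS = 5


def over_budget(plan, max_kpis=MAX_KPIS):
    """Return {technology: number of enabled alarm thresholds} for entries above the cap."""
    return {tech: len(kpis) for tech, kpis in plan.items() if len(kpis) > max_kpis}


# Hypothetical plan: metric names are illustrative only, not recommended KPIs.
plan = {
    "vmware": ["cpu_usage", "memory_usage", "datastore_free", "host_state"],
    "exchange": [
        "queue_length", "mailbox_db_state", "rpc_latency", "service_state",
        "disk_free", "cpu_usage", "log_growth",
    ],
}

# Lists each technology that has more enabled thresholds than the cap allows.
print(over_budget(plan))
```

Any technology reported here is a candidate for review: decide which thresholds must stay enabled and which are only nice-to-haves.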

UIM suppresses like alarms and increments the alarm count instead. But should alarms keep being generated in the hundreds or thousands while the counts climb unchecked? This is not good practice: it can adversely affect nas performance, scalability and nas housekeeping (maintenance), as well as alarm displays and reliability, not to mention the use of system resources. Any alarm suppression count > 100 raises the question of whether the nas will be able to handle more and more alarms with higher and higher counts. It will not, at least not without running into performance problems, nas sync issues, processing delays, or display issues in the alarm view. Take the necessary time to review and adjust the monitoring policy and process for alarms and related ticket handling.
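One way to find the alarms worth reviewing is to export the alarm list and flag every entry whose suppression count exceeds 100. The sketch below assumes a CSV export with columns named message and suppcount; those column names, the function name noisy_alarms, and the sample rows are assumptions for illustration and do not describe the actual nas schema or any UIM export format.

```python
import csv
import io

# Review threshold from the guidance above (suppression counts > 100).
REVIEW_THRESHOLD = 100


def noisy_alarms(rows, threshold=REVIEW_THRESHOLD):
    """Return (message, count) pairs whose count exceeds the threshold, highest first."""
    flagged = [
        (row["message"], int(row["suppcount"]))
        for row in rows
        if int(row["suppcount"]) > threshold
    ]
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


# Made-up sample export; the column names are assumed, not the real nas schema.
SAMPLE = """message,suppcount
Robot Inactive,8000
CPU usage above threshold,42
Disk free below threshold,252000
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
for message, count in noisy_alarms(rows):
    print(f"{count:>8}  {message}")
```

The alarms at the top of the resulting list are the first candidates for threshold changes or for disabling the underlying monitor.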

Why allow this to happen if no action will be taken to resolve the problem? For example, of what use is it to have 8000+ Robot Inactive alarms continuously increasing when nothing is being done about them? It's just noise, and it places an unnecessary load on the environment, which usually worsens over time.