Alarms are not 'in sync' between OC and Infrastructure Manager (IM)

Article ID: 142778

Updated On:

Products

  • DX Unified Infrastructure Management (Nimsoft / UIM)
  • CA Unified Infrastructure Management SaaS (Nimsoft / UIM)
  • Unified Infrastructure Management for Mainframe

Issue/Introduction

In the Operator Console (OC) alarm viewer, the nas alarms or the total alarm count may appear 'out of sync' with the Infrastructure Manager (IM) alarm subconsole.

Sometimes the local NAS database becomes 'out of sync' with the backend UIM database tables.

Environment

  • DX UIM 20.4.* / 23.4.*

Cause

There can be one or more causes, including the size or integrity of the local nas .db files:

  • database.db
  • transactionlog.db

or the nas backend database tables:

  • nas_alarms
  • nas_transaction_summary
  • nas_transaction_log

or scripts that append too much data to the user_tag1 or user_tag2 fields.

Resolution

The symptom is that the alarm subconsole in Infrastructure Manager (IM) shows one set of alarms while the OC alarm viewer displays another, or the total alarm counts are not the same or at least close.

First, create an OC Group with a filter of:

'not ip address is null'

Save it, then select the top-level 'Groups' icon and click the Alarms tab to see whether the OC alarm count is now closer to the count in the IM alarm subconsole.

If the alarm counts are still not the same or at least close, take the following steps.

1. Deactivate the alarm_enrichment and nas probes

2. Drop the nas tables listed below from the database using MS SQL Server Management Studio (SSMS)

DROP TABLE NAS_VERSION;
DROP TABLE NAS_ALARMS;
DROP TABLE NAS_TRANSACTION_SUMMARY;
DROP TABLE NAS_TRANSACTION_LOG;
DROP TABLE NAS_NOTES;
DROP TABLE NAS_ALARM_NOTE;

3. Activate the alarm_enrichment probe, then the nas probe

When you activate the nas, it should recreate all of the tables and sync them properly with the existing alarms that appear in IM.
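
If the drop step may need to be rerun (for example, after a partial failure), it can help to guard each statement so that a missing table does not raise an error. The following is a minimal sketch, assuming a Microsoft SQL Server backend: it only generates the T-SQL text to paste into SSMS. The `OBJECT_ID` guard and the function name `build_drop_script` are illustrative additions, not part of the documented procedure.

```python
# Table names are taken from the list above. The OBJECT_ID guard is an
# assumption added so the generated script can be rerun on SQL Server.
NAS_TABLES = [
    "NAS_VERSION",
    "NAS_ALARMS",
    "NAS_TRANSACTION_SUMMARY",
    "NAS_TRANSACTION_LOG",
    "NAS_NOTES",
    "NAS_ALARM_NOTE",
]


def build_drop_script(tables=NAS_TABLES):
    """Return T-SQL that drops each table only if it currently exists."""
    lines = []
    for name in tables:
        lines.append(
            f"IF OBJECT_ID(N'{name}', N'U') IS NOT NULL DROP TABLE {name};"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    # Print the script for review before running it in SSMS.
    print(build_drop_script())
```

Review the generated statements before running them, and run them only while the alarm_enrichment and nas probes are deactivated.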

If this does not resolve the issue, the local nas database files may have become corrupted. In that case, take the following steps:

1. Deactivate the alarm_enrichment and nas probes

2. Drop the nas tables listed below.
 
DROP TABLE NAS_VERSION;
DROP TABLE NAS_ALARMS;
DROP TABLE NAS_TRANSACTION_SUMMARY;
DROP TABLE NAS_TRANSACTION_LOG;
DROP TABLE NAS_NOTES;
DROP TABLE NAS_ALARM_NOTE;
 
3. Rename the following files in the <nimsoft>\probes\service\nas directory, for example by changing the extension to .db.old, so that they are set aside but preserved.
 
    - transactionlog.db
    - database.db
    - nisQueue.db (if it exists) 

4. Activate the alarm_enrichment probe, then the nas probe

When you activate the nas probe, it should recreate the local .db files in the nas folder and all of the tables, and sync them properly.

IMPORTANT NOTE:

Renaming the .db files above causes a complete "reset" of all alarms in the environment. If you have integrated the nas with external systems such as ticketing systems, be aware that on the next polling cycle all previous alarms that are still active will come in again, which could create new or duplicate tickets. Take steps to mitigate this, such as temporarily disabling those integrations until the alarms regenerate.
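
The rename step can be scripted so that the files are set aside consistently. The following is a minimal sketch, assuming Python is available on the hub and that both probes are already deactivated. The function name `set_aside_nas_files` and the `.old` suffix are illustrative choices, not part of the product.

```python
from pathlib import Path

# File names come from the list above. The ".old" suffix and the function
# name are illustrative choices.
NAS_LOCAL_FILES = ("transactionlog.db", "database.db", "nisQueue.db")


def set_aside_nas_files(nas_dir, suffix=".old"):
    """Rename each existing nas .db file to <name><suffix>; return new paths."""
    renamed = []
    for name in NAS_LOCAL_FILES:
        src = Path(nas_dir) / name
        if not src.exists():
            continue  # nisQueue.db, for example, may not be present
        dst = src.with_name(src.name + suffix)
        if dst.exists():
            # Refuse to overwrite a copy set aside by an earlier run.
            raise FileExistsError(f"{dst} already exists; choose another suffix")
        src.rename(dst)
        renamed.append(dst)
    return renamed
```

Call it with the path to your <nimsoft>\probes\service\nas directory. The sketch stops rather than overwriting an earlier set-aside copy, so pass a different suffix if you repeat the procedure.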

If alarm counts are far apart between the Infrastructure Manager (IM) and the Operator Console (OC) alarm viewer:

1. Deactivate the nas and alarm_enrichment probes

2. Rename the nas local files named:

   - transactionlog.db
   - database.db
   - nisQueue.db

3. Activate alarm_enrichment

4. Activate nas

5. Refresh IM, or log in again using Security->Login

6. Check the alarm counts. After about 10 minutes, the counts should be very close or the same.

 

IF THIS ISSUE REPEATS:

If this issue recurs after following the above steps, scripts may be the cause. Scripts used to update the user_tag1 or user_tag2 fields may have appended data that exceeds the allowed size in the UIM database. The maximum size for these fields in the UIM database is 255 characters, but the local nas tables may allow larger values. When these fields are larger than 255 characters, the alarm sync to the UIM database fails, which causes the OC alarm count to fall behind the IM count. To resolve this, the cause needs to be prevented and the problematic alarms need to be fixed:

  • Edit scripts to prevent the problem from reoccurring:
    • Review scripts to determine which ones may be appending a lot of data to user_tag1 or user_tag2 fields.
    • Fix these scripts to prevent them from adding strings that make the total length exceed 255.
  • Fix existing alarms with user_tag1 or user_tag2 fields exceeding 255 characters. These alarms need to either:
    • be cleared
      or
    • have the user_tag1/user_tag2 fields edited to be 255 characters or fewer
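
The two fixes above come down to one rule: a user_tag1 or user_tag2 value must never exceed 255 characters. The following is a minimal sketch of that logic in Python, for illustration only. The logic would need to be ported to whatever scripting language your nas scripts use. The function names and the dictionary keys (`nimid`, `user_tag1`, `user_tag2`) are assumptions made for the example and do not represent a product API.

```python
MAX_USER_TAG_LEN = 255  # limit stated above for user_tag1 / user_tag2


def append_capped(current, addition, limit=MAX_USER_TAG_LEN):
    """Append `addition` to `current`, truncating so the result fits `limit`."""
    combined = (current or "") + addition
    return combined[:limit]


def oversized_alarms(alarms, limit=MAX_USER_TAG_LEN):
    """Return the ids of alarms whose user_tag1 or user_tag2 exceeds `limit`.

    `alarms` is an iterable of dicts with the hypothetical keys
    'nimid', 'user_tag1' and 'user_tag2'.
    """
    flagged = []
    for alarm in alarms:
        for field in ("user_tag1", "user_tag2"):
            if len(alarm.get(field) or "") > limit:
                flagged.append(alarm["nimid"])
                break  # one oversized field is enough to flag the alarm
    return flagged
```

`append_capped` illustrates the preventive fix for scripts, and `oversized_alarms` illustrates how to identify existing alarms that need to be cleared or edited. Truncation is only one possible policy. A script could instead skip the append once the limit would be exceeded.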

 

Additional Information