NAS nisqueue.db grows intermittently and alarms are delayed in OC



Article ID: 269301


Updated On:

Products

DX Unified Infrastructure Management (Nimsoft / UIM)

Issue/Introduction

We replicated an old UIM 20.4 environment into a new one, with new servers and a new database.

Since then, nisqueue.db intermittently grows to roughly 100-200 MB, which delays writing alarms to the database.

When this happens, the NIS bridge insert time increases from around 10 ms to as much as a few seconds.

Alarm delay ranges from none (NULL) to around 20 minutes, or even 1 hour.

Alarm throughput seems to be higher than normal.

In the old environment, message inserts take 1 ms.

The same indexes exist on both systems.

In the new system, inserts take between 2 ms and 10 ms, and at times rise to a few seconds.
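The growth pattern described above can be tracked by sampling the size of nisqueue.db at intervals. The following is a minimal sketch, not part of the product: the function names are hypothetical, the path to nisqueue.db must be supplied for your installation, and the 100 MB threshold is taken only from the sizes reported in this article.

```python
import os

MB = 1024 * 1024


def file_size_mb(path):
    """Return the current size of the file at `path` in megabytes."""
    return os.path.getsize(path) / MB


def queue_backlog_suspected(sizes_mb, threshold_mb=100.0):
    """Return True when the latest sample is at or above the threshold
    and larger than the first sample (the file grew during the window).

    `sizes_mb` is a list of size samples in MB, oldest first.
    The 100 MB default is an assumption based on the sizes in this article.
    """
    if len(sizes_mb) < 2:
        return False
    return sizes_mb[-1] >= threshold_mb and sizes_mb[-1] > sizes_mb[0]
```

For example, call `file_size_mb` on the nisqueue.db path once a minute, append each result to a list, and pass the list to `queue_backlog_suspected` to decide whether to look at database insert times.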

The following actions were taken with no improvement:

1. Dropped the NAS tables and restarted the NAS.

2. Suspected that the EMS probe was involved, because the queue was stable after the EMS probe was deactivated.

3. Checked the fragmentation of the NAS_Alarms table: 94% on fewer than 20k rows, which is fine. The NAS_transaction_log table is much larger (as expected) and has a fragmentation of 99%, which again is fine.

4. A full backup of the database is taken every night to a network location, and it completes successfully.

5. Applied the suggestion from the KB: The nisqueue.db file size grows (broadcom.com)

MAXDOP for this customer is set to 8, the same as in the old environment. The customer changed the near Cache value from 5 to 50. This helped somewhat, but the issue continues.

6. Rebuilt the indexes of NAS tables except nas_transaction_log

7. Changed the High Availability Always On replication mode from synchronous to asynchronous

8. Adjusted the retention settings for nas_transaction_log

Environment

  • UIM 20.4 CU5
  • 20+ HUBs
  • MS SQL Server Enterprise
  • Hardware meets or exceeds the minimum requirements

Cause

  • Database-related issues. Storage latency for the .mdf file may be averaging high (60 ms or more), which can slow writes to the storage and, in turn, alarm inserts.
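One way to check the latency figure above is to take two snapshots of the cumulative write counters that SQL Server exposes per database file (the `num_of_writes` and `io_stall_write_ms` columns of `sys.dm_io_virtual_file_stats`) and compute the average over the interval. The sketch below assumes the snapshots have already been collected by your DBA or a script of your own; the class and function names are hypothetical, and the 60 ms threshold comes from this article.

```python
from dataclasses import dataclass

# Threshold taken from this article: 60 ms or more is considered high.
HIGH_LATENCY_MS = 60.0


@dataclass
class IoSnapshot:
    """Cumulative counters for one database file at one point in time."""
    num_of_writes: int
    io_stall_write_ms: int


def avg_write_latency_ms(earlier, later):
    """Average write latency in ms between two snapshots.

    Returns None when no writes happened in the interval
    (or the counters were reset by a server restart).
    """
    writes = later.num_of_writes - earlier.num_of_writes
    stall_ms = later.io_stall_write_ms - earlier.io_stall_write_ms
    if writes <= 0 or stall_ms < 0:
        return None
    return stall_ms / writes


def is_latency_high(latency_ms, threshold_ms=HIGH_LATENCY_MS):
    """True when a measured latency meets or exceeds the threshold."""
    return latency_ms is not None and latency_ms >= threshold_ms
```

Because the counters are cumulative since the last SQL Server start, using the difference between two snapshots reflects current behavior rather than the long-term average.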

 

Resolution

  • Engage your DBA!

  • Based on the information in this KB, apply the appropriate actions if the database performance is adversely affecting the insertions to the DB

  • DX UIM Best Practice for the UIM database is to use Tier 1 storage

    Tier 1
    Tier 1 includes fast disks, all-flash storage, and hybrid flash storage.
    Use Tier 1 for mission-critical or highly sensitive files.

    Tier 2/3
    Tier 2 and Tier 3 include slow-spinning HDDs, disk-based backup appliances, cloud storage, and tape.

Additional Information

Known Scenario: The DB server might be running on storage located within a VSAN that is meant for application servers and is normally too slow for DB servers. DB servers should run on faster storage (for example, VNX).

Related KBs:

UIM Why do the UIM database tables become so fragmented? (broadcom.com)

The nisqueue.db file size grows (broadcom.com)