
Central Server impacted, NAS unable to create connections


Article ID: 201662



Products

DX Unified Infrastructure Management (Nimsoft / UIM)

Issue/Introduction

The Central Server was impacted and the NAS was unable to create connections; immediate help was requested.

All of the following nas tables were empty (no rows returned) when queried:

- nas_alarms
- nas_transaction_summary
- nas_transaction_log
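A quick way to confirm the empty-table symptom is to count the rows in each nas table. The sketch below is an illustration under stated assumptions, not a Broadcom tool: the helper names are hypothetical, it accepts any Python DB-API 2.0 connection (the backend database type and driver are not stated in this article, so opening the connection is left to you), and it uses an in-memory SQLite database only for the demo.

```python
import sqlite3  # used only for the demo; any DB-API 2.0 connection works

# Table names taken from this article; any schema/owner prefix is an assumption left out here.
NAS_TABLES = ("nas_alarms", "nas_transaction_summary", "nas_transaction_log")


def nas_table_row_counts(conn, tables=NAS_TABLES):
    """Return {table_name: row_count} using a DB-API 2.0 connection.

    Table names are interpolated into the SQL because identifiers cannot be
    bound as query parameters, so only pass trusted, hard-coded names.
    """
    counts = {}
    cur = conn.cursor()
    try:
        for table in tables:
            cur.execute(f"SELECT COUNT(*) FROM {table}")
            counts[table] = cur.fetchone()[0]
    finally:
        cur.close()
    return counts


def empty_tables(counts):
    """List the tables that returned zero rows."""
    return [name for name, n in counts.items() if n == 0]


if __name__ == "__main__":
    # Demo: empty look-alike tables in an in-memory SQLite database.
    demo = sqlite3.connect(":memory:")
    for t in NAS_TABLES:
        demo.execute(f"CREATE TABLE {t} (id INTEGER)")
    counts = nas_table_row_counts(demo)
    print(counts)
    print("Empty tables:", empty_tables(counts))
```

In the situation described above, every table in `NAS_TABLES` would show up in `empty_tables(...)`.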

CPU usage on the primary hub was spiking to 80-100%.

In the hub Status window, the nas queue showed no value in the 'Id' field.

The hub and controller were crashing. Windows recorded hub.exe faults such as the following:

Faulting application name: hub.exe, version: 0.0.0.0, time stamp: 0x5f3a8dc5
Faulting module name: ntdll.dll, version: 6.1.7601.24000, time stamp: 0x5a499ad2
Exception code: 0xc0000005
Fault offset: 0x0000000000032964
Faulting process id: 0x2e44
Faulting application start time: 0x01d6a3e5d5692fa8
Faulting application path: E:\Program Files (x86)\Nimsoft\hub\hub.exe
Faulting module path: C:\Windows\SYSTEM32\ntdll.dll
Report Id: fb19ef17-0fd9-11eb-aa0e-005056b0128c
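When correlating crashes with probe logs, it can help to decode the fault record. In the record above, exception code 0xc0000005 is the Windows NTSTATUS value STATUS_ACCESS_VIOLATION, and "Faulting application start time" is a Windows FILETIME (100-nanosecond intervals since 1601-01-01 UTC). The Python sketch below, with hypothetical helper names, parses the record's "Key: value" lines and converts those fields; its one-entry status table is deliberately minimal.

```python
from datetime import datetime, timedelta, timezone

# Windows FILETIME epoch: counts 100-nanosecond intervals since 1601-01-01 00:00 UTC.
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)

# Minimal NTSTATUS lookup; add codes as needed.
NTSTATUS_NAMES = {0xC0000005: "STATUS_ACCESS_VIOLATION"}


def parse_fault_record(text):
    """Split each 'Key: value' line on its first colon (paths keep their drive colon)."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def filetime_to_datetime(value):
    """Convert a FILETIME given as an int or a hex string like '0x01d6...' to UTC."""
    if isinstance(value, str):
        value = int(value, 16)  # int() accepts the '0x' prefix when base is 16
    # 10 FILETIME ticks = 1 microsecond; integer division avoids float rounding.
    return FILETIME_EPOCH + timedelta(microseconds=value // 10)


def summarize_fault(text):
    """Pull out and decode the fields most useful for correlating a crash."""
    fields = parse_fault_record(text)
    code = int(fields["Exception code"], 16)
    return {
        "exception": NTSTATUS_NAMES.get(code, hex(code)),
        "offset": fields.get("Fault offset"),
        "process_id": int(fields["Faulting process id"], 16),
        "started_utc": filetime_to_datetime(fields["Faulting application start time"]),
    }
```

For the record above, the process id decodes to 11844 and the start time to 16 October 2020 (UTC), the same date as the nas.log excerpt in this article.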

Resetting the nas had no effect.

The nas.log showed only a few suspicious errors:

Oct 16 23:45:32:603 [3404] nas: [0x00D277A0] Database provider is 'SQLITE3', database 'transactionlog.db'
Oct 16 23:45:32:608 [3404] nas: finish called on DB connection 0000000000D27570, but a batch has not been started
Oct 16 23:45:32:610 [3404] nas: NiS bridge: synchronized alarm transactions in 7ms
Oct 16 23:45:32:610 [3404] nas: nisInitialize completed
Oct 16 23:45:32:610 [3404] nas: NiS bridge started...
Oct 16 23:45:32:661 [3404] nas: _nisCreateDevCsMap: ending batch of 0 operations
Oct 16 23:45:32:661 [3404] nas: finish called on DB connection 0000000000D27570, but a batch has not been started
Oct 16 23:45:33:730 [3404] nas: CreateDevCS map used 1068ms
Oct 16 23:45:33:730 [3404] nas: nisRun: finishing batch before NTL cleanup operations
Oct 16 23:45:33:730 [3404] nas: finish called on DB connection 0000000000D27570, but a batch has not been started
Oct 16 23:45:33:734 [3404] nas: Nis-Bridge: Transaction-log administration succeeded deleting 1000 transaction entries older than 30 days.
Oct 16 23:45:33:734 [3404] nas: nisRun: finishing batch before NTL compression operations
Oct 16 23:45:33:734 [3404] nas: finish called on DB connection 0000000000D27570, but a batch has not been started
Oct 16 23:45:33:738 [3404] nas: Nis-Bridge: Transaction-log administration succeeded compressing 1000 transaction entries older than 7 days.
Oct 16 23:45:33:738 [3404] nas: nisRun: finishing batch before NTS cleanup operations
Oct 16 23:45:33:738 [3404] nas: finish called on DB connection 0000000000D27570, but a batch has not been started
Oct 16 23:45:33:739 [3404] nas: NiS-Bridge: Transaction-log administration used 8ms
Oct 16 23:47:32:006 [3404] nas: _nisCreateDevCsMap: ending batch of 0 operations
Oct 16 23:47:32:006 [3404] nas: finish called on DB connection 0000000000D27570, but a batch has not been started
Oct 16 23:47:36:740 [10040] nas: sockClose:0000000005390080:10.112.10.90/60375
Oct 16 23:47:36:743 [10040] nas: maint: MM State HEALTHY Maintenance Mode Heartbeat SUCCESS
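To gauge how often the "finish called on DB connection ..., but a batch has not been started" message appears in a full nas.log, a small scan such as the following can tally it per DB connection and per thread id. This is a hypothetical helper: the line layout is inferred from the excerpt above, and the article does not establish whether this message is related to the outage.

```python
import re
from collections import Counter

# Line layout inferred from the excerpt above:
#   <Mon> <day> <hh:mm:ss:ms> [<thread id>] nas: <message>
LINE_RE = re.compile(
    r"^(?P<ts>\w{3} +\d{1,2} \d{2}:\d{2}:\d{2}:\d{3}) \[(?P<tid>\d+)\] nas: (?P<msg>.*)$"
)
CONN_RE = re.compile(r"DB connection (\S+), but a batch has not been started")


def scan_nas_log(lines):
    """Tally 'batch has not been started' messages per DB connection and per thread id."""
    by_conn, by_thread = Counter(), Counter()
    for line in lines:
        m = LINE_RE.match(line.rstrip("\r\n"))
        if not m:
            continue  # skip lines that do not look like nas.log entries
        hit = CONN_RE.search(m.group("msg"))
        if hit:
            by_conn[hit.group(1)] += 1
            by_thread[m.group("tid")] += 1
    return by_conn, by_thread
```

A file object can be passed directly, for example `scan_nas_log(open("nas.log", encoding="utf-8", errors="replace"))`, since iterating it yields lines.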

Restarting the machine had no effect.

CPU usage kept spiking to between 80% and ~100% and was pinned at times. The system behaved inconsistently from a performance perspective.

Later, after approximately 1.5-2 hours of working on the system, the CPU issue suddenly disappeared.

The OS team found nothing wrong with the machine.

The DBA found nothing wrong with the database and saw no jobs consuming significant resources.

Cause

- Unknown

Environment

Release: UIM 9.02

OS: Windows Server 2008 R2

Component: UIM NAS

nas version: 9.06 / 9.06_HF6

Resolution

1. Deactivated the nas and alarm_enrichment (AE) probes.
2. Saved the following files and folders so they could be restored later:

- nas.cfg
- alarm_enrichment folder
- scripts folder, including the Backup folder
- database.db

3. Deleted the nas and AE probes. (We had to wait for the nas to release its port and finish deleting.)

4. We then did a clean install of the nas, after which the nas queue finally worked as expected: it showed 'nas' in the ID field, along with the NAS address and an Established connection. We then stopped the probes and restored the important nas files into the nas folder.

5. Activated the nas and AE probes and rechecked the queue. The nas was processing messages very quickly, in the thousands.

Note: I also reset the bulk size in the hub postroute section from 999 to an empty value.
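Step 2 can be scripted so that nothing is missed before the probes are deleted. This Python sketch (hypothetical helper `backup_items`) copies whichever files and folders you pass into a timestamped folder and refuses to run if any item is missing. The install paths in the usage comment are assumptions; check where nas and alarm_enrichment are actually installed on your hub.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_items(items, dest_root):
    """Copy files/folders into a new timestamped folder under dest_root.

    Every item is checked first, so a missing file stops the backup before
    anything is copied (and before the probes are deleted).
    """
    paths = [Path(p) for p in items]
    missing = [str(p) for p in paths if not p.exists()]
    if missing:
        raise FileNotFoundError(f"not found: {missing}")
    target = Path(dest_root) / datetime.now().strftime("nas_backup_%Y%m%d_%H%M%S")
    target.mkdir(parents=True)  # fails if a backup with this timestamp already exists
    for p in paths:
        if p.is_dir():
            # copytree copies nested folders too, e.g. scripts\Backup.
            shutil.copytree(p, target / p.name)
        else:
            shutil.copy2(p, target / p.name)  # copy2 also keeps file timestamps
    return target


# Example usage (HYPOTHETICAL paths; confirm the real install locations first):
# nas_dir = Path(r"E:\Program Files (x86)\Nimsoft\probes\service\nas")
# backup_items(
#     [nas_dir / "nas.cfg", nas_dir / "scripts", nas_dir / "database.db",
#      Path(r"<path to the alarm_enrichment folder>")],
#     r"E:\nas_restore",
# )
```

Checking for missing items up front is a deliberate choice: a partial backup discovered only after the probes are deleted would be hard to recover from.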
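Step 3 notes that deletion could only finish after the nas released its port. If you know which port the nas was using (the port number is not given in this article), a polling loop like this hypothetical Python sketch can tell you when nothing is listening on it any more. It only checks plain TCP reachability on the local machine, which is an approximation of the hub/controller's own view of the port.

```python
import socket
import time


def port_is_listening(port, host="127.0.0.1", timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        # connect_ex returns 0 on success and an error number otherwise (refused, timed out, ...).
        return s.connect_ex((host, port)) == 0


def wait_for_port_release(port, host="127.0.0.1", timeout=300.0, interval=5.0):
    """Poll until nothing listens on the port; return False if `timeout` seconds pass first."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if not port_is_listening(port, host):
            return True
        time.sleep(interval)
    return False
```

The monotonic clock is used for the deadline so that a system clock change during the wait cannot shorten or extend it.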