Nas.exe crashes due to ucrtbase.dll and alarms are not getting tickets assigned

Products

DX Unified Infrastructure Management (Nimsoft / UIM)
CA Unified Infrastructure Management On-Premise (Nimsoft / UIM)
CA Unified Infrastructure Management SaaS (Nimsoft / UIM)

Issue/Introduction

As a workaround, we stopped the Nimsoft service, renamed the nas transactionlog.db file in the nas folder, and then started the Nimsoft service again. The nas creates a new transactionlog.db file on startup, and the issue is resolved.
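The rename step of this workaround can be sketched in Python. This is only an illustration under stated assumptions: the helper name rename_transaction_log and the timestamped backup suffix are hypothetical, the nas folder path must be supplied by the caller, and the Nimsoft service must already be stopped before the file is renamed.

```python
import time
from pathlib import Path


def rename_transaction_log(nas_dir, filename="transactionlog.db"):
    """Rename the nas transaction log so a fresh one is created on start.

    Hypothetical helper: run it only while the Nimsoft service is stopped.
    Returns the new path, or None if the file does not exist.
    """
    source = Path(nas_dir) / filename
    if not source.exists():
        return None
    # A timestamped suffix keeps the old file available for inspection.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = source.with_name(f"{filename}.{stamp}.bak")
    source.rename(target)
    return target
```

Keeping the old file instead of deleting it allows it to be restored or examined later if needed.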

Symptoms observed:

nas crashing
repeated error in the log: Unable to obtain nimNamedSession for registration to: maintenance_mode
nas and nas GUI response is slow
nas takes time to sync
alarms not showing up in the IM alarm console within a reasonable amount of time
alarms not getting assigned

On checking the Windows Application event log, we still found this crash error for nas.exe:

Faulting application name: nas.exe, version: 0.0.0.0, time stamp: 0x649a9a9d
Faulting module name: ucrtbase.dll, version: 10.0.17763.6189, time stamp: 0xbc3e3f37
Exception code: 0xc0000409
Fault offset: 0x000000000006d288
Faulting process id: 0xe28
Faulting application start time: 0x01db1acb7e74d1df
Faulting application path: C:\Program Files (x86)\Nimsoft Infrastructure\probes\service\nas\nas.exe
Faulting module path: C:\Windows\System32\ucrtbase.dll
Report Id: 5b3e3156-aea8-447f-a895-9946e525924f
Faulting package full name: 
Faulting package-relative application ID:
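When several such events need to be compared, the event text can be split into fields with a short script. The sketch below is not part of any UIM tooling; the function names are hypothetical. It maps the exception code 0xc0000409 to its Windows NTSTATUS name, STATUS_STACK_BUFFER_OVERRUN, a code that is also reported for fail-fast process aborts, so it does not by itself identify the root cause.

```python
# NTSTATUS value seen in the event above and its symbolic name.
KNOWN_CODES = {0xC0000409: "STATUS_STACK_BUFFER_OVERRUN"}


def parse_crash_event(text):
    """Turn the 'Key: value' lines of an Application Error event into a dict."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so paths like C:\... stay intact.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def faulting_module(fields):
    """Return (module name, exception code as int, symbolic name or None)."""
    # "ucrtbase.dll, version: ..., time stamp: ..." -> "ucrtbase.dll"
    module = fields.get("Faulting module name", "").split(",")[0].strip()
    code = int(fields.get("Exception code", "0x0"), 16)
    return module, code, KNOWN_CODES.get(code)
```

Grouping events by faulting module and exception code makes it easier to see whether the same crash signature repeats after each change.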

Environment

20.4 CU8

Cause

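Multiple contributing factors: an accidental robot downgrade to version 5.70, the maintenance_mode address not being set for the nas, an outdated vs2017 package, and very large nas tables.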

Resolution

1. We first noticed that the robot version was 5.70. We then tried Robot 9.39 and 9.40, but the issues persisted.

We then upgraded the nas to nas 9.40-T2 by downloading and deploying nas-9.40-T2-20231214.zip (also attached to this article).

2. The error repeated in the log, Unable to obtain nimNamedSession for registration to: maintenance_mode, also occurs on other nas instances without causing trouble, but it may have been contributing to the issues on this nas instance. To eliminate it, we set the following keys in the nas section:

maintenance_mode_address = <Nimbus address of the maintenance_mode probe on the Primary hub>

and

mm_timeout_interval = 1

This stopped the errors in the log. This change, together with updating vs2017 to v1.02 on the server, may also have helped alleviate the nas crashes caused by ucrtbase.dll, as we observed after approximately 1 hour. VS errors are often fixed by a reinstall.
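To confirm that the registration error has stopped, its occurrences in the nas log can be counted before and after the change. The following is a minimal sketch; the function names are hypothetical and the log file path is left to the caller, since it depends on the installation.

```python
ERROR_TEXT = "Unable to obtain nimNamedSession for registration to: maintenance_mode"


def count_registration_errors(lines, needle=ERROR_TEXT):
    """Count log lines that contain the maintenance_mode registration error."""
    return sum(1 for line in lines if needle in line)


def count_in_file(log_path):
    """Count the error in a log file; log_path is supplied by the caller."""
    # errors="replace" avoids failing on unexpected bytes in the log.
    with open(log_path, "r", encoding="utf-8", errors="replace") as handle:
        return count_registration_errors(handle)
```

A count that stops increasing after the keys are set suggests that the nas is now reaching the maintenance_mode probe.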

The root cause was apparently that someone accidentally installed an old Nimsoft Robot version, 5.70, which downgraded the robot many versions from its original version 9.39. On this server in particular, with nas v9.39/9.40 deployed, the nas was repeatedly crashing because the maintenance_mode address was not set: the nas tries 6 times in a row and then times out, and for some reason this contributed to the crashes. Setting the address ensures that the nas does not have to guess where the maintenance_mode probe is running. Crashing for this reason did not occur on the other nas instances and is not common.

3. Additionally, updating vs2017 and dropping the nas tables so that they could be rebuilt relieved the nas performance problems (slow response, slow nas syncing, etc.). The tables had grown very large, especially the nas_transaction_log table, which held over 10 billion rows, most likely because the nas 'housekeeping' job was not running to completion.
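Table growth of this kind can be spotted early by checking row counts periodically. The sketch below assumes a Python DB-API 2.0 connection to the UIM database is already available; how that connection is created, the helper names, and the row threshold are all assumptions and not part of the product.

```python
import re


def table_row_count(connection, table, count_expr="COUNT(*)"):
    """Return the row count of one table through a DB-API 2.0 connection."""
    # Table names cannot be bound as parameters, so accept only simple
    # identifiers before interpolating the name into the statement.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", table):
        raise ValueError(f"unexpected table name: {table!r}")
    cursor = connection.cursor()
    try:
        cursor.execute(f"SELECT {count_expr} FROM {table}")
        return cursor.fetchone()[0]
    finally:
        cursor.close()


def oversized_tables(connection, tables, max_rows):
    """Return {table: rows} for tables holding more than max_rows rows."""
    counts = {name: table_row_count(connection, name) for name in tables}
    return {name: rows for name, rows in counts.items() if rows > max_rows}
```

For example, oversized_tables(conn, ["nas_transaction_log"], 10_000_000) would flag the table once it passes ten million rows. On Microsoft SQL Server, a table with billions of rows needs count_expr="COUNT_BIG(*)", because COUNT(*) returns a 32-bit integer there, and a full count on a table of that size can itself take a long time.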

4. After all of these changes, we opened the nas GUI and, in the Status window, right-clicked and ran Reorganize Database:

- Open the NAS configuration GUI in the Infrastructure Manager

- Click the Status tab

- Right-click the window pane containing alarm messages inside the nas GUI

- Select Advanced -> Reorganize Database

- You will see this in the nas log: nas: =====[ DATABASE REORGANIZE requested by Administrator from ##.##.###.###/60657

The alarm enrichment and nas probes stop processing alarms for a short time while the database is being reorganized. They restart once the reorganization is complete.

Additional Information

nas best practices tips and techniques

Attachments

nas-9.40-T2-20231214.zip