NAS stopped on secondary hub


Article ID: 185327


Updated On:

Products

NIMSOFT PROBES, DX Infrastructure Management

Issue/Introduction

UIM 9.20

Two secondary hubs have been running SNMP Collector (4.03) for a long time. Today the NAS probe on one of them reported errors and will not start. The nas log shows:

Feb 28 11:41:29:003 [4500] 0 nas: maint:  Unable to obtain nimNamedSession for registration to: maintenance_mode

Feb 28 11:43:29:044 [4500] 0 nas: maint:  Unable to obtain nimNamedSession for registration to: maintenance_mode

Feb 28 11:45:34:085 [4500] 0 nas: maint:  Unable to obtain nimNamedSession for registration to: maintenance_mode

Feb 28 11:47:39:129 [4500] 0 nas: maint:  Unable to obtain nimNamedSession for registration to: maintenance_mode

Feb 28 11:48:31:121 [4440] 0 nas: Unable to connect to distsrv, retrying...

Feb 28 11:49:44:156 [4500] 0 nas: maint:  Unable to obtain nimNamedSession for registration to: maintenance_mode

Feb 28 11:51:49:192 [4500] 0 nas: maint:  Unable to obtain nimNamedSession for registration to: maintenance_mode

Feb 28 11:53:54:223 [4500] 0 nas: maint:  Unable to obtain nimNamedSession for registration to: maintenance_mode

Feb 28 11:55:59:268 [4500] 0 nas: maint:  Unable to obtain nimNamedSession for registration to: maintenance_mode

Feb 28 11:58:04:285 [4500] 0 nas: maint:  Unable to obtain nimNamedSession for registration to: maintenance_mode

Feb 28 12:00:09:339 [4500] 0 nas: maint:  Unable to obtain nimNamedSession for registration to: maintenance_mode

Feb 28 12:02:14:370 [4500] 0 nas: maint:  Unable to obtain nimNamedSession for registration to: maintenance_mode

Feb 28 12:03:31:302 [4440] 0 nas: Unable to connect to distsrv, retrying...

Feb 28 12:04:19:402 [4500] 0 nas: maint:  Unable to obtain nimNamedSession for registration to: maintenance_mode

Feb 28 12:06:24:434 [4500] 0 nas: maint:  Unable to obtain nimNamedSession for registration to: maintenance_mode

Feb 28 12:08:29:473 [4500] 0 nas: maint:  Unable to obtain nimNamedSession for registration to: maintenance_mode

Feb 28 12:09:31:484 [4440] 0 nas: NAS Services called using mode: 0

Feb 28 12:09:32:364 [4440] 0 nas: Unable to obtain session for alarmEnrichmentTerminate.  attempts: 1/3

Feb 28 12:09:32:866 [4440] 0 nas: Unable to obtain session for alarmEnrichmentTerminate.  attempts: 2/3

Feb 28 12:09:33:368 [4440] 0 nas: Unable to obtain session for alarmEnrichmentTerminate.  attempts: 3/3

Feb 28 12:09:33:869 [4440] 0 nas: maint Maintenance Mode Terminated

Feb 28 12:09:35:079 [4440] 0 nas: maint Maintenance Mode Destroyed

Feb 28 12:09:35:304 [4440] 0 nas: NAS Terminated.

 

Restarting the robot on both the primary hub and the SNMP Collector hub did not help. NAS still will not start on the SNMP Collector hub server.



Cause

The Time Over Threshold (TOT) data file was corrupted, so alarm_enrichment would not start.
NAS requires the alarm_enrichment probe to be up and working before it will start.
After three failed attempts to contact the alarm_enrichment probe, NAS shut down.
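
The retry-then-terminate pattern described above can be spotted in a nas log with a short script. This is a minimal sketch, not a supported tool: the function name is hypothetical, and the regular expression is based only on the log lines quoted in this article, so other NAS versions may word the messages differently.

```python
import re

# Matches lines such as (wording taken from the log excerpt in this article):
# "... nas: Unable to obtain session for alarmEnrichmentTerminate.  attempts: 3/3"
ATTEMPT_RE = re.compile(
    r"Unable to obtain session for alarmEnrichment\w*\.\s+attempts:\s*(\d+)/(\d+)"
)


def nas_gave_up_on_alarm_enrichment(lines):
    """Return True if the last retry (n/n) is later followed by 'NAS Terminated'.

    Hypothetical helper for illustration only.
    """
    exhausted = False
    for line in lines:
        match = ATTEMPT_RE.search(line)
        if match and match.group(1) == match.group(2):
            # The final attempt (for example 3/3) has been logged.
            exhausted = True
        elif exhausted and "NAS Terminated" in line:
            return True
    return False
```

To use it, pass an open log file (or any iterable of lines), for example `nas_gave_up_on_alarm_enrichment(open("nas.log"))`, where the file name is an assumption about the local setup.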

Environment

UIM 8.X, 9.5

NAS: ALL versions

Resolution

1) Deactivate the alarm_enrichment and nas probes.
2) Make a backup copy of the nas folder.
3) Delete the nas probe and the alarm_enrichment probe.
4) Make sure the nas folder has been removed from the file system.
5) Deploy a new, clean version of nas and test.

Optional:
Restore nas.cfg and the nas\scripts folder from the backup if needed.

NOTE:
Installing the nas probe also installs alarm_enrichment. There is no separate probe for alarm_enrichment.
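
The backup in step 2 and the optional restore can be scripted. The sketch below is illustrative only: both function names are hypothetical, the location of the nas folder depends on the installation and must be supplied by the caller, and it assumes the probes have already been deactivated so no files are in use. It requires Python 3.8 or later.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_nas_folder(nas_dir, backup_root):
    """Copy the whole nas folder to a timestamped directory and return its path."""
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    target = Path(backup_root) / f"nas_backup_{stamp}"
    # copytree creates the target (and any missing parents) itself.
    shutil.copytree(Path(nas_dir), target)
    return target


def restore_config_and_scripts(backup_dir, nas_dir):
    """Copy nas.cfg and the scripts folder from a backup into a fresh nas folder.

    Returns the names of the items that were restored.
    """
    backup_dir, nas_dir = Path(backup_dir), Path(nas_dir)
    restored = []
    cfg = backup_dir / "nas.cfg"
    if cfg.is_file():
        shutil.copy2(cfg, nas_dir / "nas.cfg")
        restored.append("nas.cfg")
    scripts = backup_dir / "scripts"
    if scripts.is_dir():
        # dirs_exist_ok lets the copy merge into a scripts folder
        # that the fresh deployment may already have created.
        shutil.copytree(scripts, nas_dir / "scripts", dirs_exist_ok=True)
        restored.append("scripts")
    return restored
```

Only nas.cfg and the scripts folder are restored on purpose: copying the entire backup over the new deployment would bring the corrupted data file back.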

Additional Information

From the alarm_enrichment log:
Feb 28 14:57:14:026 [main, alarm_enrichment] java.lang.AssertionError: data were not fully read, check your serializer 
 at org.mapdb.Store.deserialize(Store.java:299)
 at org.mapdb.StoreDirect.get2(StoreDirect.java:486)
 at org.mapdb.StoreDirect.get(StoreDirect.java:439)
 at org.mapdb.Caches$HashTable.get(Caches.java:246)
 at org.mapdb.EngineWrapper.get(EngineWrapper.java:58)
 at org.mapdb.HTreeMap$HashIterator.findNextLinkedNodeRecur(HTreeMap.java:1089)
 at org.mapdb.HTreeMap$HashIterator.findNextLinkedNode(HTreeMap.java:1054)
 at org.mapdb.HTreeMap$HashIterator.advance(HTreeMap.java:1041)
 at org.mapdb.HTreeMap$HashIterator.moveToNext(HTreeMap.java:1000)
 at org.mapdb.HTreeMap$EntryIterator.next(HTreeMap.java:1147)
 at org.mapdb.HTreeMap$EntryIterator.next(HTreeMap.java:1140)
 at com.nimsoft.probe.service.nas.timeoverthreshold.TOTAlarmCandidateTable.importDataFromAlarmCandidatesPersistentMap(TOTAlarmCandidateTable.java:80)
 at com.nimsoft.probe.service.nas.timeoverthreshold.TOTAlarmCandidateTable.<init>(TOTAlarmCandidateTable.java:73)
 at com.nimsoft.probe.service.nas.timeoverthreshold.TOTAlarmCandidateTable.getInstance(TOTAlarmCandidateTable.java:57)
 at com.nimsoft.probe.service.nas.timeoverthreshold.TimeOverThresholdService.<init>(TimeOverThresholdService.java:78)
 at com.nimsoft.probe.service.nas.Nas.postLoginInitialization(Nas.java:488)
 at com.nimsoft.probe.service.nas.Nas.login(Nas.java:418)
 at com.nimsoft.nimbus.NimProbeBase.nimAttachProbe(NimProbeBase.java:303)
 at com.nimsoft.nimbus.NimProbe.initSubscribe(NimProbe.java:572)
 at com.nimsoft.nimbus.NimProbe.doForever(NimProbe.java:416)
 at com.nimsoft.probe.service.nas.Nas.run(Nas.java:611)
 at com.nimsoft.probe.service.nas.Nas.main(Nas.java:394)

Feb 28 14:57:14:456 [2376] Controller: Max. restarts reached for probe 'alarm_enrichment' (command = <startup java>)
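
To check whether another hub shows the same signature, the alarm_enrichment log can be scanned for an AssertionError whose stack trace contains both org.mapdb and timeoverthreshold frames. This is a rough heuristic built only from the trace quoted above; the function name is hypothetical, and a match suggests, but does not prove, a corrupted TOT store.

```python
def tot_store_looks_corrupt(log_text):
    """Heuristic: an AssertionError trace with both mapdb and TOT frames.

    Hypothetical helper; the frame names come from the trace in this article.
    """
    in_trace = False
    saw_mapdb = saw_tot = False
    for line in log_text.splitlines():
        if "java.lang.AssertionError" in line:
            # Start of a new trace: reset what has been seen so far.
            in_trace, saw_mapdb, saw_tot = True, False, False
            continue
        if not in_trace:
            continue
        stripped = line.strip()
        if stripped.startswith("at "):
            saw_mapdb = saw_mapdb or "org.mapdb." in stripped
            saw_tot = saw_tot or ".timeoverthreshold." in stripped
            if saw_mapdb and saw_tot:
                return True
        elif stripped:
            # A non-empty line that is not a stack frame ends the trace.
            in_trace = False
    return False
```

For example, `tot_store_looks_corrupt(open("alarm_enrichment.log").read())`, where the log file name is an assumption about the local setup.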