Unable to log in to UIM server

Article ID: 134532

Products

DX Unified Infrastructure Management (Nimsoft / UIM)

Issue/Introduction

Users are unable to log in to the UIM server and monitoring is not possible. UMP is also down.


The ems probe log shows the following exceptions:


Jul 11 12:09:04:659 [attach_socket, ems] Dispatching to / to api/nas/alarms

Jul 11 12:09:14:042 [attach_clientsession, ems] Exception in NimServerSessionThread.run. Closing session.

Jul 11 12:09:14:043 [attach_clientsession, ems] (2) communication error, I/O error on nim session (S) com.nimsoft.nimbus.NimServerSession(Socket[addr=/10.30.230.4,port=48002,localport=52568]): Connection reset

 at com.nimsoft.nimbus.NimSessionBase.recv(NimSessionBase.java:944)

 at com.nimsoft.nimbus.NimServerSession.recv(NimServerSession.java:90)

 at com.nimsoft.nimbus.NimServerSession$NimServerSessionThread.handleMessage(NimServerSession.java:154)

 at com.nimsoft.nimbus.NimServerSession$NimServerSessionThread.run(NimServerSession.java:123)

Caused by: java.net.SocketException: Connection reset

 at java.net.SocketInputStream.read(SocketInputStream.java:209)

 at java.net.SocketInputStream.read(SocketInputStream.java:141)

 at java.net.SocketInputStream.read(SocketInputStream.java:223)

 at com.nimsoft.nimbus.NimSessionBase.readNimbusHeader(NimSessionBase.java:1077)

 at com.nimsoft.nimbus.NimSessionBase.recv(NimSessionBase.java:883)

 ... 3 more


Jul 11 12:09:14:043 [attach_clientsession, ems] Exception in NimServerSessionThread.run. Closing session.

Jul 11 12:09:14:044 [attach_clientsession, ems] (2) communication error, I/O error on nim session (S) com.nimsoft.nimbus.NimServerSession(Socket[addr=/10.30.230.4,port=48002,localport=52429]): Connection reset

 at com.nimsoft.nimbus.NimSessionBase.recv(NimSessionBase.java:944)

 at com.nimsoft.nimbus.NimServerSession.recv(NimServerSession.java:90)

 at com.nimsoft.nimbus.NimServerSession$NimServerSessionThread.handleMessage(NimServerSession.java:154)

 at com.nimsoft.nimbus.NimServerSession$NimServerSessionThread.run(NimServerSession.java:123)

Caused by: java.net.SocketException: Connection reset

 at java.net.SocketInputStream.read(SocketInputStream.java:209)

 at java.net.SocketInputStream.read(SocketInputStream.java:141)

 at java.net.SocketInputStream.read(SocketInputStream.java:223)

 at com.nimsoft.nimbus.NimSessionBase.readNimbusHeader(NimSessionBase.java:1077)

 at com.nimsoft.nimbus.NimSessionBase.recv(NimSessionBase.java:883)

 ... 3 more


The log also shows:


Caused by: org.springframework.beans.factory.UnsatisfiedDependencyException: Error creating bean with name 'dataSource' defined in class path resource [com/nimsoft/events/common/config/DataAccessConfig.class]: Unsatisfied dependency expressed through method 'dataSource' parameter 0; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'databaseConnectionInfo' defined in class path resource [com/nimsoft/events/common/config/DataAccessConfig.class]: Bean instantiation via factory method failed; nested exception is org.springframework.beans.BeanInstantiationException: Failed to instantiate [com.nimsoft.nimbus.lookup.model.DatabaseConnectionInfo]: Factory method 'databaseConnectionInfo' threw exception; nested exception is (4) not found, Received status (4) on response (for sendRcv) for cmd = 'nametoip' name = '/hms/Primary_Hub/WPAPPUIM001/data_engine'
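To gauge how often these two failure signatures occur, an exported ems log can be scanned with a short script. The following is a minimal sketch, not a supported tool: the function name `summarize_ems_log` and both regular expressions are assumptions derived only from the log excerpts above, and may need adjusting for other ems versions or log levels.

```python
import re

# Signatures copied from the log excerpts in this article (assumed stable).
# Matches the "(2) communication error" line, not the "Caused by" line,
# so each reset event is counted once.
RESET_RE = re.compile(r"I/O error on nim session.*Connection reset")
# Captures the Nimbus address that the 'nametoip' lookup could not resolve.
NAMETOIP_RE = re.compile(
    r"Received status \(4\).*cmd = 'nametoip' name = '([^']+)'"
)


def summarize_ems_log(lines):
    """Count connection resets and collect unresolved 'nametoip' names."""
    resets = 0
    unresolved = []
    for line in lines:
        if RESET_RE.search(line):
            resets += 1
        match = NAMETOIP_RE.search(line)
        if match and match.group(1) not in unresolved:
            unresolved.append(match.group(1))
    return {"connection_reset": resets, "nametoip_failures": unresolved}


# Example usage (the file name is hypothetical):
#   with open("ems.log", encoding="utf-8", errors="replace") as handle:
#       print(summarize_ems_log(handle))
```

A steadily rising reset count together with an unresolved data_engine address would be consistent with the ems probe losing its connection to the hub/data_engine.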

Environment

Release : 9.0.2

Component : UIM - HUB

- UIM 9.0.2

- spectrumgtw 8.67

- ems 10.2.1 HF1

- udm_manager 9.0.2

- discovery_server 9.0.2

Cause

- The root cause is not clear, but the ems probe appears to have lost its connection to the hub/data_engine. The 'nametoip' lookup for data_engine returned status (4) not found.

Resolution

A reboot temporarily resolved the issue, but it recurred.


Solution:

Tuned various probes, increased memory where appropriate, and made sure the Java minimum and maximum memory settings were not too far apart (for example, less than 2 GB).


- discovery_server 9.0.2: set nis_cache_update_interval = 1800, because the probeDiscovery queue was also heavily backed up (~50k messages).
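The Java memory guidance above can be checked mechanically. The sketch below assumes the probe's memory settings are expressed as standard JVM `-Xms` / `-Xmx` options, and it reads "not too far apart" as a difference of less than 2 GB between the two. The helper names `parse_heap_bytes` and `heap_gap_ok` are hypothetical.

```python
import re

# Multipliers for the JVM size suffixes k, m and g (binary units).
_UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_heap_bytes(option):
    """Convert a JVM heap option such as '-Xms512m' or '-Xmx2g' to bytes."""
    match = re.fullmatch(r"-Xm[sx](\d+)([kKmMgG]?)", option.strip())
    if not match:
        raise ValueError(f"not a -Xms/-Xmx option: {option!r}")
    number, unit = match.groups()
    # No suffix means the value is already in bytes.
    return int(number) * _UNITS.get(unit.lower(), 1)


def heap_gap_ok(xms, xmx, max_gap_bytes=2 * 1024 ** 3):
    """Return True when max heap minus initial heap is below the limit.

    The 2 GB default is an interpretation of the guidance in this article,
    not a documented product limit.
    """
    gap = parse_heap_bytes(xmx) - parse_heap_bytes(xms)
    if gap < 0:
        raise ValueError("maximum heap is smaller than initial heap")
    return gap < max_gap_bytes
```

For example, `heap_gap_ok("-Xms1024m", "-Xmx2048m")` passes the check, while a probe configured with `-Xms256m` and `-Xmx4g` would be flagged for review.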


After the first ems restart, the hub appeared to become unstable again and threw the same connection reset exception.


The number of hub subscribers was not excessive (~40).


Set postroute_reply_timeout to 300 (previously 180).


After these changes, the ems probe ran for many hours without crashing.


We also recommended disabling data indexing, and strongly recommended SQL Server Enterprise (not Standard) so that the customer can take advantage of database partitioning.


With indexing disabled, large tables should be reduced in size. Some QOS_IOSTAT_* tables in this environment hold over 650 million rows. Indexing tables of that size can take days, and the indexes can become fragmented again within days.


Partitioning is a best practice for minimizing DB administration and data management issues.
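When reviewing table sizes, a row-count report exported from the database can be filtered for the largest tables first. This is a minimal sketch: the function name `tables_over_threshold` is hypothetical, the input is assumed to be a mapping of table name to row count produced by the DBA, and the threshold is a parameter to be chosen by the DBA rather than a product recommendation.

```python
def tables_over_threshold(row_counts, threshold):
    """Return (table, rows) pairs above the threshold, largest first.

    row_counts: mapping of table name to row count (assumed input format).
    threshold:  row count above which a table is flagged for review.
    """
    flagged = [(name, rows) for name, rows in row_counts.items()
               if rows > threshold]
    # Sort by row count descending, then by name for a stable report.
    return sorted(flagged, key=lambda item: (-item[1], item[0]))
```

Tables flagged this way are candidates for data reduction or partitioning before any re-indexing is attempted.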