Results: Unable to log in to the UIM server and cannot monitor; UMP is also down.
Exceptions from the ems log:
Jul 11 12:09:04:659 [attach_socket, ems] Dispatching to / to api/nas/alarms
Jul 11 12:09:14:042 [attach_clientsession, ems] Exception in NimServerSessionThread.run. Closing session.
Jul 11 12:09:14:043 [attach_clientsession, ems] (2) communication error, I/O error on nim session (S) com.nimsoft.nimbus.NimServerSession(Socket[addr=/10.30.230.4,port=48002,localport=52568]): Connection reset
at com.nimsoft.nimbus.NimSessionBase.recv(NimSessionBase.java:944)
at com.nimsoft.nimbus.NimServerSession.recv(NimServerSession.java:90)
at com.nimsoft.nimbus.NimServerSession$NimServerSessionThread.handleMessage(NimServerSession.java:154)
at com.nimsoft.nimbus.NimServerSession$NimServerSessionThread.run(NimServerSession.java:123)
Caused by: java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:209)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at java.net.SocketInputStream.read(SocketInputStream.java:223)
at com.nimsoft.nimbus.NimSessionBase.readNimbusHeader(NimSessionBase.java:1077)
at com.nimsoft.nimbus.NimSessionBase.recv(NimSessionBase.java:883)
... 3 more
Jul 11 12:09:14:043 [attach_clientsession, ems] Exception in NimServerSessionThread.run. Closing session.
Jul 11 12:09:14:044 [attach_clientsession, ems] (2) communication error, I/O error on nim session (S) com.nimsoft.nimbus.NimServerSession(Socket[addr=/10.30.230.4,port=48002,localport=52429]): Connection reset
at com.nimsoft.nimbus.NimSessionBase.recv(NimSessionBase.java:944)
at com.nimsoft.nimbus.NimServerSession.recv(NimServerSession.java:90)
at com.nimsoft.nimbus.NimServerSession$NimServerSessionThread.handleMessage(NimServerSession.java:154)
at com.nimsoft.nimbus.NimServerSession$NimServerSessionThread.run(NimServerSession.java:123)
Caused by: java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:209)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at java.net.SocketInputStream.read(SocketInputStream.java:223)
at com.nimsoft.nimbus.NimSessionBase.readNimbusHeader(NimSessionBase.java:1077)
at com.nimsoft.nimbus.NimSessionBase.recv(NimSessionBase.java:883)
... 3 more
In addition, the following exception was logged:
Caused by: org.springframework.beans.factory.UnsatisfiedDependencyException: Error creating bean with name 'dataSource' defined in class path resource [com/nimsoft/events/common/config/DataAccessConfig.class]: Unsatisfied dependency expressed through method 'dataSource' parameter 0; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'databaseConnectionInfo' defined in class path resource [com/nimsoft/events/common/config/DataAccessConfig.class]: Bean instantiation via factory method failed; nested exception is org.springframework.beans.BeanInstantiationException: Failed to instantiate [com.nimsoft.nimbus.lookup.model.DatabaseConnectionInfo]: Factory method 'databaseConnectionInfo' threw exception; nested exception is (4) not found, Received status (4) on response (for sendRcv) for cmd = 'nametoip' name = '/hms/Primary_Hub/WPAPPUIM001/data_engine'
Release: 9.0.2
Component: UIM - HUB
- UIM 9.0.2
- spectrumgtw 8.67
- ems 10.2.1 HF1
- udm_manager 9.0.2
- discovery_server 9.0.2
A reboot temporarily resolved the issue, but it recurred.
Solution:
Tuned various probes: increased memory where appropriate and made sure the Java min/max heap settings were not too far apart, e.g., less than 2 GB between them (a configuration sketch follows the list below).
- discovery_server 9.0.2: set nis_cache_update_interval = 1800, as the probeDiscovery queue was also severely backed up (~50k messages)
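For illustration only (section and key names vary by probe and UIM release, so confirm them in each probe's Raw Configure before changing anything; the # lines are annotations, not part of the file), the changes were of this general shape:

    <startup>
       # hypothetical form: some probes take a single Java options string,
       # others use separate java_mem_init / java_mem_max keys
       options = -Xms2048m -Xmx4096m
    </startup>

    <setup>
       # discovery_server: the key and value are from this case;
       # the <setup> section placement is an assumption
       nis_cache_update_interval = 1800
    </setup>

Probes generally need to be deactivated and reactivated to pick up the new settings.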
After the first ems restart, the hub seemed to become unstable again, throwing the same connection-reset error/exception.
The number of hub subscribers was not especially high (~40).
Set postroute_reply_timeout to 300 (up from 180).
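Assuming this refers to the hub's postroute_reply_timeout key (normally edited through the hub's Raw Configure; the <hub> section placement shown here is an assumption), the change would look roughly like:

    <hub>
       postroute_reply_timeout = 300
    </hub>

The hub typically needs a restart for the new timeout to take effect.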
After many hours, however, no further crash occurred with the ems.
Also recommended that data indexing be disabled, and strongly recommended SQL Server Enterprise (rather than Standard) so the customer can take advantage of database partitioning.
With indexing disabled, the large tables should still be reduced in size. I noticed QOS_IOSTAT_* tables with over 650 million rows; indexing tables that large can take days, and the indexes can become fragmented again within days.
Partitioning is a best practice to minimize DB administration and data management issues.