baseline_engine fails after several hours on Linux

Article ID: 131651

Products

DX Infrastructure Management NIMSOFT PROBES

Issue/Introduction

The baseline_engine probe suddenly stops with the following errors:
[  Thread-5]  WARN  queuemessengerimpl.QueueReaderImpl - The subscription to baseline_engine.QOS_MESSAGE is not ok. Resetting the subscription
[  Thread-5]  INFO  queuemessengerimpl.QueueReaderImpl - queue list not recieved from hub
[  Thread-5]  INFO  queuemessengerimpl.QueueReaderImpl - Queue baseline_engine.QOS_MESSAGE does not exist.
[  Thread-5]  ERROR impl.NimsoftApiImpl - NimSubscribe failure :(80) Session error, Unable to open a client session for [IP_ADDRESS]:48002: Connection refused
[  Thread-5]  ERROR impl.NimsoftApiImpl - .........subscribe for queue failed ..........

The _hub.log file also shows:
[140666998101760] hub: ssl_server_wait - BIO error accepting connection 
[140666998101760] hub: [1] error:0x02008018:system library:accept:Too many open files 
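
To confirm that a log contains the descriptor-exhaustion error shown above, a quick search can be sketched as follows. The helper name count_emfile is hypothetical, and the log path must be supplied by you, since it depends on where the hub is installed.

```shell
# count_emfile is a hypothetical helper: prints how many lines of the
# log file given as $1 contain the "Too many open files" error.
count_emfile() {
    # grep -c exits non-zero when the count is 0, so "|| true" keeps the
    # helper usable in scripts that run with "set -e".
    grep -c 'Too many open files' "$1" || true
}
```

For example, `count_emfile <path to the hub log>` prints the number of matching lines; any value above zero suggests that the hub reached its open-file limit.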

Cause

The open file descriptor limit (file-descriptor-count) set at the operating system level is too low for the required load.

Environment

Linux RedHat 
robot_update,"7.97HF3","656","Updates Nimsoft Robots to current version"
Hub 7.97 [Build 7.97.296 - hub: licIpCheck - 10.200.1.22 
baseline_engine,"9.02","9.0.2-33","Baseline Engine"
prediction_engine,"9.02","9.0.2-20","Prediction Engine"

Resolution

The operating system administrator must increase the number of open files that the OS allows per user (file-descriptor-count).

To see the current setting for a user, run the following commands as root:

su - <user that runs the Nimsoft hub daemons>
ulimit -n 
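
Plain "ulimit -n" reports the soft limit of the current shell. The sketch below also prints the hard limit and reads the limit that applies to an already running process, which can differ from what a fresh login shell reports. The helper name proc_nofile is hypothetical, and the /proc lookup assumes Linux.

```shell
# Print the soft and hard open-file limits of the current shell.
ulimit -Sn
ulimit -Hn

# proc_nofile is a hypothetical helper: prints the "Max open files" row
# for the running process whose PID is given as $1 (Linux /proc only).
proc_nofile() {
    grep 'Max open files' "/proc/$1/limits"
}

# $$ is the PID of the current shell, used here only as a stand-in;
# substitute the PID of the hub process to inspect the hub itself.
proc_nofile "$$"
```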

The default open-file limit is usually 1024. If a specific user needs more, then as root add an entry for that user to the /etc/security/limits.conf file:

user - nofile 2048

This sets the maximum number of open files for the specific user "user" to 2048. Note that a name prefixed with @ in this file refers to a group, not a single user.

To apply a system-wide increase for all users, edit the /etc/security/limits.conf file as root and add the following:

* - nofile 2048 

This sets the maximum open files for ALL users to 2048 files. These settings will require a reboot to become active. 

After making the change, verify that the "ulimit -n" value for the corresponding user has increased.
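
This verification can be scripted. The following is a minimal sketch, assuming it is run in a new login session of the user that runs the hub; the helper name check_nofile is hypothetical.

```shell
# check_nofile is a hypothetical helper: succeeds when the soft open-file
# limit of the current shell is at least $1 (or is reported as "unlimited").
check_nofile() {
    required=$1
    current=$(ulimit -Sn)
    if [ "$current" = "unlimited" ] || [ "$current" -ge "$required" ]; then
        echo "OK: nofile soft limit is $current (required $required)"
    else
        echo "TOO LOW: nofile soft limit is $current (required $required)"
        return 1
    fi
}
```

For example, `check_nofile 2048` reports whether the 2048 value used in this article is in effect for the current session.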

Note 1: The way to configure ulimit may vary depending on the Unix/Linux version. Coordinate this setting with the operating system administrator.

Note 2: The command "lsof | wc -l" can help you determine the number of file descriptors required.
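
The lsof count covers the whole system and also lists entries that are not file descriptors, such as memory-mapped files, so it tends to overestimate. As a per-process alternative, the sketch below counts the descriptors of a single process through /proc. It assumes Linux, and the helper name count_fds is hypothetical.

```shell
# count_fds is a hypothetical helper: prints the number of file descriptors
# currently open in the process whose PID is given as $1 (Linux /proc only;
# inspecting another user's process requires root).
count_fds() {
    ls "/proc/$1/fd" | wc -l
}

# $$ is the PID of the current shell, used here only as a stand-in;
# substitute the PID of the hub process to measure the hub itself.
count_fds "$$"
```

Sampling this value over several hours of normal load gives a rough idea of how much headroom the limit needs.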