UIM Linux Hub restarts by itself: caused hub to reach max file descriptors

Article ID: 210163


Products

DX Unified Infrastructure Management (Nimsoft / UIM)

Issue/Introduction

Intermittently, a RHEL 7.5 hub restarts by itself. The only hub where this issue occurs is a secondary hub.

The affected secondary hub hosts around 40 hubs that connect to it as tunnel clients. There are 2 hubs, and only this one has the issue.

 

  • The following is logged immediately before the hub restarts:

 

Feb 19 08:28:14:550 [140416010274560] 1 hub: TSESS-A-432-3 [xxx-clt-xxx-01] sent     100 'archive_list' (434 bytes, 0 ms)

 

Feb 19 08:28:14:555 [140418922542848] 0 hub: FATAL: CTRL <vm_host> caused hub to reach max file descriptors

 

Feb 19 08:28:14:555 [140418922542848] 2 hub: CTRL <vm_host> terminating sessions

 

 

 

Feb 19 08:28:16:241 [140418897364736] 0 hub: FATAL: CTRL <vm_host> caused hub to reach max file descriptors

 

Feb 19 08:28:16:241 [140415943132928] 0 hub: RREQUEST: probe_list <-###.##.##.##/41646  h=414 d=0 fd=821

 

Feb 19 08:28:16:241 [140418897364736] 2 hub: CTRL <vm_host> terminating sessions

 

 

  • The log is from the secondary hub (xxxx), version 9.32HF2.
  • The error in the log (0 hub: FATAL: CTRL <vm_host> caused hub to reach max file descriptors) names hubs that are attached to this secondary hub, for example “<vm_host>”.

 

  • On <hostname>, there was no restart and no error at all when this issue occurred.

 

  • The customer sees this issue with both hub versions, 9.31 and 9.32HF2.
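
To see which attached hubs are named most often in the FATAL entries, the hub log can be scanned for the message shown above. The following is a minimal sketch, not a supported tool: the function name count_fd_fatals and the SAMPLE list are illustrative, and the pattern is taken only from the log lines quoted in this article.

```python
import re
from collections import Counter

# Pattern built from the FATAL line quoted in this article, e.g.
#   ... 0 hub: FATAL: CTRL <vm_host> caused hub to reach max file descriptors
FATAL_FD = re.compile(
    r"hub: FATAL: CTRL (\S+) caused hub to reach max file descriptors"
)


def count_fd_fatals(lines):
    """Count 'max file descriptors' FATAL entries per attached hub name."""
    counts = Counter()
    for line in lines:
        match = FATAL_FD.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Illustrative input: three of the log lines quoted in this article.
SAMPLE = [
    "Feb 19 08:28:14:555 [140418922542848] 0 hub: FATAL: CTRL <vm_host> caused hub to reach max file descriptors",
    "Feb 19 08:28:14:555 [140418922542848] 2 hub: CTRL <vm_host> terminating sessions",
    "Feb 19 08:28:16:241 [140418897364736] 0 hub: FATAL: CTRL <vm_host> caused hub to reach max file descriptors",
]

print(dict(count_fd_fatals(SAMPLE)))  # {'<vm_host>': 2}
```

To run it against a real log, pass an open file object for hub.log instead of SAMPLE. The location of hub.log depends on the installation.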

Environment

ENV:   UIM 20.3.2*

Secondary hub affected: Hub 9.32HF2 / 9.31 - RHEL 7.x*

Primary hub 9.32HF2 - Windows 2012 R

 

Resolution

Because this hub acts as a tunnel server, it opens more file descriptors at the same time than you would expect. It reaches the default limit of 1024, and the hub restarts.

The hub section of hub.cfg has a configurable parameter, "tunnel_max_fd".

Set tunnel_max_fd to a higher value, or make it unlimited by setting tunnel_max_fd = 0 if needed.

Also make sure the file descriptor limit is increased on the system.
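
As an illustration of where the key sits, the sketch below sets tunnel_max_fd inside the hub section of a hub.cfg passed in as text. It assumes the usual layout, where the section is delimited by <hub> and </hub> lines and holds flat "key = value" entries. The helper name set_hub_key is hypothetical, and the function only returns the new text. It does not write the file, so apply the result however you normally edit hub.cfg.

```python
def set_hub_key(cfg_text, key, value):
    """Return cfg_text with `key = value` set inside the <hub> section.

    Assumes the section is delimited by <hub> ... </hub> lines and holds
    flat `key = value` entries (an assumption about the file layout).
    """
    out, in_hub, done = [], False, False
    for line in cfg_text.splitlines():
        stripped = line.strip()
        if stripped == "<hub>":
            in_hub = True
        elif in_hub and stripped == "</hub>":
            if not done:
                # Key was absent: add it just before the section closes.
                out.append(f"   {key} = {value}")
                done = True
            in_hub = False
        elif in_hub and not done and stripped.split("=")[0].strip() == key:
            # Key was present: overwrite its value.
            line = f"   {key} = {value}"
            done = True
        out.append(line)
    return "\n".join(out) + "\n"
```

For example, set_hub_key(text, "tunnel_max_fd", 0) gives the unlimited setting described above. If the text has no <hub> section, it comes back unchanged.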
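
To check the limit that actually applies to the running hub process, one option on Linux is to read /proc for that process. The sketch below is a minimal example under that assumption. The function names are illustrative, the hub's PID has to be supplied by you, and reading another user's /proc/<pid>/fd normally requires root or the same user account.

```python
import os


def parse_max_open_files(limits_text):
    """Return (soft, hard) 'Max open files' from /proc/<pid>/limits text.

    A value that is not a plain number (such as 'unlimited') becomes None.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            soft, hard = line.split()[3:5]
            return tuple(int(v) if v.isdigit() else None for v in (soft, hard))
    raise ValueError("no 'Max open files' line found")


def fd_usage(pid):
    """Return (open_fds, soft_limit, hard_limit) for a Linux process."""
    with open(f"/proc/{pid}/limits") as limits_file:
        soft, hard = parse_max_open_files(limits_file.read())
    # Each entry in /proc/<pid>/fd is one open file descriptor.
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    return open_fds, soft, hard
```

Calling fd_usage with the hub's PID shows how close the process is to its soft limit. If the soft limit is still 1024, the system-level limit has not been raised for the hub.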

Additional Information

This issue applies to any robot version after 7.96. Before 7.96 there was a known issue:

https://knowledge.broadcom.com/external/article?articleId=122729

However, that issue is resolved. It was caused by a hardcoded limit: when the number of file descriptors opened by the hub reached 999, the hub restarted itself automatically.