The hub tunnels/tunnel system seems to be unstable.

Products

DX Unified Infrastructure Management (Nimsoft / UIM)

Issue/Introduction

The hub tunnels/tunnel system seems to be unstable. Connections between the primary and secondary hubs stop working sporadically, and for some hub-to-hub connections data queues up on the secondary hubs. Error messages in hub.log may include:

hub: CTRL HSH select() ERROR 10022
hub: ssl_server_wait - SSL_accept timeout on new SSL connection
hub: ssl_server_wait - SSL_accept error (5) on new SSL connection
hub: TSESS-135020 name to IP failed for /Domain/Hub/Robot/data_engine (not found)

or

hub: SSL_shutdown on TSESS-9785: SSL connection want read
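
To see how often these messages occur, and whether they cluster around one type of failure, the log can be tallied with a short script. The following is a minimal sketch, not a supported tool: the function names and category labels are hypothetical, and the patterns are derived only from the sample messages quoted above, so they may need adjusting for other hub versions.

```python
import re
from collections import Counter

# Patterns derived from the hub.log samples quoted in this article.
# The labels on the left are illustrative names, not UIM terminology.
PATTERNS = {
    "select_error": re.compile(r"CTRL HSH select\(\) ERROR \d+"),
    "ssl_accept_timeout": re.compile(r"SSL_accept timeout on new SSL connection"),
    "ssl_accept_error": re.compile(r"SSL_accept error \(\d+\) on new SSL connection"),
    "name_to_ip_failed": re.compile(r"TSESS-\d+ name to IP failed for \S+"),
    "ssl_shutdown": re.compile(r"SSL_shutdown on TSESS-\d+"),
}


def count_tunnel_errors(lines):
    """Count log lines that match each known tunnel error pattern."""
    counts = Counter()
    for line in lines:
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[label] += 1
                break  # a line is counted under the first matching label only
    return counts


def count_tunnel_errors_in_file(path):
    """Convenience wrapper: tally the errors in a hub.log file at 'path'."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return count_tunnel_errors(handle)
```

Calling count_tunnel_errors_in_file() with the path to a copy of hub.log returns a Counter. A count dominated by the SSL_accept entries would point toward certificate or network checks first.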

Environment

UIM 20.3 and later

Cause

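Possible causes include:
- Bad certificates
- Failed LDAP connection
- Network problems
- Anti-virus/intrusion prevention systems blocking traffic
- Outdated hub/robot versions with known issues
- An overtaxed tunnel hub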

Resolution

Bad certificates
- If the problem is confined to a single hub-to-hub connection, try recreating the certificate.

Failed LDAP connection
- If an LDAP connection to Active Directory (AD) is in use, check that there are no issues logging in.

Network problems
- Check that there are no alerts from the network probes between the primary and secondary hubs.

Anti-virus/Intrusion prevention systems blocking
- Check the logs for these types of products on both ends of the tunnel.

Hub/robot versions and known issues (see the release notes for each probe)
- Hubs and robots may need a version update, e.g., to 5.82 or later and 5.70 respectively.

Overtaxed tunnel hub
- A tunnel hub that is a single point taking connections from multiple hub clients may need a higher bulk size, either temporarily or permanently:

1. Select the hub probe, hold down the SHIFT key, and right-click to open Raw Configure.
2. Choose postroute and select the problematic queue, i.e., the one that is not sending messages/alarms.
3. Experiment with higher bulk size values, or set it to 1000.
4. Check the hub tunnel Status for that queue to see whether the queue is draining more quickly/efficiently.
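
Whether a bulk size change helped can be judged by comparing how fast the queue drains before and after the change. The following is a minimal sketch, assuming the queued-message counts are read by hand from the hub Status view at two points in time. The function name and the sample numbers are hypothetical.

```python
def drain_rate(queued_before, queued_after, seconds):
    """Return messages drained per second; a negative value means the queue grew."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (queued_before - queued_after) / seconds


# Hypothetical readings taken 60 seconds apart, before and after the change.
rate_old = drain_rate(12000, 11400, 60)  # with the previous bulk size
rate_new = drain_rate(11400, 8400, 60)   # with the bulk size raised
improved = rate_new > rate_old
```

If the rate does not improve, or the queue keeps growing, the bottleneck is probably elsewhere and the other checks in this section still apply.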

Additional Information

Hub performance deteriorating on the Windows platform, with Illegal SID messages in the logs:

On Windows, this may be due to too many subscribers/concurrent connections to a given process. To determine if this is the case:

1. Select the hub probe.
2. Press CTRL+P.
3. In the Probe Configuration GUI, select list_subscribers from the command-set drop-down list.
4. Click the green button.
5. Scroll to the bottom right-hand side of the output window and check the last subscriber number. If there are more than 64 subscribers, they are processed in round-robin fashion (for example, an output ending at subscriber 23 is well under the limit).

Round-robin processing severely affects overall hub performance in processing queues, maintaining tunnels, etc. If this is the case, try to distribute management clients to another hub so that Infrastructure Manager, etc. are not logging in to the primary hub (or to whichever hub exceeds 64 subscribers).

Other ways to reduce the load:
- Set up a Linux hub and direct all client logins to that hub.
- If too many get/attach queues are set up (at a large customer), either move queues to another hub or use 'post' queues instead of get/attach queues.
- Limit the number of secondary hubs to under 25.
- Implement additional "proxy" hubs, which take in connections from a number of other hubs and then feed the data up to the primary.
