This is a summary of UIM (Nimsoft) hub timeout, retry, and other settings, described and explained in detail to help you adjust and tune hub behavior, performance, and reliability, along with advice on the circumstances under which you may want to adjust them.
DX UIM customers need a clear understanding of the timeout settings, defaults, general recommendations, and details of the hub timeout, retry, and other parameters/settings so they can make appropriate modifications via raw configure, based on observed hub and hub-to-hub behavior.
DX UIM hub raw configure settings should be adjusted for optimal performance
Environment
UIM v8.5.1 or higher
hub v7.93 or higher
robot v7.93 or higher
Resolution
hub timeouts These settings collectively control the hub's behavior when it encounters slow responses from queues and other hubs. Specifically, the postroute_* timeouts pertain to queues.
postroute_interval Controls how frequently (in seconds) the hub checks queue subscriptions. The default = 30.
postroute_reply_timeout This value is in seconds and determines how long the hub will wait for a reply from a queue/subscriber after sending a bulk of messages. The default = 180. If the remote hub does not reply within this window after the hub posts a bulk of messages on a queue, the hub decides the bulk did not go through and re-sends it.
postroute_passive_timeout This value is also in seconds and decides how long the hub will let a queue be 'passive' before disconnecting it. The default = 60. It controls how long the hub will allow a queue to have no data/traffic flowing across it before it decides that the queue needs to be reset. Note: this should always be set higher than postroute_interval to avoid false resets!
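Taken together, these three postroute keys are set via Raw Configure on the hub. A minimal sketch follows, assuming (as is typical) that the keys live in the top-level <hub> section of hub.cfg; the values shown are the defaults quoted above:

   <hub>
      postroute_interval = 30
      postroute_reply_timeout = 180
      postroute_passive_timeout = 60
   </hub>

On slow WAN links you might, for example, raise postroute_reply_timeout while keeping postroute_passive_timeout above postroute_interval.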
hub_request_timeout The timeout value (in seconds) for hub-to-hub communication requests. The default = 60. This setting controls how long the hub waits for other (non-bulk-post) requests, e.g. nametoip requests, from another hub before deciding they have timed out.
tunnel_hang_timeout Controls how long (in seconds) the hub will allow a tunnel to have no data flowing across it before it decides that the tunnel needs to be closed and reconnected. The default is 300. The hub continuously checks whether any of the active tunnels are hanging; no new connections can be established through a hanging tunnel. If a tunnel is hanging, the hub attempts to restart the tunnel. If the restart fails, the hub performs a restart after the specified number of seconds. On systems with very low latency and fast response between hubs, it may be beneficial to decrease tunnel_hang_timeout to 120, or even 60.
tunnel_hang_retries This setting is not present in hub.cfg by default; when absent, it defaults to 1. It controls how many times the hub will attempt to reconnect an unresponsive tunnel before restarting itself entirely to attempt to self-heal. Setting it higher allows the hub to retry the connection internally a few more times before performing an internal restart to get the tunnels operational again.
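A sketch of how the request and tunnel-hang keys might look, again assuming they sit in the <hub> section of hub.cfg (the tunnel_hang_* values shown are illustrative tuning choices, not defaults):

   <hub>
      hub_request_timeout = 60
      tunnel_hang_timeout = 120
      tunnel_hang_retries = 3
   </hub>

Here tunnel_hang_timeout is lowered to 120 for a low-latency network, and tunnel_hang_retries is raised to 3 so the hub retries the tunnel a few more times before resorting to an internal restart.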
reply_timeout Specifies the reply timeout setting on a per-queue basis. This value overrides the global postroute_reply_timeout for the specific queue. You can specify this setting in the <postroute>/<name_of_queue> section of the hub.cfg.
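For example, to give one heavily loaded queue a longer reply window than the global default, a sketch (the queue name data_engine_queue is hypothetical; substitute your own):

   <postroute>
      <data_engine_queue>
         reply_timeout = 300
      </data_engine_queue>
   </postroute>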
max_heartbeat Indicates how long (in seconds) the tunnel server will wait for client heartbeats. The default is 30. It is set in the hub's <tunnel> section. The tunnel control heartbeat allows the hub to self-correct OpenSSL session loss or degradation (network latency- and load-dependent).
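A sketch of the corresponding hub.cfg fragment (default value shown; protocol_mode, discussed below, lives in this same <tunnel> section):

   <tunnel>
      max_heartbeat = 30
   </tunnel>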
The following setting is applied to the Robot <controller> section.
reuse_async_session Resolves an issue where the probe_config_get callback fails every other time. To implement the fix, add the key reuse_async_session = 1 to the controller section of robot.cfg if it is not already present. The default is 0, which is off.
reuse_async_session causes the controller to reuse TCP sessions instead of making new ones for every request.
In the hub log you may see a message containing the string: "read requested but connection being closed." With reuse_async_session = 1 set in robot.cfg, and protocol_mode = 3, "close" requests from the server end are essentially ignored in favor of keeping the session OPEN until the client indicates that it should be closed.
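A sketch of the corresponding robot.cfg change, made via Raw Configure on the robot's controller (the robot typically needs a restart for controller changes to take effect):

   <controller>
      reuse_async_session = 1
   </controller>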
check_spooler_sessions A mechanism to periodically close spooler sessions that are in a 'half-closed' state. The check_spooler_sessions variable is set to 0 (off) by default. When set to 1 (on), idle spooler sessions are closed. Essentially, this enables a periodic check (every 30 seconds) for idle/lost spooler sessions, i.e., sessions that have had no traffic for passive_session_timeout (300) seconds. Among other benefits, it addresses a starvation problem on Windows hubs: when a request was made for a subscription and was not responded to within 10 seconds, the requester would send a close and close its end. However, because of starvation, neither the subscription request nor the close request would get processed, so the connection was left in a half-closed (CLOSE_WAIT) state. Note that as of hub v7.95 this is the default behavior.
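A sketch of enabling the check, assuming both keys belong in the <hub> section of hub.cfg (passive_session_timeout is shown at the 300-second value quoted above):

   <hub>
      check_spooler_sessions = 1
      passive_session_timeout = 300
   </hub>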
protocol_mode Set via Raw Configure in the hub -> <tunnel> section. The hub's tunnel algorithm has several modes of operation that can be used to influence the behavior of tunnel session handling. Different environments may require different protocol settings in order to provide stable hub-to-hub communications, specifically with regard to managing configurations in the Infrastructure Manager client.
The protocol_mode should be adjusted as a last resort, after all other timeout/configuration settings have been exhausted, when the following symptoms appear in Infrastructure Manager:
- Sometimes a hub or robot (and its probes) can be configured via IM, but when I try again a few minutes later, I get a communication error even though the robot still looks OK/green.
- Sometimes hubs appear to turn red in IM, but when I click on them, they are available immediately and turn green.
- Sometimes I am able to open a probe configuration GUI successfully but then when I try to save the changes, I get a communication error.
- Other "strange and intermittent" issues with communication failures in IM.
IMPORTANT: All tunneled hubs in the environment must share the same protocol mode. If you change it on one hub, you must match the setting on all the other hubs.
Valid protocol_mode configuration values are 0, 1, 2, or 3. These settings only affect hubs with tunnel connections (clients or servers), as they are placed in the <tunnel> section of the hub.cfg.
protocol_mode = 0 Description: This is the default prior to hub v7.96. It is most successful in a "two-tier" environment - meaning that there is only one level of tunnel server/client on the network.
Best Practices for protocol_mode 0:
- The primary hub is tunneled to secondary hubs directly, or
- The primary hub has no tunnels, but there is a local tunnel server, and secondary hubs are clients to it, and
- There are no third-tier hubs (e.g. clients of clients).
protocol_mode = 1 Description: protocol_mode 1 is an algorithm that handles "session close" messages a little differently and is intended for multi-tier environments (three or more total tiers connected by tunnels). Often, protocol_mode 0 is sufficient for such environments, but if there appears to be instability, it is worth trying protocol_mode 1 first.
Best Practices for protocol_mode 1:
- The primary hub is a tunnel server and has a "tunnel concentrator" hub as a client, which in turn does double duty as a tunnel server with its own tunnel clients attached; or
- The primary hub is not tunneled, but there is a tunnel server hub locally, secondary hubs are clients to it, and those secondary hubs also serve as tunnel servers to a third level of clients.
protocol_mode = 2 Description: This mode is rarely used; it is a slightly modified version of protocol_mode 0, intended for 2-tier or lower environments, where instability is still observed.
Best Practices for protocol_mode 2:
- If protocol_mode 0 (the default) would be appropriate but there is still instability, try protocol_mode 2. If it makes no difference, it is advisable to try the other modes and/or revert back to 0.
protocol_mode = 3 (This is the default value as of hub v7.96.) Description: This is a variant of protocol_mode 1 with a change in the behavior for handling closes from the "bottom" end. By default, when IM sends a probe_config_get (and now also probe_config_set) request to a remote controller, the controller sends the reply and then sends a close to shut down the connection from the bottom end. protocol_mode = 3 is intended primarily for 3-tier tunnel networks.
Best Practices for protocol_mode 3: this should be appropriate for most environments. Try one of the other modes, depending on your environment, only if the default does not provide the desired stability.
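As a sketch, a three-tier environment might switch every tunneled hub to mode 1 (or mode 3 on hub v7.96+, where it is already the default); remember that the same value must be set in the <tunnel> section of hub.cfg on all tunnel servers and clients:

   <tunnel>
      protocol_mode = 1
   </tunnel>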
Suggested settings from the Help doc for hub timeouts/retries/intervals, for hub version 7.x or higher: