A tunnel server with multiple clients was restarted, and most or all of the clients have not reconnected since the restart.
The tunnel server hub.log shows the following sequence repeatedly for each client:
Oct 22 13:39:36:311 [139939410867776] 3 hub: SSL handshake start from (CLIENT_IP_ADDRESS) fd=19: before SSL initialization
Oct 22 13:39:36:311 [139939410867776] 3 hub: SSL state (accept): before SSL initialization
Oct 22 13:39:36:311 [139939410867776] 3 hub: SSL alert (write): fatal: decode error
Oct 22 13:39:36:311 [139939410867776] 3 hub: (ssl_server_wait_handshake) - accepted a new connection fd=19 host=(CLIENT_IP_ADDRESS)
Oct 22 13:39:36:311 [139939410867776] 1 hub: (ssl_server_wait_handshake) - SSL_accept error from host=(CLIENT_IP_ADDRESS) fd=19, err=-1, ssl_err=1(error:00000001:lib(0)::reason(1)), errno=0(Success)
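To gauge how many clients are affected, the SSL_accept errors can be counted per client address. The following is a minimal Python sketch, assuming the log lines follow the format of the excerpt above. The function name count_handshake_failures and the regular expression are illustrative and are not part of DX UIM.

```python
import re
from collections import Counter

# Matches the SSL_accept error line shown in the hub.log excerpt.
# Assumption: the pattern is derived from that excerpt only and may need
# adjusting for other hub versions or log levels.
SSL_ACCEPT_ERROR = re.compile(r"SSL_accept error from host=(\S+) fd=\d+")


def count_handshake_failures(lines):
    """Return a Counter mapping each client host to its SSL_accept error count."""
    failures = Counter()
    for line in lines:
        match = SSL_ACCEPT_ERROR.search(line)
        if match:
            failures[match.group(1)] += 1
    return failures


# Example usage (the path to hub.log is an assumption; adjust as needed):
#   with open("hub.log", encoding="utf-8", errors="replace") as log_file:
#       for host, count in count_handshake_failures(log_file).most_common():
#           print(host, count)
```

A long list of hosts with steadily growing counts suggests the repeated handshake failures described in this article rather than a problem with a single client.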
DX UIM - Any Version
DX UIM Hub - 23.4.5 or prior
Under certain circumstances, including heavy system load or high network latency, the hub can struggle to respond to a series of handshakes quickly, causing a timeout.
The DX UIM hub tunnel client has a hardcoded handshake timeout of 10 seconds.
With a large number of clients trying to connect, it can take longer than 10 seconds to answer all the handshake attempts.
When some client handshakes fail, those clients repeatedly try to reconnect, which adds further load, exacerbating the condition and eventually causing all handshakes to fail.
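The effect of the 10-second timeout can be illustrated with a rough calculation. The sketch below uses a deliberately simplified model that is an assumption and not the hub's actual behavior: all clients connect at the same moment and the server completes handshakes one at a time, each taking the same number of seconds. The function name and parameters are hypothetical.

```python
def clients_timed_out(num_clients, handshake_seconds, timeout_seconds=10.0):
    """Count clients whose handshake would complete after the client timeout.

    Simplified model (assumption): clients connect simultaneously and the
    server finishes one handshake at a time, so the client in queue position
    k waits k * handshake_seconds.
    """
    timed_out = 0
    for position in range(1, num_clients + 1):
        completion_time = position * handshake_seconds
        if completion_time > timeout_seconds:
            timed_out += 1
    return timed_out
```

In this model, 50 clients at 0.5 seconds per handshake leave 30 clients past the 10-second timeout, and each of those clients then retries and adds to the queue.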
Additionally, there is a defect in hub 23.4.5 and prior related to the time clients wait to reconnect.
In DX UIM 23.4.6 (CU6)/hub 23.4.6 the defect for client reconnections will be resolved.
Upgrade to hub 23.4.6 and then adjust the following config key in hub.cfg:
<tunnel>
reconnect_time = 30
In hub 23.4.5 and prior, this key has no effect: clients reconnect immediately after each failed attempt and hammer the server with handshake requests.
In hub 23.4.6 and later this key will be respected, and clients will back off when a handshake fails, allowing the hub time to recover.
For an environment with 10 or more tunnel clients, we recommend staggering the value.
For example, 5 hubs could have reconnect_time set to 30, and 5 could have it set to 45. This way, the clients will not all try to reconnect at the same time.
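The staggering described above can be planned with a short script. This is a minimal Python sketch that assigns reconnect_time values to hubs in round-robin order. The hub names and the function staggered_reconnect_times are hypothetical, and the script only produces a plan; each value still has to be set in that hub's hub.cfg.

```python
from itertools import cycle


def staggered_reconnect_times(hub_names, values=(30, 45)):
    """Assign reconnect_time values to hubs in round-robin order.

    hub_names: iterable of tunnel client hub names (hypothetical here).
    values: the reconnect_time values, in seconds, to alternate between.
    """
    return dict(zip(hub_names, cycle(values)))


# Example: ten tunnel clients split evenly between 30 and 45 seconds.
example_hubs = [f"hub{i}" for i in range(1, 11)]
example_plan = staggered_reconnect_times(example_hubs)
```

With ten hubs and the default values, five hubs receive 30 and five receive 45, matching the example above. More than two values can be passed to spread the reconnect attempts further.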
In DX UIM 23.4.7 (CU7), an option will be available to configure the client handshake timeout, which is currently hardcoded at 10 seconds. At that time, this article will be updated with the details.