Some jobs fail to be submitted on a node where SSL is enabled.
In universe.log, errors such as the following appear at the time of the issue:
| 2022-01-19 01:58:35 |ERROR|X|IO |pid=10508.16192| u_io_thread_trt | New client 2740 (/ on ) authentication failed: Comlayer error (Time out on Hello)
| 2022-01-19 01:58:37 |ERROR|X|uni|pid=11828.2816| k_handshakeHello | u_recv_msg in error [-2]: ()
| 2022-01-19 01:58:37 |ERROR|X|uni|pid=11828.2816| o_connect_auth | k_connect_auth_timeout returns error [-1]
| 2022-01-19 01:58:37 |ERROR|X|uni|pid=11828.2816| u_io_callsrv_gen | o_connect_auth in error [-1]
| 2022-01-19 01:58:41 |ERROR|X|IO |pid=10508.17892| u_io_thread_trt | New client 2612 (/ on ) authentication failed: Comlayer error (Time out on Hello)
| 2022-01-19 01:58:41 |ERROR|X|IO |pid=10508.15504| u_io_thread_trt | New client 1780 (/ on ) authentication failed: Comlayer error (Time out on Hello)
| 2022-01-19 01:58:41 |ERROR|X|IO |pid=10508.18372| u_io_thread_trt | New client 2108 (/ on ) authentication failed: Comlayer error (Time out on Hello)
| 2022-01-19 01:58:42 |ERROR|X|IO |pid=10508.17600| u_io_thread_trt | New client 2528 (/ on ) authentication failed: Comlayer error (Time out on Hello)
| 2022-01-19 01:58:42 |ERROR|X|INI|pid=13332.14456| k_handshakeHello | u_recv_msg in error [-2]: ()
| 2022-01-19 01:58:42 |ERROR|X|INI|pid=13332.14456| o_connect_auth | k_connect_auth_timeout returns error [-1]
| 2022-01-19 01:58:42 |ERROR|X|INI|pid=13332.14456| u_io_callsrv_connect_r | Error connecting to target IO server: Hello request error (handshake hello to [SWSLBP_local_SIO_X]/[] timeout 0 in error Comlayer error (Error in shutting down ssl socket: SSL
The same kind of error also appears at other times for binaries other than uxjobinit (INI), such as uxjobstatus (STA) or uxjobend (END):
| 2021-11-11 02:49:18 |ERROR|X|INI|pid=15844.8876| u_io_callsrv_connect_r | Error connecting to target IO server: Hello request error (handshake hello to [SWSLBP_local_SIO_X]/[] timeout 0 in error Connection closed (recv(ssl socket=424, bytes=5) return
| 2021-11-19 22:31:42 |ERROR|X|STA|pid=16912.2476| u_io_callsrv_connect_r | Error connecting to target IO server: Hello request error (handshake hello to [SWSLBP_local_SIO_X]/[] timeout 0 in error Connection closed (recv(ssl socket=416, bytes=5) return
| 2021-12-10 22:32:26 |ERROR|X|END|pid=13920.1888| u_io_callsrv_connect_r | Error connecting to target IO server: Hello request error (handshake hello to [SWSLBP_local_SIO_X]/[] timeout 0 in error Connection closed (recv(ssl socket=412, bytes=5) return
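To see which binaries are affected and how often, the log can be scanned for Hello handshake failures. The following is a minimal sketch, not a supported tool: it assumes the pipe-separated layout of the excerpts above, and the function name count_hello_errors and the regular expression are illustrative only.

```python
import re
from collections import Counter

# Assumed layout, taken from the excerpts above:
# | date time |LEVEL|X|COMPONENT|pid=...| function | message
LINE_RE = re.compile(
    r"^\|\s*(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s*"
    r"\|(?P<level>\w+)\s*\|[^|]*\|(?P<comp>[^|]*)\|pid=(?P<pid>[\d.]+)\|"
    r"\s*(?P<func>[^|]+?)\s*\|\s*(?P<msg>.*)$"
)


def count_hello_errors(lines):
    """Count ERROR entries that mention the Hello handshake, per component code."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match or match.group("level") != "ERROR":
            continue  # skip non-error lines and lines in another layout
        text = (match.group("func") + " " + match.group("msg")).lower()
        if "hello" in text:
            counts[match.group("comp").strip()] += 1
    return counts


# Sample entries copied from the excerpts above (last one shortened).
SAMPLE = [
    "| 2022-01-19 01:58:35 |ERROR|X|IO |pid=10508.16192| u_io_thread_trt | New client 2740 (/ on ) authentication failed: Comlayer error (Time out on Hello)",
    "| 2022-01-19 01:58:37 |ERROR|X|uni|pid=11828.2816| k_handshakeHello | u_recv_msg in error [-2]: ()",
    "| 2022-01-19 01:58:37 |ERROR|X|uni|pid=11828.2816| o_connect_auth | k_connect_auth_timeout returns error [-1]",
    "| 2022-01-19 01:58:42 |ERROR|X|INI|pid=13332.14456| u_io_callsrv_connect_r | Error connecting to target IO server: Hello request error",
]

if __name__ == "__main__":
    for component, total in count_hello_errors(SAMPLE).items():
        print(component, total)
```

For a real log, pass the open file object instead of SAMPLE. Entries that do not mention the Hello handshake, such as the o_connect_auth line, are deliberately not counted.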
Release : 6.x
Component : DOLLAR UNIVERSE
SSL Enabled in the Nodes
Configuration issue: the default time-out for SSL connections (20s) is not sufficient during periods of heavy activity.
We suggest increasing both the IO time-out and the time-out for SSL connection from their default values (10s and 20s, respectively) to 60s.
In the Node Settings - Section Technical Settings
Time-out for IO server (seconds): 10 - increase to 60
In the Node Settings - Section Network parameters - TLS/SSL Settings
Time-out for SSL connection (seconds): 20 - increase to 60
Save and Close, then restart the Node. The timeout-related errors should no longer appear.
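One way to confirm the result is to check universe.log for timeout errors logged after the restart. The sketch below assumes the same pipe-separated log layout as the excerpts above. The helper name timeout_errors_since and the cutoff time are hypothetical and should be replaced with the actual restart time of the Node.

```python
from datetime import datetime

TS_FORMAT = "%Y-%m-%d %H:%M:%S"


def timeout_errors_since(lines, since):
    """Return ERROR lines mentioning a timeout that were logged at or after `since`."""
    hits = []
    for line in lines:
        parts = line.strip().split("|")
        if len(parts) < 3:
            continue  # not in the expected layout
        try:
            stamp = datetime.strptime(parts[1].strip(), TS_FORMAT)
        except ValueError:
            continue  # first field is not a timestamp
        if stamp < since or parts[2].strip() != "ERROR":
            continue
        lowered = line.lower()
        if "time out" in lowered or "timeout" in lowered:
            hits.append(line.strip())
    return hits


# Two entries copied from the excerpts above; the cutoff is a hypothetical restart time.
SAMPLE_LOG = [
    "| 2022-01-19 01:58:35 |ERROR|X|IO |pid=10508.16192| u_io_thread_trt | New client 2740 (/ on ) authentication failed: Comlayer error (Time out on Hello)",
    "| 2022-01-19 01:58:41 |ERROR|X|IO |pid=10508.17892| u_io_thread_trt | New client 2612 (/ on ) authentication failed: Comlayer error (Time out on Hello)",
]
HYPOTHETICAL_RESTART = datetime(2022, 1, 19, 1, 58, 40)

if __name__ == "__main__":
    remaining = timeout_errors_since(SAMPLE_LOG, HYPOTHETICAL_RESTART)
    print(len(remaining), "timeout error(s) since restart")
```

An empty result over a period of normal job activity suggests the new time-out values are sufficient. If errors are still returned, the values may need further review.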