Agents v21 disconnected with U02000313 on Agent and U00003438 on JCP log

Article ID: 368801

Products

CA Automic Workload Automation - Automation Engine

Issue/Introduction

After upgrading Agents from 12.3.9 to 21.x, the 21.x JCP regularly disconnects the Agents when the WEBSOCKET_TIMEOUT (default 60s) expires, and logs the message:

U00003438 Error in connection 'Agent AGENT_NAME'. 'java.util.concurrent.TimeoutException' function: Error 'Idle timeout expired: 60001/60000 ms'

Tracing the Agents with tcpip=5 captured the behavior. The issue appears to be caused by the JCP and Agent not exchanging the websocket ping-pong for an extended period (longer than the usual 30s interval).
In the JCP log:

20240316/023346.589 - 96     U00003438 Error in connection 'Agent AGENT_NAME'. 'java.util.concurrent.TimeoutException' function: Error 'Idle timeout expired: 60001/60000 ms'
20240316/023346.589 - 96     U00003407 Client connection 'CP00X#00000171' from 'AGENT_IP' has logged off from the Server.
20240316/023346.601 - 96     U00003397 Agent 'AGENT_NAME' logged off (client connection='CP00X#00000171').
20240316/023355.673 - 3680   U00003406 Client connection 'CP00X#00000179'  from '10.xxxxx' has logged on to the Server.
20240316/023355.678 - 4083   U00003412 Agent 'AGENT_NAME' logged on (Client connection='CP00X#00000179').
20240316/023355.725 - 68     U00029416 Agent Challenge authorization successful : 'AGENT_NAME' 
20240316/025727.413 - 3243   U00003406 Client connection 'CP00X#00000180'  from '10.xxxx' has logged on to the Server.
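
To check how often this happens across a larger JCP log, the U00003438 idle-timeout entries can be extracted with a short script. The following is a minimal sketch, assuming the log line layout shown above (timestamp, a numeric column, message number, message text); the names LINE_RE, IDLE_RE and find_idle_timeouts are hypothetical helpers, not part of any Automic tooling.

```python
import re
from datetime import datetime

# Assumed layout: "YYYYMMDD/HHMMSS.mmm - <number>   U######## <message text>"
LINE_RE = re.compile(r"^(\d{8}/\d{6}\.\d{3}) - \d+\s+(U\d{8}) (.*)$")
# Message text of the idle-timeout disconnect, as quoted in this article.
IDLE_RE = re.compile(
    r"Error in connection 'Agent ([^']+)'.*Idle timeout expired: (\d+)/(\d+) ms"
)


def find_idle_timeouts(lines):
    """Return (timestamp, agent, idle_ms, limit_ms) for each U00003438 idle timeout."""
    events = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match or match.group(2) != "U00003438":
            continue
        detail = IDLE_RE.search(match.group(3))
        if detail:
            ts = datetime.strptime(match.group(1), "%Y%m%d/%H%M%S.%f")
            events.append(
                (ts, detail.group(1), int(detail.group(2)), int(detail.group(3)))
            )
    return events


# Two lines taken from the JCP log excerpt above.
SAMPLE = [
    "20240316/023346.589 - 96     U00003438 Error in connection 'Agent AGENT_NAME'. "
    "'java.util.concurrent.TimeoutException' function: Error "
    "'Idle timeout expired: 60001/60000 ms'",
    "20240316/023346.601 - 96     U00003397 Agent 'AGENT_NAME' logged off "
    "(client connection='CP00X#00000171').",
]

for event in find_idle_timeouts(SAMPLE):
    print(event)
```

Grouping the returned events by Agent name shows whether the disconnects are limited to a few Agents or affect all TLS Agents.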

In the Agent trace, the JCP sends a ping over the websocket and the Agent replies with a pong roughly every 30s. The pongs expected at 20240316/023316 and 20240316/023346 are missing, which suggests the JCP did not send the corresponding pings. As a result, the 60s WEBSOCKET_TIMEOUT expired and the JCP disconnected the Agent, assuming it was down:

MAIN_THREAD      20240316/023016.581 boost::string_view)>(119): WSS-control-callback(*SERVER): pong
MAIN_THREAD      20240316/023046.582 boost::string_view)>(119): WSS-control-callback(*SERVER): pong
MAIN_THREAD      20240316/023116.583 boost::string_view)>(119): WSS-control-callback(*SERVER): pong
MAIN_THREAD      20240316/023146.585 boost::string_view)>(119): WSS-control-callback(*SERVER): pong
MAIN_THREAD      20240316/023216.595 boost::string_view)>(119): WSS-control-callback(*SERVER): pong
MAIN_THREAD      20240316/023246.588 boost::string_view)>(119): WSS-control-callback(*SERVER): pong
????????????????
MAIN_THREAD      20240316/023355.625 ccm::Interface::interface_error(name=*SERVER,ID=1,active=true,host=AE_HOSTNAME,error=exception(nr=2000313,msginsert=*SERVER|WS-read/104(Connection reset by peer),errno=104) -->
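
The gap in the trace can be measured rather than read by eye. The following is a minimal sketch, assuming the tcpip=5 trace layout shown above; parse_ts, is_pong, pong_intervals and silence_before_error are hypothetical helper names introduced for illustration.

```python
import re
from datetime import datetime

# Assumed timestamp format in the Agent trace: YYYYMMDD/HHMMSS.mmm
TS_RE = re.compile(r"(\d{8}/\d{6}\.\d{3})")


def parse_ts(line):
    """Return the first trace timestamp on the line, or None if there is none."""
    match = TS_RE.search(line)
    return datetime.strptime(match.group(1), "%Y%m%d/%H%M%S.%f") if match else None


def is_pong(line):
    """True for websocket control-callback lines that report a pong."""
    return "WSS-control-callback" in line and line.rstrip().endswith("pong")


def pong_intervals(lines):
    """Seconds between consecutive pong lines."""
    pongs = [parse_ts(line) for line in lines if is_pong(line)]
    return [(b - a).total_seconds() for a, b in zip(pongs, pongs[1:])]


def silence_before_error(lines):
    """Seconds between the last pong and the first following interface_error."""
    last_pong = None
    for line in lines:
        ts = parse_ts(line)
        if ts is None:
            continue  # lines without a timestamp are ignored
        if is_pong(line):
            last_pong = ts
        elif "interface_error" in line and last_pong is not None:
            return (ts - last_pong).total_seconds()
    return None


# Lines taken from the Agent trace excerpt above (error line shortened).
TRACE = [
    "MAIN_THREAD      20240316/023216.595 boost::string_view)>(119): WSS-control-callback(*SERVER): pong",
    "MAIN_THREAD      20240316/023246.588 boost::string_view)>(119): WSS-control-callback(*SERVER): pong",
    "????????????????",
    "MAIN_THREAD      20240316/023355.625 ccm::Interface::interface_error(name=*SERVER,ID=1,active=true)",
]

print(pong_intervals(TRACE))
print(silence_before_error(TRACE))
```

On this excerpt the pongs are about 30s apart, and roughly 69s pass between the last pong (023246.588) and the connection reset seen by the Agent (023355.625). The JCP logged the idle timeout at 023346.589, which is 60.001s after that last pong and matches the 'Idle timeout expired: 60001/60000 ms' message.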

Environment

Automation Engine 21.x or 24.x with TLS Agents (v21 or v24)

Cause

The default timeout for TLS Agents, set by WEBSOCKET_TIMEOUT in UC_SYSTEM_SETTINGS, is 60s, which is insufficient (in contrast, the default for non-TLS Agents such as 12.x was 600s).
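
A rough way to see why 60s is tight: count how many scheduled pings fall inside the idle window. The following is a back-of-the-envelope sketch, assuming the roughly 30s ping cadence observed in the trace and assuming that any received frame resets the idle timer (this article does not confirm how the timer is reset); ping_slots_within is a hypothetical helper name.

```python
def ping_slots_within(timeout_ms, interval_ms):
    """Number of scheduled pings that arrive strictly before the idle timeout.

    Pings are assumed to be due at interval_ms, 2 * interval_ms, ... after
    the last traffic; a ping due exactly at timeout_ms is not counted because
    it races with the timeout itself.
    """
    if timeout_ms <= 0 or interval_ms <= 0:
        raise ValueError("timeout_ms and interval_ms must be positive")
    return (timeout_ms - 1) // interval_ms


# Old default for TLS Agents (60s) versus the 600s default, with ~30s pings.
print(ping_slots_within(60_000, 30_000))
print(ping_slots_within(600_000, 30_000))
```

Under these assumptions, a 60s timeout leaves a single ping slot inside the window, so one skipped ping plus a slightly late second one is enough to drop the connection, whereas a 600s timeout leaves 19 slots.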

Resolution

Update to a fix version listed below or a newer version if available.

Fix version:
Component(s): Automation Engine
Automation.Engine 21.0.11 - Available
Automation.Engine 24.1.0 - Available

Additional Information

The changes were made via defect AE-36032, following AE-36663.

In UC_HOSTCHAR_DEFAULT, the WEBSOCKET_TIMEOUT setting makes it possible to specify the timeout for the connection between Agent and Automation Engine for each Agent independently.

By default, WEBSOCKET_TIMEOUT is now 600s. This value applies to all TLS Agents as soon as the AE core is upgraded to 21.0.11, 24.1.0, or later, and is used instead of the value in UC_SYSTEM_SETTINGS.