Error in job log with Unix TLS agents: SESSION_ERROR TLS-handshake The socket was closed due to a timeout



Article ID: 261610



Products

CA Automic Workload Automation - Automation Engine
CA Automic One Automation

Issue/Introduction

Error in job log with Unix TLS agents: SESSION_ERROR TLS-handshake The socket was closed due to a timeout 

The problem occurs with Automic Automation Kubernetes Edition (21.0.4+build.37) and Unix (Linux) agents at version 21.0.4+build.13.

The agents are deployed on premises, and the AE JCPs run in a managed Kubernetes cluster in AWS. The agents connect to the JCPs over TLS, through the corporate firewall and an nginx ingress controller.

The agent connection is stable and no agent disconnections occur. However, intermittently, and mainly for longer-running jobs (1 minute or more), the following message is inserted into the job log and the job fails:

***************************************************************************
**  ucxjlx6m     version 21.0.4+build.13          changelist 1661882548  **
**  JOB 0001344298 (ProcID:0000013026) START AT   01.03.2023 / 08:42:44  **
**                                     UTC TIME   01.03.2023 / 07:42:44  **
**  TEXT="        Job started             "                              **
***************************************************************************
-1 - wrong message type
20230301/084314.086 U0009909 TRACE: (wrong type error)              0x1077e80 01268
                    00000000  53455353 494F4E5F 4552524F 52000000  >SESSION_ERROR...<
                    00000010  00000000 00000000 00000000 00000000  >................<
                    00000020  2A414745 4E540000 00000000 00000000  >*AGENT..........<
                    00000030= 00000000 00000000 00000000 00000000  >................<
                    00000060  01000000 01000000 756E6B6E 6F776E00  >........unknown.<
                    00000070= 00000000 00000000 00000000 00000000  >................<
                    000000F0  B9851E00 2A414745 4E547C54 4C532D68  >....*AGENT|TLS-h<
                    00000100  616E6473 68616B65 2F312854 68652073  >andshake/1(The s<
                    00000110  6F636B65 74207761 7320636C 6F736564  >ocket was closed<
                    00000120  20647565 20746F20 61207469 6D656F75  > due to a timeou<
                    00000130  74290000 00000000 00000000 00000000  >t)..............<
                    00000140= 00000000 00000000 00000000 00000000  >................<
                    000004F0  00000000                             >....<
-1 - timeout
-1 - timeout
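The trace prints the message as hex words, and the readable column splits the text across rows. As a minimal sketch (the names `HEX_WORDS` and `decode_dump` are illustrative and not part of any Automic tooling), the words from offsets 0xF0 to 0x130 can be decoded back into the full error string. The leading word `B9851E00` at offset 0xF0 is left out because it is not printable text.

```python
# Hex words copied from the trace rows at offsets 0xF0-0x130.
# The first word of row 0xF0 (B9851E00) is omitted: it is not ASCII text.
HEX_WORDS = """
2A414745 4E547C54 4C532D68
616E6473 68616B65 2F312854 68652073
6F636B65 74207761 7320636C 6F736564
20647565 20746F20 61207469 6D656F75
74290000
"""


def decode_dump(hex_words: str) -> str:
    """Turn whitespace-separated hex words into text, dropping NUL padding."""
    raw = bytes.fromhex("".join(hex_words.split()))
    return raw.rstrip(b"\x00").decode("ascii")


message = decode_dump(HEX_WORDS)
print(message)
```

The decoded string is the same `*AGENT|TLS-handshake/1(The socket was closed due to a timeout)` text that gives this article its title.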

The issue is not related to the content of the job: it can be triggered with a simple Bash job containing just "sleep 1200". The error message is likely inserted while the job messenger is running.
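Because the failure does not depend on what the job runs, affected runs are easiest to recognize by the trace itself. The following is a rough sketch, assuming the job output is available as plain text; the helper names `dump_text` and `has_tls_handshake_timeout` are hypothetical. It joins the readable `>...<` columns of the dump rows so that text split across rows, such as `TLS-h` / `andshake`, can be matched as a whole.

```python
import re

# Matches the printable column of one dump row, e.g. ">SESSION_ERROR...<".
ASCII_COLUMN = re.compile(r">(.{1,16}?)<")


def dump_text(report: str) -> str:
    """Join the printable columns of all dump rows into one string."""
    return "".join(ASCII_COLUMN.findall(report))


def has_tls_handshake_timeout(report: str) -> bool:
    """Return True if the dump carries the SESSION_ERROR / TLS-handshake timeout."""
    text = dump_text(report)
    return (
        "SESSION_ERROR" in text
        and "TLS-handshake" in text
        and "timeout" in text
    )


# Rows taken from the trace shown in this article.
SAMPLE = """\
00000000  53455353 494F4E5F 4552524F 52000000  >SESSION_ERROR...<
000000F0  B9851E00 2A414745 4E547C54 4C532D68  >....*AGENT|TLS-h<
00000100  616E6473 68616B65 2F312854 68652073  >andshake/1(The s<
00000110  6F636B65 74207761 7320636C 6F736564  >ocket was closed<
00000120  20647565 20746F20 61207469 6D656F75  > due to a timeou<
00000130  74290000 00000000 00000000 00000000  >t)..............<
"""

print(has_tls_handshake_timeout(SAMPLE))
```

The pattern is deliberately simple and would need tightening if other parts of the output contain `>` and `<` characters.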

The job log contains just the following:

20230301/084244.011 - U02000005 Job 'JOBS.WLA.TESTCASE' with RunID '1344298' is to be started.
20230301/084244.040 - U02000003 Job 'JOBS.WLA.TESTCASE' started with RunID '1344298'.
20230301/084344.013 - U02000015 Periodical job test started.
...
20230301/090130.009 - U02000009 Job 'JOBS.WLA.TESTCASE' with RunID '1344298' ended with return code '15'.

 

Environment

Release : 21.0.4

Resolution

The Unix job messenger disconnection issue was fixed in 21.0.5 HF1. Upgrade to 21.0.5 HF1 or later.