Agent does not start, disconnects then reconnects in a loop.

Article ID: 109761

Updated On:

Products

CA Automic Workload Automation - Automation Engine
CA Automic One Automation

Issue/Introduction

One or more agents in the system do not come up. The agent starts and logs on to the system, and then the connection is terminated. The following sequence repeats in a loop every minute:

20180726/145036.222 - U02000072 Connection to system 'UC4' initiated.
20180726/145036.229 - U02000011 Connection to Server 'fqdn:2218' initiated.
20180726/145036.230 - U02000348 Connection to 'fqdn:2218(ID=9)' initiated.
20180726/145036.234 - U02000004 Connection to Server 'UC4#CP002(ID=9)' successfully created.
20180726/145036.234 - U02000354 CP Server 'UC4#CP002' reports ranking '3000'.
20180726/145036.243 - U02000011 Connection to Server 'fqdn:2221' initiated.
20180726/145036.243 - U02000348 Connection to 'fqdn:2221(ID=10)' initiated.
20180726/145036.247 - U02000004 Connection to Server 'UC4#CP005(ID=10)' successfully created.
20180726/145036.247 - U02000354 CP Server 'UC4#CP005' reports ranking '1'.
20180726/145036.254 - U02000073 Connection to system 'UC4' via CP Server 'UC4#CP005' successfully established.
20180726/145036.255 - U02000010 Connection to Server 'UC4#CP002(s=12,ID=9)' terminated.
20180726/145036.264 - U02000066 Host information: Host name='hostfqdn', IP address='10.10.10.10'
20180726/145045.998 - U02000010 Connection to Server '*SERVER(s=13,ID=10)' terminated.

The sequence always ends with "terminated" and then starts again from the beginning. The agent could only be started after it was renamed. The AE log files contain the following messages:

20180726/143624.872 - U00015012 'UNIX_AGENT' is still in cache. Agent will be disconnected after waiting for PWP.
20180726/143624.872 - U00011650 Server 'UC4#WP008' / Client '0000': Host 'UNIX_AGENT' ended abnormally. (Index='0000000037' CP='MQ1CP003')

In some cases the reconnect loop can overload the primary work process (PWP) and cause an outage.
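
To confirm the symptom from log files, the two patterns described above can be searched for automatically. The following is a minimal sketch in Python, not a vendor tool. It assumes the line layout and message numbers shown in the excerpts above; the function names are hypothetical, and the rule that a cycle ends with a U02000010 termination of '*SERVER' is inferred from the single excerpt in this article.

```python
import re
from datetime import datetime

# Line layout taken from the excerpts above:
# "YYYYMMDD/HHMMSS.mmm - U######## message text"
LINE_RE = re.compile(r"^(\d{8}/\d{6}\.\d{3}) - (U\d{8}) (.*)$")
# Text of message U00015012 as shown in the AE log excerpt.
CACHE_RE = re.compile(r"^'([^']+)' is still in cache")


def parse_line(line):
    """Return (timestamp, message_number, text), or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    timestamp = datetime.strptime(match.group(1), "%Y%m%d/%H%M%S.%f")
    return timestamp, match.group(2), match.group(3)


def count_reconnect_cycles(agent_log_lines):
    """Count 'established, then terminated' cycles in an agent log.

    Assumption: a cycle is U02000073 (connection established) followed by
    U02000010 (connection terminated) for '*SERVER'.
    """
    cycles = 0
    established = False
    for line in agent_log_lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        _, number, text = parsed
        if number == "U02000073":
            established = True
        elif number == "U02000010" and established and "'*SERVER" in text:
            cycles += 1
            established = False
    return cycles


def agents_still_in_cache(ae_log_lines):
    """Return the agent names reported by message U00015012 in an AE log."""
    names = set()
    for line in ae_log_lines:
        parsed = parse_line(line)
        if parsed is not None and parsed[1] == "U00015012":
            match = CACHE_RE.match(parsed[2])
            if match is not None:
                names.add(match.group(1))
    return names
```

A cycle count that grows by roughly one per minute in the agent log, together with the same agent name appearing in the result of `agents_still_in_cache`, matches the behavior described in this article.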

Cause

The agent name remains in the cache, so the agent cannot be restarted under the same name.

Resolution

Fixed in:

Automation.Engine 12.1.4
Automation.Engine 12.2.2
Automation.Engine 12.3.0
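
As a quick check of whether an installed Automation Engine version is at or above one of the fixed levels, the comparison can be sketched as follows. This is an illustration only: the function name is hypothetical, and it assumes that later patch levels within a release line, and releases newer than 12.3, also contain the fix, which this article does not state explicitly.

```python
# Fixed levels listed in this article, keyed by (major, minor) release line.
FIXED_IN = {(12, 1): 4, (12, 2): 2, (12, 3): 0}


def contains_fix(version):
    """Return True if a 'major.minor.patch' version is at or above a fixed level.

    Assumption: later patch levels and newer release lines keep the fix.
    """
    major, minor, patch = (int(part) for part in version.split(".")[:3])
    if (major, minor) in FIXED_IN:
        return patch >= FIXED_IN[(major, minor)]
    # Release lines not listed: newer than 12.3 is assumed fixed, older is not.
    return (major, minor) > (12, 3)
```

For example, `contains_fix("12.2.1")` returns `False`, while `contains_fix("12.2.2")` returns `True`.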

Workaround:

Restart all WPs one after the other: stop one WP, start it again, and then repeat with the next WP until all WPs have been restarted. Restart the PWP last.
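
The restart order of this workaround can be sketched in Python as shown below. The sketch only models the ordering, with the PWP handled last. The `stop` and `start` callables are placeholders supplied by the caller, because this article does not name the commands used to stop and start a WP, and all function names are hypothetical.

```python
def restart_order(work_processes, primary):
    """Return the WPs in restart order: all other WPs first, the PWP last."""
    if primary not in work_processes:
        raise ValueError(f"{primary!r} is not in the list of work processes")
    others = [wp for wp in work_processes if wp != primary]
    return others + [primary]


def rolling_restart(work_processes, primary, stop, start):
    """Stop and start each WP in turn so that only one WP is down at a time.

    stop and start are caller-supplied callables (placeholders for whatever
    mechanism is used on site); each receives the WP name.
    """
    for wp in restart_order(work_processes, primary):
        stop(wp)
        start(wp)
```

With the WPs `UC4#WP001`, `UC4#WP008`, and `UC4#WP002` (example names) and `UC4#WP008` as the PWP, `restart_order` yields `UC4#WP001`, `UC4#WP002`, `UC4#WP008`.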

If restarting the WPs is not possible, rename the agent. It then starts again.

Some customers had success by just switching the PWP to a different WP.