Linux Agent ENDED_LOST during high workload (CPU 100%)

Article ID: 216216

Products

CA Automic Workload Automation - Automation Engine

Issue/Introduction

Under high CPU load on the server, the Agent can disconnect from the Automation Engine (AE) with the following messages:

20210504/192333.263 - U02002036 Could not receive anything from partner '*SERVER'. Error code '104(Connection reset by peer),S(11(ID=4))'.
20210504/192333.264 - U02000010 Connection to Server '*SERVER(s=11,ID=4)' terminated.

In addition, the Agent log often shows the following messages during high workload:

20210513/191743.558 - U02002040 Disconnected from '*IPC(LISTENER)' (socket handle = 's=9,ID=149').
20210513/192304.220 - U02003056 Start of agent process 'LISTENER' with PID='28478' has been initiated.
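
To check whether and when an Agent log contains these messages, the log can be searched for the message numbers quoted above. The following is only a minimal sketch, not an Automic tool: the function name find_disconnects and the log path in the usage comment are hypothetical.

```shell
#!/bin/sh
# Sketch: print every line of an Agent log that contains one of the
# message numbers quoted in this article.
# Assumption: the path of the Agent log file is passed as the first argument.
find_disconnects() {
    grep -E 'U02002036|U02000010|U02002040|U02003056' "$1"
}

# Usage (hypothetical path):
#   find_disconnects /path/to/agent.log
```

The timestamps at the start of the matching lines can then be compared with the periods of high CPU load on the server.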

Environment

Release : 12.3

Component : AUTOMATION ENGINE

Cause

Due to the high workload, the connection between the listener and the main Agent process is lost, and the Agent does not detect that its connection to the AE has been lost.

Resolution

Workaround: Increase the priority of the Agent's processes. After this change, the issue no longer occurs.

Additional Information

Always coordinate any priority increase with the Unix administrator.

The priority can be increased by starting the Agent's main process with nice, by applying renice to the running process, or by adding a line to /etc/security/limits.conf that sets the priority for all processes of the automic user:

automic soft priority 68

or 

automic soft nice -12
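
For an Agent that is already running, the renice variant could look like the sketch below. It is an illustration under stated assumptions, not a vendor-supplied script: the function name renice_agent is hypothetical, the value -12 is simply taken from the limits.conf example above, and the PIDs must be supplied by the caller. Negative nice values normally require root privileges, and with the util-linux renice found on most Linux distributions the -n option sets an absolute nice value.

```shell
#!/bin/sh
# Sketch: apply a nice value to one or more running processes.
# Assumptions: the first argument is the nice value, all further arguments are PIDs.
# Negative values (higher priority) normally require root.
renice_agent() {
    nice_value=$1
    shift
    for pid in "$@"; do
        # Stop at the first PID that cannot be changed.
        renice -n "$nice_value" -p "$pid" || return 1
    done
}

# Usage as root, for all processes owned by the automic user:
#   renice_agent -12 $(pgrep -u automic)
# Equivalent single command, using renice's own user option:
#   renice -n -12 -u automic
```

Unlike the limits.conf entries, a renice applies only to the processes that are running at that moment, so it has to be repeated after the Agent is restarted.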