We have repeatedly run into problems with Linux agents.
Agent memory usage (and CPU utilization) keeps growing over several days until the server's memory is exhausted and the agent crashes. Memory usage does not drop even after jobs complete.
Once the agent crashes or is restarted, the memory is released.
How could we troubleshoot this issue and find the root cause?
Release : 12.3
Component : AUTOMATION ENGINE
Subcomponent: Agent Linux
OS: Linux
The root cause is unknown at this point.
Should the issue occur again, the following information is needed for troubleshooting.
As soon as you notice high memory usage by the Linux agent processes (ucxjlx6), do the following:
1. Capture the output of the following command:
ps aux --sort -rss | head -15
2. If a ucxjlx6 process is using an abnormal amount of memory, attach strace to it as root or as the user that started the process:
strace -tt -fp PID_OF_PROCESS_USING_MEMORY -o /tmp/strace.txt
Let it run for about a minute, stop it with Ctrl+C, and then review /tmp/strace.txt. Look for a running file transfer or event of that agent that is transferring a large log or file; this points to the object responsible.
3. Check the agent's out folder and the agent log to identify the job causing the issue. The most likely culprit is a job that started running around the time memory usage rose sharply.
4. System administrators should check the server's memory allocation. If swap is being used, it is a bad sign and usually indicates that the RAM should be increased.
5. Jobs and Events should also be checked in case they have been defined incorrectly, for example without verifying that files or folders exist before attempting copy or move commands. Additionally, Events should be launched only when necessary rather than left running for days waiting for the possible arrival of a file.
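Because the memory growth builds up over days, it can help to record the agent's memory usage at regular intervals, so that the moment it starts rising can be matched against the jobs in the agent log (step 3). The following is a minimal sketch and not part of the product: the function names summarize_rss and snapshot_agent_memory, the AGENT_NAME and LOG_FILE variables, and the default log path /tmp/agent_mem.log are all assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with the agent).
# Reads "PID RSS COMMAND" lines on stdin and reports the resident memory
# (RSS, in KiB) of every process whose command name equals $1.
summarize_rss() {
    name="$1"
    awk -v name="$name" '
        $3 == name {
            total += $2
            count++
            printf "%s pid=%s rss_kib=%s\n", name, $1, $2
        }
        END { printf "total processes=%d rss_kib=%d\n", count, total }
    '
}

# Appends one timestamped snapshot of the agent processes to a log file.
# AGENT_NAME and LOG_FILE are assumed names; adjust them to the local setup.
snapshot_agent_memory() {
    agent_name="${AGENT_NAME:-ucxjlx6}"
    log_file="${LOG_FILE:-/tmp/agent_mem.log}"
    {
        date '+%Y-%m-%d %H:%M:%S'
        # pid=,rss=,comm= suppresses the header line of ps
        ps -eo pid=,rss=,comm= | summarize_rss "$agent_name"
    } >> "$log_file"
}
```

Calling snapshot_agent_memory every few minutes, for example from cron, would produce a history in which the start of the growth can be located and compared with job start times.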