Best practices for troubleshooting Agent issues.

Article ID: 88374

Products

CA Automic Workload Automation - Automation Engine

Issue/Introduction

This article contains information about how to troubleshoot an Agent that has one of the following issues:

  • Not able to start Agent
  • Agent disconnected
  • Jobs are stuck in status 'Waiting for host'
  • Host is not active in System Overview
  • Agent is not processing jobs

Environment

This article is valid for versions 12.3 and earlier.

Resolution

Investigation

Information needed for analysis:

  • Agent, Server and Initial Data versions.
  • Is this a new installation of an Agent?
  • When did the errors first occur?
  • Is more than one Agent affected?
  • Have there been any changes made to Automic or the environment that Automic runs in?

Files needed for analysis:

  • Agent logfile from the time the error occurred (default location: temp folder of the Agent)
  • Agent logfile that contains the startup section or ini file of the Agent (default location: bin folder of the Agent)
  • Communication Process (CP) logfiles from the time the error occurred (default location: temp folder of the Automation Engine)

Basic Steps:

  • Compare the Agent, Server and Initial Data version information.

Server and Initial Data must be from the same Service Pack. The Agent should be from the same or the previous Service Pack.

Note (only valid for V9): Agents of Service Pack 1-4 can only run with an Automation Engine and Initial Data of the same or a later Service Pack level.

  • Follow the steps below to ensure the basic settings for the Agent are correct:

 

Check the TCP/IP settings in the agent logfile and compare them with the settings in the communication process logfile.

In the [TCP/IP] section you can find out which communication process (CP) was set for the agent to connect to:

[TCP/IP]
20120812/131001.497 - port = 2323
20120812/131001.497 - bindaddr =
20120812/131001.497 - bindlocal = 0
20120812/131001.497 -; try all n seconds to connect to server
20120812/131001.497 - connect = 60
20120812/131001.497 - report = 60
20120812/131001.497 - SendBufferSize = 1048576
20120812/131001.497 - RecvBufferSize = 1048576
20120812/131001.497 - cp = SERVER01:2217

In this example, the communication process that the agent connects to is set to SERVER01:2217 (host name and port).
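Reading the cp setting out of a long logfile by hand is error-prone, so the lookup can be sketched in Python. This is a minimal sketch, not an Automic tool: the helper names `read_section` and `split_host_port` are hypothetical, and it assumes the startup section is logged as "timestamp - key = value" lines exactly as in the excerpt above.

```python
import re

# Assumption: logged settings carry a "YYYYMMDD/HHMMSS.mmm -" prefix as in the excerpt.
PREFIX_RE = re.compile(r"^\d{8}/\d{6}\.\d{3}\s*-\s*")


def read_section(log_lines, section):
    """Return the key/value pairs logged under `section`, e.g. "[TCP/IP]"."""
    settings = {}
    inside = False
    for raw in log_lines:
        # Drop the timestamp prefix so headers and settings look the same
        # whether or not the logfile prefixes them.
        line = PREFIX_RE.sub("", raw.strip())
        if line.startswith("["):
            inside = line == section
        elif inside and "=" in line and not line.startswith(";"):
            key, _, value = line.partition("=")
            settings[key.strip()] = value.strip()
    return settings


def split_host_port(address):
    """Split "SERVER01:2217" into ("SERVER01", 2217)."""
    host, _, port = address.rpartition(":")
    return host, int(port)


sample = """[TCP/IP]
20120812/131001.497 - port = 2323
20120812/131001.497 - bindaddr =
20120812/131001.497 -; try all n seconds to connect to server
20120812/131001.497 - connect = 60
20120812/131001.497 - cp = SERVER01:2217
"""

tcpip = read_section(sample.splitlines(), "[TCP/IP]")
cp_host, cp_port = split_host_port(tcpip["cp"])
print(cp_host, cp_port)
```

With a real logfile, pass the lines of the file instead of the sample string. Comment lines starting with ";" are skipped, and empty settings such as bindaddr are kept as empty strings.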

Now look at the communication process logfile:

In the startup section of the communication process logfile you can find the following messages:

U0003492 Server has been started on Host 'SERVER01' ('<IP address>') with process ID '6564'.

U0003486 ListenSocket with port number '2217' successfully created.

Also have a look at the port settings of the communication process logfiles in the startup section:

[PORTS]
20120812/130702.635 - cp1 = 2217
20120812/130702.635 - cp2 = 2218
20120812/130702.635 - cp3 = 2219
20120812/130702.635 - cp4 = 2220

Now we have the information that the communication processes are started on host "SERVER01" with the following ports assigned to them:

cp1 = SERVER01:2217
cp2 = SERVER01:2218
cp3 = SERVER01:2219
cp4 = SERVER01:2220

The settings read from the [TCP/IP] section of the agent logfile are therefore correct: the agent is supposed to connect to cp = SERVER01:2217, which matches cp1.

If the agent and server settings differ from each other, no connection can be made and the agent cannot connect to the system.
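The comparison between the agent's cp setting and the communication process logfile can be sketched as follows. This is an illustrative Python sketch under stated assumptions: the function names `cp_endpoints` and `agent_target_is_known` are hypothetical, the host is taken from message U0003492, and the ports are taken from the cpN lines of the [PORTS] section as shown above.

```python
import re

# Host name as reported by the CP in message U0003492.
HOST_RE = re.compile(r"U0003492 Server has been started on Host '([^']+)'")
# cpN = port lines, with or without a timestamp prefix.
PORT_RE = re.compile(r"^(?:\d{8}/\d{6}\.\d{3}\s*-\s*)?cp\d+\s*=\s*(\d+)\s*$")


def cp_endpoints(cp_log_text):
    """Collect the (host, port) pairs the CPs listen on, per the CP logfile."""
    host_match = HOST_RE.search(cp_log_text)
    if host_match is None:
        return set()
    host = host_match.group(1).upper()
    endpoints = set()
    for line in cp_log_text.splitlines():
        port_match = PORT_RE.match(line.strip())
        if port_match:
            endpoints.add((host, int(port_match.group(1))))
    return endpoints


def agent_target_is_known(agent_cp, cp_log_text):
    """True if the agent's cp setting ("host:port") matches a CP endpoint."""
    host, _, port = agent_cp.rpartition(":")
    return (host.upper(), int(port)) in cp_endpoints(cp_log_text)


cp_log = """U0003492 Server has been started on Host 'SERVER01' ('<IP address>') with process ID '6564'.
U0003486 ListenSocket with port number '2217' successfully created.
[PORTS]
20120812/130702.635 - cp1 = 2217
20120812/130702.635 - cp2 = 2218
20120812/130702.635 - cp3 = 2219
20120812/130702.635 - cp4 = 2220
"""

print(agent_target_is_known("SERVER01:2217", cp_log))  # True
print(agent_target_is_known("SERVER01:2303", cp_log))  # False
```

The sketch compares host names literally (ignoring case). If the agent ini uses an IP address or a fully qualified name while the CP logs a short host name, the check reports a mismatch even though the connection may work, so treat a False result as a prompt to look closer rather than as proof of a misconfiguration.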

If the above settings are correct, continue the analysis:

In the agent logfile, look at the [CP_LIST] settings in the startup section of the agent:

[CP_LIST]

2218=SERVER02

This list is created when the Agent starts and is extended when new communication processes are activated. In some cases this list has contained information from other systems. This can happen when the Agent was previously connected to a different system, or when the host name or TCP/IP scope has changed.

The Agent tries the entries in this list if it cannot connect to the communication process defined in the [TCP/IP] section.

Invalid information in this section can therefore also lead to connection issues.

Remove all invalid entries from the [CP_LIST] section and restart the agent. The list will be updated with the correct information.
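To spot [CP_LIST] entries that do not belong to the current system, the entries can be compared against the known communication process endpoints. The following Python sketch is only an aid for review: `read_cp_list` and `stale_entries` are hypothetical helper names, the port=host entry format is taken from the example above, and the expected endpoints must be supplied by whoever knows the system. It only reports entries and does not modify any file.

```python
def read_cp_list(ini_text):
    """Return (port, host) entries found in the [CP_LIST] section."""
    entries = []
    inside = False
    for raw in ini_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(";"):
            continue  # skip blank lines and ini comments
        if line.startswith("["):
            inside = line.upper() == "[CP_LIST]"
            continue
        if inside and "=" in line:
            port, _, host = line.partition("=")
            if port.strip().isdigit():
                entries.append((int(port), host.strip()))
    return entries


def stale_entries(ini_text, expected):
    """Return [CP_LIST] entries whose (host, port) pair is not in `expected`."""
    expected_norm = {(host.upper(), port) for host, port in expected}
    return [
        (port, host)
        for port, host in read_cp_list(ini_text)
        if (host.upper(), port) not in expected_norm
    ]


# Entry taken from the example above; the surrounding section is illustrative.
ini_text = """[TCP/IP]
cp = SERVER01:2217

[CP_LIST]
2218=SERVER02
"""

# Assumption: the CPs of this system run on SERVER01, ports 2217-2220.
expected = {("SERVER01", port) for port in range(2217, 2221)}

for port, host in stale_entries(ini_text, expected):
    print(f"Review entry {port}={host}")
```

Whether a flagged entry is really invalid still needs a human decision, for example when a communication process legitimately runs on a second host.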

If all of the settings above are correct, continue the analysis.

Look at the Agent and CP logfiles around the time the error occurred.

Below is a list of error messages and the Knowledge Base articles linked to them. For additional information on an individual error, refer to the Known Error article referenced with it.

U2000019 The Server is reporting as system 'Automic'. Disconnecting again.
Refer to Known Error Article 000009832 - Agent not starting receive U2001017 error
 
U0034036 There is no valid license for Agent 'WIN01' (license class 'V', license category 'TestSystem', platform 'EX.OS.WIN').
Refer to Known Error Article 000009834 - Receive U0034036 error when attempting to start Agent

U2000099 Transfer key could not be loaded
Refer to Known Error Article 000009835 - Agent fails to start, receive U2000099 error
 
U2001017 It was not possible to create list socket because port number '2303' is already in use.
Refer to Known Error Article 000009832 - Agent not starting receive U2001017 error
 
U0003413 Socket call 'recv' returned error code '10054'.
An existing connection was forcibly closed by the remote host.

Refer to Known Error Article 000009579 - Agent down: Error U0003413 - Socket call 'recv' returns error code '10054'.
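Searching the Agent and CP logfiles for the message numbers listed above can be automated. The sketch below is a hypothetical Python helper (`find_known_errors` is not part of any Automic tooling). The mapping copies the message numbers and article IDs exactly as listed in this article, and the pattern assumes message numbers appear in the logfile in the same form as written here; adjust it if your logfiles use a different number width.

```python
import re

# Message number -> Known Error article ID, as listed in this article.
KNOWN_ERRORS = {
    "U2000019": "000009832",
    "U0034036": "000009834",
    "U2000099": "000009835",
    "U2001017": "000009832",
    "U0003413": "000009579",
}

# Assumption: a message number is "U" followed by seven digits.
MESSAGE_RE = re.compile(r"\bU\d{7}\b")


def find_known_errors(log_lines):
    """Return (line_number, message_number, article_id) for each known message."""
    hits = []
    for number, line in enumerate(log_lines, start=1):
        for code in MESSAGE_RE.findall(line):
            if code in KNOWN_ERRORS:
                hits.append((number, code, KNOWN_ERRORS[code]))
    return hits


sample_log = [
    "U0003486 ListenSocket with port number '2217' successfully created.",
    "U0003413 Socket call 'recv' returned error code '10054'.",
]

for line_number, code, article in find_known_errors(sample_log):
    print(f"line {line_number}: {code} -> Known Error Article {article}")
```

For a real logfile, pass an open file object instead of the sample list, since iterating over a file yields its lines.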