Unable to autoping new agent

Products

Autosys Workload Automation

Issue/Introduction

Scenario 1:

Unable to autoping a new agent. The autoping command returns the following errors...

CAUAJM_I_50023 AutoPinging Machine [agent_host]
CAUAJM_E_10229 Communication attempt with the CA WAAE Agent has failed! [agent_host:7520]
CAUAJM_E_50281 AutoPing from the Scheduler WAS NOT SUCCESSFUL.

...
CAUAJM_E_10229 Communication attempt with the CA WAAE Agent has failed! [agent_host:7520]
CAUAJM_E_50283 AutoPing from the Application Server WAS NOT SUCCESSFUL.
....

CAUAJM_E_50026 ERROR: AutoPing WAS NOT SUCCESSFUL.

In the transmitter.log on the agent machine, the following error is logged when the autoping is attempted...

10/19/2021 16:05:44.657 EDT-0400 5 TCP/IP Controller Plugin.Transmitter pool thread <Slow:1>.CybTargetHandlerChannelLogHelper.logConnectionAttempt[:75] - Attempting to open conversation to CENTRAL_MANAGER@scheduler_host:7507 using plain socket

10/19/2021 16:05:44.657 EDT-0400 5 TCP/IP Controller Plugin.Transmitter pool thread <Slow:1>.CybTargetHandlerChannelLogHelper.logConnectionInfo[:109] - Opened conversation to CENTRAL_MANAGER@scheduler_host:7507 with partner at ##.#.##.###:7507 with timeout of 10000 from ##.#.##.###:57075

10/19/2021 16:05:44.657 EDT-0400 1 TCP/IP Controller Plugin.Transmitter pool thread <Slow:1>.CybTargetHandlerChannel.sendMessage[:627] - Error sending message to CENTRAL_MANAGER:
cybermation.library.communications.CybConversationConnectionResetException: Reset by peer

Scenario 2:
The autoping command fails consistently against a newly installed agent, but for inconsistent reasons.

Sometimes the autoping succeeds from either the Scheduler or the Application Server, but not from the other. Other times it fails from both.

Environment

Release : 11.3/11.4/11.5/12.0

Component : CA Workload Automation System Agent

Cause

Scenario 1 Cause:

While an autoping failure can have a wide variety of root causes, the key error for the case covered in this document is in the transmitter.log.
The agent attempts to open a connection back to the Scheduler using a manager id called "CENTRAL_MANAGER"...

10/19/2021 16:05:44.657 EDT-0400 5 TCP/IP Controller Plugin.Transmitter pool thread <Slow:1>.CybTargetHandlerChannelLogHelper.logConnectionAttempt[:75] - Attempting to open conversation to CENTRAL_MANAGER@scheduler_host:7507 using plain socket

During installation, the agent installer asks whether you want to add a manager to the agent configuration. If you accept the default, parameters for a single manager are added to the agent configuration and the manager id is set to CENTRAL_MANAGER. The parameters in the agentparm.txt file will look like this...

communication.managerid_1=CENTRAL_MANAGER
communication.manageraddress_1=scheduler_host
communication.managerport_1=7507
communication.monitorobject_1=AGENT/AGENTMON1.0/MAIN

This manager id setting is not compatible with AutoSys.
For an AutoSys instance, the manager id must be <INS>_SCH, where <INS> is the three-character name of the AutoSys instance.
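To confirm which manager id the agent is actually targeting, the "Attempting to open conversation to ..." entries in transmitter.log can be scanned. The following is a minimal Python sketch, not a vendor tool. The function names and the instance name ACE are hypothetical, and the pattern assumes the log format shown in the excerpt above. Any id other than <INS>_SCH is only flagged for review.

```python
import re

# Matches entries such as:
#   Attempting to open conversation to CENTRAL_MANAGER@scheduler_host:7507
# (format assumed from the transmitter.log excerpt in this article)
PATTERN = re.compile(
    r"Attempting to open conversation to "
    r"(?P<manager>[^@\s]+)@(?P<host>[^:\s]+):(?P<port>\d+)"
)


def manager_targets(log_text):
    """Return (manager id, host, port) for every connection attempt found."""
    return [
        (m["manager"], m["host"], int(m["port"]))
        for m in PATTERN.finditer(log_text)
    ]


def unexpected_manager_ids(log_text, instance):
    """Return manager ids that are not <INS>_SCH for the given instance."""
    expected = f"{instance}_SCH"
    return sorted(
        {mgr for mgr, _host, _port in manager_targets(log_text) if mgr != expected}
    )


if __name__ == "__main__":
    # "ACE" is a hypothetical three-character instance name.
    with open("transmitter.log", encoding="utf-8", errors="replace") as handle:
        print(unexpected_manager_ids(handle.read(), "ACE"))
```

If CENTRAL_MANAGER appears in the result, the agent is still using the installer default described above.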

Scenario 2 Cause:

This behavior can have a variety of causes.
Intermittent autoping success and failure is almost always due to a problem with the network.

Scenario 3 Cause:

Another possible cause of an autoping failure is indicated by a "Wrong target" exception in the agent's receiver.log.

02/23/2023 12:10:01.053-0500 1 TCP/IP Controller Plugin.Receiver pool thread <Slow:1>.CybReceiverChannel.a[:214] - Can't parse the message: cybermation.library.communications.CybConversationWrongMessageException: Wrong target:

-------------------

The 'Wrong target' exception indicates a mismatch between the agent name in the agentparm.txt file and the agent definition in the Scheduler.

Resolution

Scenario 1 Solution:
To correct this problem, make the following modifications to the agentparm.txt file...

Remove the default communication parameters for CENTRAL_MANAGER from the agentparm.txt...

communication.managerid_1=CENTRAL_MANAGER
communication.manageraddress_1=scheduler_host
communication.managerport_1=7507
communication.monitorobject_1=AGENT/AGENTMON1.0/MAIN

Add this parameter anywhere in the file...

communication.nomanagers.abort.disable=true

After the changes are made, restart the agent and retry the autoping. It should now be successful.
The correct manager parameters for the AutoSys instance are added to the agentparm.txt file when you retry the autoping.
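The agentparm.txt edits described above can also be scripted. The following is a minimal Python sketch under stated assumptions: agentparm.txt is treated as plain key=value lines, every parameter of the default manager shares the same numeric suffix (as in the block shown above), and the function name remove_central_manager is hypothetical. Back up the file before changing it.

```python
MANAGER_KEYS = (
    "communication.managerid_",
    "communication.manageraddress_",
    "communication.managerport_",
    "communication.monitorobject_",
)
FLAG = "communication.nomanagers.abort.disable"


def _key(line):
    """Return the parameter name of a key=value line."""
    return line.partition("=")[0].strip()


def remove_central_manager(text):
    """Drop the CENTRAL_MANAGER block and make sure FLAG is set to true."""
    lines = text.splitlines()

    # Collect the numeric suffix of every manager whose id is CENTRAL_MANAGER.
    suffixes = set()
    for line in lines:
        key, sep, value = line.partition("=")
        key = key.strip()
        if sep and key.startswith(MANAGER_KEYS[0]) and value.strip() == "CENTRAL_MANAGER":
            suffixes.add(key[len(MANAGER_KEYS[0]):])

    # Remove all four parameters that carry one of those suffixes.
    doomed = {prefix + suffix for prefix in MANAGER_KEYS for suffix in suffixes}
    kept = [line for line in lines if _key(line) not in doomed]

    # Add the flag, or force an existing entry to true.
    if any(_key(line) == FLAG for line in kept):
        kept = [FLAG + "=true" if _key(line) == FLAG else line for line in kept]
    else:
        kept.append(FLAG + "=true")
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    with open("agentparm.txt", encoding="utf-8") as handle:
        print(remove_central_manager(handle.read()), end="")
```

The sketch prints the modified content instead of overwriting the file, so the result can be reviewed before it is copied into place and the agent is restarted.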

Scenario 2 Solution:
Because intermittent autoping success and failure is almost always due to a problem with the network, the fix depends on the environment.
This resolution is just one example we found...

The net.ipv4.tcp_tw_recycle kernel TCP parameter was enabled (set to 5).
Once it was set to 0 (disabled), the autoping started working.
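To see whether this setting is active on a Linux host, the current value can be read from procfs. The following is a minimal Python sketch. The function names are hypothetical, and it assumes the standard procfs path for this sysctl. The parameter was removed in Linux kernel 4.12, so on newer systems the file does not exist.

```python
from pathlib import Path

# procfs location of the net.ipv4.tcp_tw_recycle sysctl
DEFAULT_PATH = "/proc/sys/net/ipv4/tcp_tw_recycle"


def read_tw_recycle(path=DEFAULT_PATH):
    """Return the current value as an int, or None if the file is absent."""
    proc_file = Path(path)
    if not proc_file.exists():
        return None
    return int(proc_file.read_text().strip())


def describe(value):
    """Turn the raw value into a short human-readable status."""
    if value is None:
        return "not present on this kernel"
    if value == 0:
        return "disabled"
    return f"enabled (value {value})"


if __name__ == "__main__":
    print("net.ipv4.tcp_tw_recycle:", describe(read_tw_recycle()))
```

This only reports the value. Changing it is a system administration decision and should go through the usual sysctl configuration for the host.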

Scenario 3 Solution:

The agentname parameter in the agent configuration file agentparm.txt on the agent system must match the agent name in the Scheduler. This enables the Scheduler to recognize which agent is communicating with the manager.

agentname=<value>

In AutoSys, the agent_name attribute in the machine definition should match the agentname parameter in the agentparm.txt on your agent system. Check the agent_name attribute using "autorep -q -M <machine>".

In dSeries, the agent name in the agent definition in the Topology Admin perspective should match the agentname parameter in the agentparm.txt on your agent system.

In ESP, the AGENT <name> in AGENTDEF should match the agentname parameter in the agentparm.txt on your agent system.
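The comparison can be checked quickly once the agent name from the Scheduler side is known. The following is a minimal Python sketch. The function names and the sample agent names are hypothetical. It assumes key=value lines with # comments in agentparm.txt, and it compares the names exactly (case-sensitive), which is an assumption rather than documented product behavior.

```python
def read_agentname(agentparm_text):
    """Return the agentname value from agentparm.txt content, or None."""
    for raw in agentparm_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines (comment syntax assumed to be '#').
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() == "agentname":
            return value.strip()
    return None


def names_match(agentparm_text, scheduler_agent_name):
    """True if the local agentname equals the name defined in the Scheduler."""
    local = read_agentname(agentparm_text)
    return local is not None and local == scheduler_agent_name.strip()


if __name__ == "__main__":
    # WA_AGENT is a hypothetical name taken from the Scheduler definition.
    with open("agentparm.txt", encoding="utf-8") as handle:
        print(names_match(handle.read(), "WA_AGENT"))
```

A False result points to the mismatch behind the 'Wrong target' exception. Correct whichever side holds the wrong name.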

Note:

Ensure that you restart the agent after making a change in the agentparm.txt file for the change to take effect.