Jobs stuck in Ready state in CA Workload Automation DE

Article ID: 76832

Products

Workload Automation Agent
DSERIES- SERVER
CA Workload Automation DE - System Agent (dSeries)
CA Workload Automation AE - System Agent (AutoSys)

Issue/Introduction

CA Workload Automation DE may show jobs in the READY state indefinitely. Several factors can cause this. Check the following list of possible causes:

Environment

CA Workload Automation DE 
CA Workload Automation System Agent ANY

OS Linux/ UNIX

Cause

CA Workload Automation DE submits jobs to the CA WA Agents and then waits for an update. If the Agent cannot communicate back, the jobs stay in the READY/STARTING state. The following issues can prevent jobs from running or completing.

1. Check whether the CA WA Agents can communicate back to the manager. Look in transmitter.log for communication errors such as these:

04/09/2018 09:04:44.667 ...... TCP/IP Controller Plugin.Transmitter pool thread <Slow:1>.CybTargetHandlerChannel.constructConversation[:1194] - Error connecting to CA_MANAGER:
   cybermation.library.communications.CybConversationConnectException: Connection failed to: <MANAGER-ID><7507>. Connection refused
..... snip .....
Caused by: java.net.ConnectException: Connection refused
  at java.net.PlainSocketImpl.socketConnect(Unknown Source)

2. Check whether jobs are stuck in the READY/STARTING state on a Linux or UNIX Agent. This can happen when a directory reaches the limit on the maximum number of files it can contain.

3. Check whether the Agent received the message by looking in the Agent's receiver log, <Agent_Installdir>/log/receiver.log.

4. If the agent uses an alias, make sure the name defined on the manager side exactly matches the alias defined in agentparm.txt.

The following message indicates that the agent is receiving messages from the Manager that carry an incorrect agent name or manager instance ID:

01/01/2021 00:00:12.345-0400 1 TCP/IP Controller Plugin.Receiver pool thread <Regular:2>.CybReceiverChannel.a[:214] - Can't parse the message: cybermation.library.communications.CybConversationWrongMessageException: Wrong target:

5. On Windows, some security software (for example, anti-virus) has been reported to scan and hold agent files. This can cause the agent and its jobs to get stuck.

Example errors in the logs:

01/01/20XX 00:00:12.345-0400 4 RunnerPlugin.Spool Cleaner.CybSpoolCleaner$1.visitFile[:248] - Internal exception occured while attempting to clean spool directory with exception:   C:\Program Files\CA\WA Agent\spool\nullin: The process cannot access the file because it is being used by another process.

01/01/20XX 00:00:12.345-0400 1 TCP/IP Controller Plugin.Transmitter pool thread <Regular:2>.CybTargetHandlerChannel.a[:617] - Error sending message to CAWA_PROD:  cybermation.library.communications.CybConversationException: Message removal failed
.......
Caused by: cybermation.library.persistence.CybPersistenceException: C:\Program Files\CA\WA Agent\database\transmitter_queue_spool_cawa_prod.tmp can not be renamed
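
The log checks above can be partly automated. The following is a minimal sketch, not a vendor tool: it searches log lines for substrings copied from the error excerpts in this article. The function names and the short cause labels are illustrative assumptions, and the substring list is not exhaustive.

```python
from pathlib import Path

# Substrings copied from the log excerpts in this article, mapped to a short
# label. The labels are this sketch's own wording, not vendor terminology.
SIGNATURES = {
    "Connection refused": "agent cannot connect to the manager port",
    "Wrong target": "agent name or manager instance ID mismatch",
    "Message removal failed": "transmitter queue file could not be updated",
    "can not be renamed": "agent database file is locked",
    "being used by another process": "spool file is locked",
}


def scan_log(lines):
    """Return (line_number, label, text) for each line matching a signature."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for needle, label in SIGNATURES.items():
            if needle in line:
                hits.append((number, label, line.strip()))
                break  # report each line at most once
    return hits


def scan_file(path):
    """Scan one log file; undecodable bytes are replaced rather than fatal."""
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    return scan_log(text.splitlines())
```

For example, calling scan_file on the agent's log/transmitter.log and log/receiver.log and printing the returned tuples gives a quick list of lines worth reading in full. A clean result does not prove the agent is healthy; it only means none of these particular strings appeared.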

Resolution

If the issue is a network connection problem, such as Connection refused, check the following:

1. Check the network connection: ping the manager host from the agent host.

2. Check DNS resolution. The Agent host may be unable to resolve the hostname (not the Manager ID) of the scheduler server.

3. Check that the Manager port is reachable from the agent host. You can get the port from the manager side or from the agentparm.txt of the CA Agent. The default is 7507.

4. To find the manager address and port, look for these entries in agentparm.txt:

communication.manageraddress_1=manager_host.example.com   
communication.managerid_1=MANAGER-ID           <- Your ID will be different
communication.managerport_1=7507

5. Check the agent alias name (if used) on the manager side and in agentparm.txt. They must match, for example:

communication.alias_1=AGENT_DB

6. Check the permissions on the agent's parent directory. The non-root user must have read access so that the job can write its spool.

AV/Security software: The agent constantly writes to its log, spool and database directories. The AV software may scan these directories continually because of the frequent changes. Consult your AV vendor about setting up limits and exceptions.


7. If transmitter.log shows no errors and no communication back to the manager, check whether the limit on the maximum number of files has been reached. Before running a job, the CA Agent tries to create a spool directory to capture the spool output. If it cannot, jobs will not run. Check this link for more details.

8. If the Agent's <Agent_Installdir>/log/receiver.log shows no message received, check the agent configuration on the DE manager in the Topology view.

- Make sure that the agent address (IP address/hostname) and port are defined correctly. For an alias agent, the IP address/hostname and port must match those of the primary agent.

- Make sure that no firewall is blocking communication between the DE manager and the Agent.
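
The DNS and port checks in steps 2 to 4 can be run together from the agent host. The following is a minimal sketch under stated assumptions: it reads the communication.manageraddress_N and communication.managerport_N entries shown above, treats lines starting with '#' as comments (an assumption about the file format), and falls back to port 7507 when no port entry is present. The function names are hypothetical and not part of the product.

```python
import socket


def parse_agentparm(text):
    """Parse key=value lines; blank lines and '#' lines are skipped."""
    params = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        params[key.strip()] = value.strip()
    return params


def manager_endpoints(params):
    """Return (index, host, port) for each communication.manageraddress_N."""
    prefix = "communication.manageraddress_"
    endpoints = []
    for key in sorted(params):
        if key.startswith(prefix):
            index = key[len(prefix):]
            # 7507 is the default manager port named in this article.
            port = int(params.get("communication.managerport_" + index, 7507))
            endpoints.append((index, params[key], port))
    return endpoints


def check_endpoint(host, port, timeout=5.0):
    """Return 'ok', or a short description of which step failed."""
    try:
        socket.getaddrinfo(host, port)  # step 2: name resolution
    except socket.gaierror as exc:
        return "DNS resolution failed: %s" % exc
    try:
        # step 3: TCP connection to the manager port
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except OSError as exc:
        return "TCP connect failed: %s" % exc
```

To use it, read <Agent_Installdir>/agentparm.txt, pass the text to parse_agentparm, and call check_endpoint for each tuple returned by manager_endpoints. A result of "ok" only shows that the agent host can open a TCP connection to the manager; it does not confirm that the manager can reach the agent in the other direction, which still needs the Topology and firewall checks in step 8.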