AutoSys agent is marked offline by scheduler. Jobs get stuck in STARTING, statuses do not change. Below errors are seen in scheduler or application server logs.
==========================
[06/11/2020 19:41:01] CAUAJM_W_00060 Timed out waiting for incoming data at socket [70] after 4 seconds. Discarding socket... [<IP_MASKED>] qid:src[0:N/A][06/11/2020 21:05:27] CAUAJM_W_00060 Timed out waiting for incoming data at socket [5] after 4 seconds. Discarding socket... [<IP_MASKED>] qid:src[23797:WA_AGENT][06/11/2020 21:05:27] CAUAJM_W_00068 Timed out waiting for acknowledgement at socket [71] after 4 seconds. Discarding socket... [<IP_MASKED>]
========================
[06/11/2020 22:02:55] CAUAJM_I_40245 EVENT: ALARM ALARM: AUTO_PING JOB: <job_name> MACHINE: <hostname>.<example.>.<com>[06/11/2020 22:02:55] <COMM_ERR_14 Agent on machine [<hostname>.<example.>.<com>] has not acknowledged this job request. Please investigate the status of this job.>[06/11/2020 22:03:13] CAUAJM_W_00068 Timed out waiting for acknowledgement at socket [79] after 4 seconds. Discarding socket... [<IP_MASKED>] qid:src[23946:WA_AGENT WA_AGENT <hostname>.<example.>.<com>][06/11/2020 22:03:17] CAUAJM_W_00068 Timed out waiting for acknowledgement at socket [79] after 4 seconds. Discarding socket... [<IP_MASKED>] qid:src[23946:WA_AGENT WA_AGENT <hostname>.<example.>.<com>][06/11/2020 22:03:30] CAUAJM_I_40245 EVENT: CHANGE_STATUS STATUS: RUNNING JOB: <job_name> MACHINE: <hostname>.<example.>.<com>[06/11/2020 22:03:30] CAUAJM_I_40245 EVENT: MACH_ONLINE MACHINE: <hostname>.<example.>.<com>[06/11/2020 22:03:30] <The agent on machine <hostname>.<example.>.<com> is responsive. Automatically placing the machine online.>[06/11/2020 22:03:30] CAUAJM_I_40120 Completed 0 job start(s) for online machine<hostname>.<example.>.<com>.[06/11/2020 22:10:11] CAUAJM_I_40245 EVENT: CHANGE_STATUS STATUS: STARTING JOB: <<job_name> > MACHINE: <hostname>.<example.>.<com>[06/11/2020 22:10:20] CAUAJM_W_00068 Timed out waiting for acknowledgement at socket [91] after 4 seconds. Discarding socket... [<IP_MASKED>] qid:src[23946:WA_AGENT WA_AGENT<hostname>.<example.>.<com>][06/11/2020 22:10:32] CAUAJM_E_10229 Communication attempt with the CA WAAE Agent has failed! [<hostname>.<example.>.<com>:7520][06/11/2020 22:10:32] CAUAJM_W_40290 Machine <hostname> is in question. Placing machine in the unqualified state.[06/11/2020 22:10:36] CAUAJM_W_00068 Timed out waiting for acknowledgement at socket [90] after 4 seconds. Discarding socket... [<IP_MASKED>] qid:src[23946:WA_AGENT WA_AGENT<hostname>.<example.>.<com>][06/11/2020 22:10:41] CAUAJM_I_40245 EVENT: ALARM ALARM: STARTJOBFAIL JOB: <<job_name> > MACHINE: <hostname>.<example.>.<com>[06/11/2020 22:10:41] <COMM_ERR_5 Communication attempt with Agent on machine [<hostname>.<example.>.<com>] has failed.>[06/11/2020 22:10:44] CAUAJM_W_00068 Timed out waiting for acknowledgement at socket [90] after 4 seconds. Discarding socket... [<IP_MASKED>] qid:src[23946:WA_AGENT WA_AGENT<hostname>.<example.>.<com>][06/11/2020 22:10:48] CAUAJM_E_10229 Communication attempt with the CA WAAE Agent has failed! [hostname:7520][06/11/2020 22:10:48] CAUAJM_I_40245 EVENT: ALARM ALARM: STARTJOBFAIL JOB: <<job_name> > MACHINE: <hostname>.<example.>.<com>[06/11/2020 22:10:48] <COMM_ERR_5 Communication attempt with Agent on machine [<hostname>.<example.>.<com>] has failed.>[06/11/2020 22:10:50] CAUAJM_I_40245 EVENT: CHANGE_STATUS STATUS: RESTART JOB: <<job_name> > MACHINE: <hostname>.<example.>.<com>[06/11/2020 22:10:52] CAUAJM_W_00068 Timed out waiting for acknowledgement at socket [68] after 4 seconds. Discarding socket... [<IP_MASKED>] qid:src[23946:WA_AGENT WA_AGENT<hostname>.<example.>.<com>][06/11/2020 22:12:27] CAUAJM_W_40253 Machine <<hostname>.<example.>.<com>> is not responding. Taking offline.[06/11/2020 22:12:28] CAUAJM_I_40245 EVENT: ALARM ALARM: MACHINE_UNAVAILABLE MACHINE: <hostname>.<example.>.<com>[06/11/2020 22:12:28] <Machine <<hostname>.<example.>.<com>> did not respond.>
Workload Automation AE (AutoSys)
There are 3 types of error messages in the log that pertain to network performance:
[06/11/2020 19:26:20] CAUAJM_E_10229 Communication attempt with the CA WAAE Agent has failed! [<hostname>.<example.>.<com>:7520][06/11/2020 19:26:20] CAUAJM_W_40290 Machine is in question. Placing machine in the unqualified state.[06/11/2020 19:26:20] CAUAJM_I_40245 EVENT: ALARM ALARM: STARTJOBFAIL JOB: <hostname>.<example.>.<com>_<jobname> MACHINE: <hostname>.<example.>.<com>CAUAJM_W_00068 Timed out waiting for acknowledgement at socket :[06/11/2020 17:20:44] CAUAJM_I_40245 EVENT: CHANGE_STATUS STATUS: STARTING JOB: <jobname> MACHINE: <agenthostname>...[06/11/2020 17:20:48] CAUAJM_W_00068 Timed out waiting for acknowledgement at socket [68] after 4 seconds. Discarding socket... [<IP_MASKED>] qid:src[23946:WA_AGENT WA_AGENT <hostname>.<example.>.<com>][06/11/2020 17:20:52] CAUAJM_W_00068 Timed out waiting for acknowledgement at socket [68] after 4 seconds. Discarding socket... [<IP_MASKED>] qid:src[23946:WA_AGENT WA_AGENT <hostname>.<example.>.<com>][06/11/2020 17:20:56] CAUAJM_W_00068 Timed out waiting for acknowledgement at socket [68] after 4 seconds. Discarding socket... [<IP_MASKED>] qid:src[23946:WA_AGENT WA_AGENT <hostname>.<example.>.<com>][06/11/2020 17:20:56] CAUAJM_I_10082 [ <hostname>.<example.>.<com> connected for <jobname> 12345.67890.1][06/11/2020 22:10:20] CAUAJM_W_00068 Timed out waiting for acknowledgement at socket [91] after 4 seconds. Discarding socket... [<IP_MASKED>] qid:src[23946:WA_AGENT WA_AGENT <hostname>.<example.>.<com>][06/11/2020 22:10:24] CAUAJM_W_00068 Timed out waiting for acknowledgement at socket [91] after 4 seconds. Discarding socket... [<IP_MASKED>] qid:src[23946:WA_AGENT WA_AGENT <hostname>.<example.>.<com>][06/11/2020 22:10:28] CAUAJM_W_00068 Timed out waiting for acknowledgement at socket [91] after 4 seconds. Discarding socket... [<IP_MASKED>] qid:src[23946:WA_AGENT WA_AGENT <hostname>.<example.>.<com>][06/11/2020 22:10:32] CAUAJM_W_00068 Timed out waiting for acknowledgement at socket [91] after 4 seconds. Discarding socket... [<IP_MASKED>] qid:src[23946:WA_AGENT WA_AGENT <hostname>.<example.>.<com>][06/11/2020 22:10:41] CAUAJM_I_40245 EVENT: ALARM ALARM: STARTJOBFAIL JOB: <jobname> MACHINE: <hostname>.<example.>.<com>
CAUAJM_W_00060 Timed out waiting for incoming data at socket[06/11/2020 17:00:40] CAUAJM_W_00060 Timed out waiting for incoming data at socket [5] after 4 seconds. Discarding socket... [<IP_MASKED>] qid:src[0:N/A][06/11/2020 17:00:44] CAUAJM_W_00060 Timed out waiting for incoming data at socket [5] after 4 seconds. Discarding socket... [<IP_MASKED>] qid:src[0:N/A]