Jobs intermittently fail with a communication error. What is the reason behind this?
[05/30/2021 19:08:38] CAUAJM_W_40290 Machine <MachineName> is in question. Placing MachineName in the unqualified state.
[05/30/2021 19:10:28] <COMM_ERR_5 Communication attempt with Agent on machine [MachineName] has failed.>
Workload Automation Autosys
The COMM_ERR_5 message indicates that the scheduler successfully resolved the agent's node name to an IP address but could not establish a connection with the agent, because the agent did not reply in a timely manner.
The scheduler requests a TCP/IP socket connection from the operating system, but the operating system is unable to complete the TCP connection handshake with the network protocol stack on the agent machine.
The handshake can fail for many reasons. Typical causes are network problems or latency, or high load on the agent machine. Either can delay the agent's receipt of the TCP/IP connection request or delay its acknowledgement of that request.
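To see which of the two steps fails when the problem occurs, it can help to repeat them by hand from the scheduler host: first resolve the agent's node name, then attempt a TCP handshake bounded by a timeout. The Python sketch below is a generic illustration, not an Autosys tool. The function name `probe_agent` is hypothetical, and port 7520 is only an assumed default; substitute the port your agent is actually configured to listen on. A "timeout" result is consistent with the COMM_ERR_5 situation described above, while "refused" usually means the agent host answered but nothing was listening on that port.

```python
import socket
import time


def probe_agent(host: str, port: int = 7520, timeout: float = 5.0) -> dict:
    """Resolve host, then attempt one TCP handshake and report which step failed.

    Hypothetical helper; port 7520 is an assumed default agent port.
    """
    result = {"host": host, "port": port, "resolved": None,
              "status": None, "connect_ms": None}

    # Step 1: name resolution (COMM_ERR_5 implies this step succeeded).
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        result["status"] = f"resolution_failed: {exc}"
        return result

    family, socktype, proto, _, sockaddr = infos[0]
    result["resolved"] = sockaddr[0]

    # Step 2: TCP three-way handshake, bounded by a timeout.
    sock = socket.socket(family, socktype, proto)
    sock.settimeout(timeout)
    start = time.monotonic()
    try:
        sock.connect(sockaddr)
        result["status"] = "connected"
        result["connect_ms"] = (time.monotonic() - start) * 1000.0
    except socket.timeout:
        # No SYN-ACK within the timeout: network latency/loss or an overloaded host.
        result["status"] = "timeout"
    except ConnectionRefusedError:
        # Host replied with a reset: reachable, but nothing listening on the port.
        result["status"] = "refused"
    except OSError as exc:
        result["status"] = f"error: {exc}"
    finally:
        sock.close()
    return result
```

For example, `probe_agent("MachineName")` run from the scheduler host while jobs are failing shows whether the handshake times out or completes, and how long a successful handshake took.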
Troubleshoot the network and infrastructure layer between the Autosys and agent nodes, and fix any problems found on that layer.
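Because the failures are intermittent, a single successful connection test proves little. One option, sketched below under stated assumptions, is to sample repeated TCP handshakes from the scheduler host to the agent and summarise failures and latency spread. The function `sample_handshakes`, its defaults, and port 7520 are all hypothetical placeholders, not Autosys settings. Note that `socket.create_connection` resolves the name on every attempt, so the measured time includes DNS lookup as well as the handshake.

```python
import socket
import statistics
import time


def sample_handshakes(host: str, port: int = 7520, attempts: int = 10,
                      interval: float = 1.0, timeout: float = 5.0) -> dict:
    """Open and close `attempts` TCP connections and summarise the results.

    Hypothetical helper; port 7520 is an assumed default agent port.
    """
    latencies = []  # successful connect times in milliseconds
    failures = 0
    for i in range(attempts):
        start = time.monotonic()
        try:
            # Includes name resolution plus the TCP three-way handshake.
            with socket.create_connection((host, port), timeout=timeout):
                latencies.append((time.monotonic() - start) * 1000.0)
        except OSError:
            # Timeouts, refusals and unreachable-host errors all count as failures.
            failures += 1
        if i < attempts - 1:
            time.sleep(interval)

    summary = {"attempts": attempts, "failures": failures}
    if latencies:
        summary["min_ms"] = min(latencies)
        summary["median_ms"] = statistics.median(latencies)
        summary["max_ms"] = max(latencies)
    return summary
```

Running this periodically, and especially around the times CAUAJM_W_40290 or COMM_ERR_5 appear, can show whether handshakes occasionally stall or fail, which points to the network path or to load on the agent machine rather than to the job definitions themselves.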