The integration from AutoSys to the Automic event engine (Analytics) stops working every couple of days. The IA agent stops and cannot be restarted; the following error is reported for the IA agent:
U000111113 task will be re-started as soon as host 'IA' is active again.
The Kafka log files contain the following messages:
WARN akka.remote.Remoting - Tried to associate with unreachable remote address [server:port]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters.
Reason: [The remote system has quarantined this system. No further associations to the remote system are possible until this system is restarted.]
WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp:server:port] has failed, address is now gated for [5000] ms. Reason: [Disassociated]
WARN akka.remote.RemoteWatcher - Detected unreachable: [akka.tcp:server:port]
INFO org.apache.flink.runtime.jobmanager.JobManager - Task manager akka.tcp:server:port/user/taskmanager terminated.
INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Kafka Event Queue for Client 100 (1/1) (ca54eb40e1a30cc6b297fcc14914b005) switched from RUNNING to FAILED.
These messages suggest that the address *server:port* (local to the host itself) is temporarily blocked, for example by a firewall, or that address resolution stops working.
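To tell a blocked port apart from a name-resolution failure, a small probe can be run on the affected host while the problem is occurring. The following is only a minimal sketch using the Python standard library; the function name `probe` and its return values are hypothetical and not part of the product, and the host and port must be replaced with the actual *server:port* from the log.

```python
import socket

def probe(host, port, timeout=3.0):
    """Return 'ok', 'dns-failure', or 'connect-failure' for host:port.

    Hypothetical helper for illustration only.
    """
    try:
        # Step 1: check that the name still resolves.
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns-failure"
    # Step 2: try a plain TCP connect to each resolved address.
    for family, socktype, proto, _canon, sockaddr in infos:
        try:
            with socket.socket(family, socktype, proto) as s:
                s.settimeout(timeout)
                s.connect(sockaddr)
                return "ok"
        except OSError:
            continue
    return "connect-failure"
```

A result of `dns-failure` points towards address resolution, while `connect-failure` points towards a blocked or closed port. Neither result replaces a proper network traffic analysis.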
Release : 2.x
Component : ANALYTICS ON PREMISE
A local issue with the TCP/IP stack on the server running AE causes the network connection to be blocked.
The most likely cause is that a firewall, router, load balancer, or other active network device closes the session due to inactivity. This can only be diagnosed by network traffic analysis, which is outside the scope of Support.
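As input for such a network analysis, it can help to collect every occurrence of the disconnect warnings from the log. The sketch below is an assumption-laden illustration, not a supported tool: the function name `find_disconnect_lines` is hypothetical, and the search patterns are simply the phrases that appear in the warnings quoted above.

```python
import re

# Phrases taken from the akka warnings quoted in this article.
MARKERS = re.compile(
    r"has quarantined this system|address is now gated|Detected unreachable",
    re.IGNORECASE,
)

def find_disconnect_lines(lines):
    """Return the log lines that contain one of the disconnect markers."""
    return [line.rstrip("\n") for line in lines if MARKERS.search(line)]
```

The function accepts any iterable of lines, so an open log file can be passed to it directly. If the log lines carry timestamps, the matches show when and how often the connection was dropped.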
Workaround:
OneAutomationEvents=1
8. kill -HUP <event_demon pid>