Agents fail to re-connect after RMI failover


Article ID: 84622


CA Automic Applications Manager (AM)


Error Message :
ErrorMsg: AwE-5103 network socket error (8/16/17 3:20 PM)
Details: 217fa8[TLS_DH_anon_WITH_AES_128_GCM_SHA256: Socket[,port=60010,localport=61180]] Connection reset

In the following scenario, Agents can stay stuck in a SRVC_DOWN status after RMI failover:
  1. The primary RMI goes down and the Agents fail over to the secondary RMI successfully.
  2. The primary RMI is started again; the Agents go down for a couple of seconds but reconnect.
  3. Finally, if the secondary RMI is killed, Agents go down and do not reconnect.
They stay stuck in a SRVC_DOWN status. This is true even when the Primary RMI is set as the Primary RMI.



Cause type:
Root Cause: Debugging needed to be added and improved. The fix also makes the following changes:
  1. The Network Checker skips sockets that are in Shutdown.
  2. Startup code is not called during failover, so that AM does not end up with multiple read threads for the same rmiserver, and so that AM loops checking for the master in the database.
  3. The removal of the old socket in reconnect() is fixed.
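
The changes described under Root Cause can be pictured with a minimal Java sketch. This is not the Applications Manager source code: the class ReconnectSketch, the Conn type, and every method name other than reconnect() are hypothetical and exist only to illustrate the three ideas (skip sockets in Shutdown, keep at most one read thread per rmiserver, and remove the old socket on reconnect).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Illustrative model only; all names here are assumptions, not AM internals. */
final class ReconnectSketch {

    /** Hypothetical stand-in for a socket connection to one rmiserver. */
    static final class Conn {
        final String server;
        boolean shutdown;
        Conn(String server) { this.server = server; }
    }

    private final Map<String, Conn> connections = new HashMap<>();
    private final Set<String> serversWithReader = new HashSet<>();
    private int readersStarted = 0;

    /** Replaces the connection for a server without re-running startup logic. */
    synchronized Conn reconnect(String server) {
        // Remove the old socket first so it is no longer tracked or checked.
        Conn old = connections.remove(server);
        if (old != null) {
            old.shutdown = true;
        }
        Conn fresh = new Conn(server);
        connections.put(server, fresh);
        // Start a reader only if this server does not already have one.
        if (serversWithReader.add(server)) {
            readersStarted++;
        }
        return fresh;
    }

    /** Marks the current connection for a server as being in Shutdown. */
    synchronized void markShutdown(String server) {
        Conn c = connections.get(server);
        if (c != null) {
            c.shutdown = true;
        }
    }

    /** What a network checker would inspect: sockets in Shutdown are skipped. */
    synchronized List<Conn> socketsToCheck() {
        List<Conn> live = new ArrayList<>();
        for (Conn c : connections.values()) {
            if (!c.shutdown) {
                live.add(c);
            }
        }
        return live;
    }

    synchronized int readersStarted() { return readersStarted; }

    synchronized int trackedConnections() { return connections.size(); }

    public static void main(String[] args) {
        ReconnectSketch s = new ReconnectSketch();
        s.reconnect("primary");
        s.reconnect("secondary");
        s.reconnect("primary");      // fail back to the primary
        s.markShutdown("secondary"); // secondary is going away
        System.out.println("readers started: " + s.readersStarted());
        System.out.println("sockets to check: " + s.socketsToCheck().size());
    }
}
```

In this model, reconnecting to the same server twice leaves one tracked connection and one reader, and a socket marked as Shutdown is not handed to the checker. How the actual product implements these steps is not documented in this article.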


OS Version: N/A


Update to a fix version listed below or a newer version if available.

Fix Status: Released

Fix Version(s):
Applications Manager 9.2.1 – Available

Additional Information

Workaround :
Restart the Automation Engine and all Remote Agents.