
AGENTDOWN: has been closed by partner


Article ID: 243974


Products

CA Workload Automation DE

Issue/Introduction

We are experiencing intermittent AGENTDOWN status messages for Apps on multiple ESP agents at the same time from our ESP server.  

The management server's errors.txt file contains multiple entries of the form "The conversation [manager_ipaddress:ephemeral_port]->[agent_ipaddress:7520] has been closed by the partner".

 

Environment

Release : 12.3

Component : ESP dSeries Workload Automation DE

Cause

This can occur when there are intermittent network problems between the DE Server and the agent. When these problems exist, the agent closes the connection (via a TCP reset, RST) after waiting for 10 seconds.

Resolution

There are two options for resolving this problem.

  1. Work with your network team to address the network problems.
  2. Enable the following settings on the dSeries scheduling server by uncommenting them in the conf/server.properties file:
    • Create a backup copy of the conf/server.properties file.
    • Edit conf/server.properties.
    • Uncomment the line (remove the leading #):
      #agentdown.notification.threshold.attempts=5
    • Uncomment the line (remove the leading #):
      #agentCommunicationFailed.queue.reprocessing.interval=30000
    • Stop the DE Server.
    • Start the DE Server.
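
The edit in option 2 can also be scripted. The following is a minimal Python sketch, not a vendor-supplied tool: the function names uncomment_settings and enable_settings and the ".bak" backup suffix are hypothetical, and it assumes the two settings appear in the file as commented-out "#key=value" lines as shown above. It only edits the file; the DE Server still has to be stopped and started afterwards.

```python
import re
import shutil
from pathlib import Path

# The two settings named in the steps above.
KEYS = (
    "agentdown.notification.threshold.attempts",
    "agentCommunicationFailed.queue.reprocessing.interval",
)


def uncomment_settings(text, keys=KEYS):
    """Return text with the leading '#' removed from lines that set one of keys."""
    out = []
    for line in text.splitlines(keepends=True):
        for key in keys:
            # Match '#key=...' (optional whitespace around the '#').
            if re.match(r"\s*#\s*" + re.escape(key) + r"\s*=", line):
                line = re.sub(r"^\s*#\s*", "", line, count=1)
                break
        out.append(line)
    return "".join(out)


def enable_settings(path):
    """Back up the properties file, then uncomment the two settings in place."""
    path = Path(path)
    backup = path.with_name(path.name + ".bak")  # hypothetical backup name
    shutil.copy2(path, backup)
    # Work on bytes decoded as latin-1 so every byte and line ending
    # round-trips unchanged.
    original = path.read_bytes().decode("latin-1")
    path.write_bytes(uncomment_settings(original).encode("latin-1"))
    return backup
```

Calling enable_settings("conf/server.properties") from the server's installation directory would leave a copy in conf/server.properties.bak. Lines that are already uncommented, and all other commented lines, are left untouched.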
 
After the change, you may still see errors in the logs that would previously have led to AGENTDOWN messages. However, when the agentdown.notification.threshold.attempts and agentCommunicationFailed.queue.reprocessing.interval settings are enabled, those errors are commonly followed by the message below, which shows that the settings are in use and that the AGENTDOWN notification was suppressed:
 
20220513 07:00:13.917 [essential] [INFO] DM.OutputMessageQueue.<AgentName>: [2022-05-13_07:00:13.917] Though AFM communication failed for agent '<AgentName>' subsequent Ping succeeded so assuming that agent is busy but not responding, so don't send AGENTDOWN message as of now (isFailureCountExceeded=false, MAX_FAILCOUNT_THRESHOLD=5, isUnrecoverableException=false)
 
 

Additional Information

Messages associated with this condition can be found below.
 
dSeries Scheduling Server messages:
"has been closed by the partner" messages that were previously causing AgentDown notifications:
20220513 07:00:13.915 [dm:communications:exception] [ERROR] DM.OutputMessageQueue.<AgentName>: [2022-05-13_07:00:13.915] Exception caught sending to <AgentName>: The conversation <manager_ipaddress>:49280-><agent_ipaddress>:8520 has been closed by the partner. - [Originator:ESP_UAT_12] [Destination: <AgentName>] [ProcessingStatus: 0] [Priority: 0] [AFM: 20220513 07000061+0400 <AgentName><ManagerName> <Path/To/AppName.GenerationID>/<JobName>_<AgentName> RUN . Data(Command=cmd.exe) User(<username>) Password(<pass>) TargetSubsystem(WIN) MFUser(SCHEDMASTER) ] [modified: false] [originatorModified: false] [destinationModified: false] [processingStatusModified: false] [priorityModified: false] [afmModified: false] [removed: false] [created: true] [agentDownNotified: false] [shutdownMessage: false] [controlManagerMessage: false] [MessageQueueTable: ESP_MANAGER_OUTQ] [MsgQueueTableKey: 6672592, 1652439600620] [TableDaoKey: 6672592]  number of messages remaining: 0
com.ca.wa.comp.library.communications.WAConversationCloseByPartnerException: The conversation <manager_ipaddress>:49280-><agent_ipaddress>:8520 has been closed by the partner.
        at com.ca.wa.comp.library.communications.WAConversation.receivePrefix(WAConversation.java:582)
        at com.ca.wa.comp.library.communications.WAConversation.receiveBinary(WAConversation.java:452)
        at com.ca.wa.comp.library.communications.WAConversation.receiveText(WAConversation.java:656)
        at com.ca.wa.comp.distributedmanager.communications.WADistributedManagerOutputMessageQueue.receiveAck(WADistributedManagerOutputMessageQueue.java:155)
        at com.ca.wa.comp.library.communications.OutputMessageQueue.sendMessages(OutputMessageQueue.java:372)
        at com.ca.wa.comp.distributedmanager.communications.WADistributedManagerOutputMessageQueue.sendMessages(WADistributedManagerOutputMessageQueue.java:109)
        at com.ca.wa.comp.library.communications.OutputMessageQueue.run(OutputMessageQueue.java:171)
        at com.ca.wa.core.library.concurrent.WAThreadPool$ThreadPoolThread.run(WAThreadPool.java:698)
Caused by: java.io.EOFException
        at java.io.DataInputStream.readInt(DataInputStream.java:392)
        at com.ca.wa.comp.library.communications.WAConversation.receivePrefix(WAConversation.java:552)
        ... 7 more
 
The "handling" message (there is no "deactivating" message or Status update indicating that an AGENTDOWN was triggered):

20220513 07:00:13.917 [essential] [INFO] DM.OutputMessageQueue.<AgentName>: [2022-05-13_07:00:13.917] Though AFM communication failed for agent '<AgentName>' subsequent Ping succeeded so assuming that agent is busy but not responding, so don't send AGENTDOWN message as of now (isFailureCountExceeded=false, MAX_FAILCOUNT_THRESHOLD=5, isUnrecoverableException=false)
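
To see how often each agent is affected, the two server messages above can be counted per agent. The following is a minimal Python sketch under stated assumptions: the function name summarize is hypothetical, and the regular expressions are derived only from the sample lines in this article, so they may need adjusting for other releases.

```python
import re
from collections import Counter

# Patterns derived from the sample server messages in this article.
CLOSED = re.compile(
    r"Exception caught sending to (?P<agent>[^:]+): "
    r"The conversation \S+ has been closed by the partner"
)
SUPPRESSED = re.compile(
    r"Though AFM communication failed for agent '(?P<agent>[^']+)'"
    r".*don't send AGENTDOWN"
)


def summarize(lines):
    """Count, per agent, closed-by-partner errors and suppressed AGENTDOWNs."""
    closed, suppressed = Counter(), Counter()
    for line in lines:
        match = CLOSED.search(line)
        if match:
            closed[match.group("agent")] += 1
            continue
        match = SUPPRESSED.search(line)
        if match:
            suppressed[match.group("agent")] += 1
    return closed, suppressed
```

The function accepts any iterable of lines, for example an open log file. An agent whose closed-by-partner count is higher than its suppressed count may still have produced AGENTDOWN notifications for some of those errors.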

 
Workload Automation Agent messages:

The server-side errors are often accompanied by entries in the agent's receiver.log showing that a conversation was established and then ended with a "Read timed out" exception. The example below was captured at a different time than the server excerpt above, so its timestamps and ports do not match; they do match when both logs cover the same event:

03/21/2022 00:00:03.849-0400 2 TCP/IP Controller Plugin.Receiver pool thread <Regular:2>.CybReceiverChannel.a[:158] - Conversation from <manager_ipaddress>:51738 to <agent_ipaddress>:7520 arrived
03/21/2022 00:00:17.350-0400 1 TCP/IP Controller Plugin.Receiver pool thread <Regular:2>.CybReceiverChannel.a[:234] - cybermation.library.communications.CybConversationTimeoutException: Read timed out
        at cybermation.library.communications.protocol.CybCommunicationProtocolDynamic.receiveData(CybCommunicationProtocolDynamic.java:749)
        at cybermation.library.communications.protocol.CybCommunicationProtocolDynamic.receiveMessage(CybCommunicationProtocolDynamic.java:365)
        at cybermation.library.communications.CybConversation.receiveMessage(CybConversation.java:460)
        at cybermation.commplugins.tcpip.receiver.CybReceiverChannel.a(CybReceiverChannel.java:174)
        at cybermation.commplugins.tcpip.receiver.CybReceiverChannel.call(CybReceiverChannel.java:139)
        at cybermation.commplugins.tcpip.receiver.CybReceiverChannel.call(CybReceiverChannel.java:51)
        at cybermation.commplugins.tcpip.receiver.CybReceiverScheduler$CallableWrapper.call(CybReceiverScheduler$CallableWrapper.java:353)
        at cybermation.commplugins.tcpip.receiver.CybReceiverScheduler$CallableWrapper.call(CybReceiverScheduler$CallableWrapper.java:317)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:821)
Caused by: java.net.SocketTimeoutException: Read timed out
        at java.net.SocketInputStream.socketRead0(Unknown Source)
        at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
        at java.net.SocketInputStream.read(SocketInputStream.java:171)
        at java.net.SocketInputStream.read(SocketInputStream.java:141)
        at cybermation.library.communications.protocol.CybCommunicationProtocol.receiveLength(CybCommunicationProtocol.java:414)
        at cybermation.library.communications.protocol.CybCommunicationProtocolDynamic.receiveData(CybCommunicationProtocolDynamic.java:514)
        at cybermation.library.communications.protocol.CybCommunicationProtocolDynamic.receiveMessage(CybCommunicationProtocolDynamic.java:365)
        at cybermation.library.communications.CybConversation.receiveMessage(CybConversation.java:460)
        at cybermation.commplugins.tcpip.receiver.CybReceiverChannel.a(CybReceiverChannel.java:174)
        at cybermation.commplugins.tcpip.receiver.CybReceiverChannel.call(CybReceiverChannel.java:139)
        at cybermation.commplugins.tcpip.receiver.CybReceiverChannel.call(CybReceiverChannel.java:51)
        at cybermation.commplugins.tcpip.receiver.CybReceiverScheduler$CallableWrapper.call(CybReceiverScheduler$CallableWrapper.java:353)
        at cybermation.commplugins.tcpip.receiver.CybReceiverScheduler$CallableWrapper.call(CybReceiverScheduler$CallableWrapper.java:317)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:821)
03/21/2022 00:00:17.350-0400 2 TCP/IP Controller Plugin.Receiver pool thread <Regular:2>.CybReceiverChannel.a[:253] - Exiting conversation
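
When comparing agent and server logs, it can help to list each conversation that timed out together with how long the agent waited. The following is a minimal Python sketch: the function name timed_out_conversations is hypothetical, and the line layout and timestamp format are assumed from the receiver.log excerpt above only. It pairs an "arrived" entry with the next "Read timed out" entry on the same receiver pool thread.

```python
import re
from datetime import datetime

# Layout assumed from the receiver.log excerpt in this article:
# "<timestamp> <level> ... pool thread <name>... - <message>"
LINE = re.compile(
    r"^(?P<ts>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}\.\d{3}[+-]\d{4}) "
    r".*?pool thread <(?P<thread>[^>]+)>.*? - (?P<msg>.*)$"
)
ARRIVED = re.compile(r"Conversation from (\S+) to (\S+) arrived")
TS_FORMAT = "%m/%d/%Y %H:%M:%S.%f%z"


def timed_out_conversations(lines):
    """Return (peer, thread, seconds) for each conversation that timed out."""
    open_by_thread = {}
    results = []
    for line in lines:
        match = LINE.match(line)
        if not match:
            continue  # stack-trace lines carry no timestamp
        ts = datetime.strptime(match.group("ts"), TS_FORMAT)
        thread, msg = match.group("thread"), match.group("msg")
        arrived = ARRIVED.search(msg)
        if arrived:
            # Remember when the conversation arrived and who opened it.
            open_by_thread[thread] = (ts, arrived.group(1))
        elif "Read timed out" in msg and thread in open_by_thread:
            start, peer = open_by_thread.pop(thread)
            results.append((peer, thread, (ts - start).total_seconds()))
    return results
```

For the excerpt above, the two timestamps (00:00:03.849 and 00:00:17.350) are 13.501 seconds apart, so the function would report one conversation on thread Regular:2 with an elapsed time of 13.501 seconds. The peer address and port in each result can then be searched for in the server's log.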