Periodically, while executing deployments we get the following error:
Failed to connect remote agent
If we retry several times it eventually works. The problem always occurs while it is trying to execute the action: Get File Or Folder From Remote Agent.
Release : 6.6
Component : CA RELEASE AUTOMATION CORE
The cause was found to be a PING message that timed out after 60 seconds. The agent executing the action tries to send the nimi ping message directly to the remote agent defined in the Remote Agent NODE-ID field of the Get File Or Folder From Remote Agent action. If it cannot ping that agent within 60 seconds, the action fails with the error shown above.
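There are two potential solutions for this problem: increase the PING timeout, which cumulative fix 6.6.8 makes configurable, or change the routing settings so that agent-to-agent communication goes through the NES.
Only one of these solutions needs to be implemented. Choose one. Details for implementing each solution are described below.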
The 60-second timeout that an agent waits for a PING response was a hardcoded value. Cumulative fix 6.6.8 makes this value configurable. To implement this fix you need to:
Send Request was called for:PING@<hostname> request:1f6e7c3e3e3a7400_5@vfrkpingtst03 objectId:com.nolio.platform.shared.communication.CommunicationMessage@1f31602. Waiting 120000ms for response.
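The wait time reported in a log line of this form can be extracted with a short script, which may help confirm that a changed timeout is in effect. This is only a sketch: it assumes the agent log contains lines shaped like the one above, and the function name ping_wait_ms is hypothetical, not part of the product.

```python
import re
from typing import Optional

# Matches the PING request line shown above and captures the wait time in ms.
# Assumption: the log line keeps the "Waiting <N>ms for response" wording.
PING_WAIT = re.compile(
    r"Send Request was called for:PING@\S+.*?Waiting (\d+)ms for response"
)


def ping_wait_ms(line: str) -> Optional[int]:
    """Return the PING wait time in milliseconds, or None if the line does not match."""
    match = PING_WAIT.search(line)
    return int(match.group(1)) if match else None


sample = (
    "Send Request was called for:PING@<hostname> "
    "request:1f6e7c3e3e3a7400_5@vfrkpingtst03 "
    "objectId:com.nolio.platform.shared.communication.CommunicationMessage@1f31602. "
    "Waiting 120000ms for response."
)
print(ping_wait_ms(sample))
```

A value above 60000 would indicate that the agent is no longer using the original 60-second wait.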
By default, agents will attempt to communicate directly with another agent if the route to that agent is not greater than 2. These settings change this behavior so that everything goes through the NES. Example: If 2 agents are reporting to the same NES then, by default, an agent will try to communicate directly with the other agent. With this change, operations such as Get File Or Folder From Remote Agent go through the NES instead.
Before making these changes it is recommended to consider:
To apply this solution, there are two configuration settings: full_route_check and max_route_check_size.
Both of these settings are part of the routing XML child node. An example of the routing node with both settings is included below for reference.
By default, these settings are not defined in the conf/nimi_config.xml file. If you confirm that the full_route_check setting is not defined in the file, it does not need to be explicitly defined, since its default and recommended values are the same.
Example with both settings present:
<config>
  <nimi>
    <routing>
      <threadpool>...</threadpool>
      <full_route_check>false</full_route_check>
      <max_route_check_size>0</max_route_check_size>
      <timeout>...</timeout>
    </routing>
  </nimi>
</config>
After making these changes, stop the agent, clear the contents of the NOLIOAGENT_HOME/persistency folder, and restart the agent.
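Before restarting, it can be useful to check what the file actually contains. The following is a minimal sketch, not a supported tool: it assumes the config / nimi / routing layout of conf/nimi_config.xml described in this article, and the function name read_routing_settings is hypothetical.

```python
import xml.etree.ElementTree as ET

# The two routing settings discussed in this article.
SETTINGS = ("full_route_check", "max_route_check_size")


def read_routing_settings(xml_text: str) -> dict:
    """Return each routing setting's text value, or None when it is not defined."""
    root = ET.fromstring(xml_text)  # root element is assumed to be <config>
    routing = root.find("./nimi/routing")
    result = {}
    for name in SETTINGS:
        node = routing.find(name) if routing is not None else None
        has_value = node is not None and node.text
        result[name] = node.text.strip() if has_value else None
    return result


example = """
<config>
  <nimi>
    <routing>
      <full_route_check>false</full_route_check>
      <max_route_check_size>0</max_route_check_size>
    </routing>
  </nimi>
</config>
"""
print(read_routing_settings(example))
```

To check a real agent, read the contents of conf/nimi_config.xml and pass them to the function. A value of None means the setting is not defined in the file and the agent uses its default.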