Connectivity check failed: Deployer on host <agentname> does not respond

Article ID: 205465

Products

CA Release Automation - Release Operations Center (Nolio)
CA Release Automation - DataManagement Server (Nolio)

Issue/Introduction

While running a deployment the following errors were returned: 

  • Error: <Process Name> has a process error - Connectivity check failed
  • Error: <Process Name> has a process error - Deployer on host <Agent hostname> does not respond.
  • Error: <Process Name> has a process error - Server <Agent hostname>(Agent IPAddress) error

Environment

Release : 6.6

Component : CA RELEASE AUTOMATION CORE

Cause

When an Execution Server starts a job on an agent, it first sends a PING message to the agent (see Additional Information for example log messages). The errors above indicate that the Execution Server sent the PING message to the Agent but did not get a response.

 

Resolution

There are three options to resolve this problem:

  1. Resolve connectivity issues between the Execution Server (NES) and the Agent.
  2. Connect the agent to a different NES.
  3. Increase the timeout used by the connectivity check service. Details below.

 

Increasing the timeout used by the connectivity check service

Addressing this problem using options #1 or #2 is ideal. However, if these options are not possible then the timeout used by the connectivity check service can be increased. 

The connectivity check service sends ping messages to all the nodes connected to the supernode (NES) to confirm that each agent is actively connected and responding. The timeout for these messages is set in the execution-servlet.xml file, in the executionEngine bean, and defaults to 35 seconds (35000 ms). To change it:

  • Log into the NES that is not getting a response from the agent within the default 35 seconds.
  • Change directory to <NES_InstallDir>/webapps/execution/WEB-INF/
  • Create a backup of the execution-servlet.xml file.
  • Open execution-servlet.xml and locate the value node at the following XPath:

    //beans/bean[@id='executionEngine']/property[@name='connectivityCheckIntervals']/list/value

  • Update the value from the default (35000) to a value appropriate for your network/environment. 
  • Save and close the file. 
  • Stop/Start the NES.
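
Before and after editing, it can help to confirm which values the file actually contains. The following is a minimal Python sketch, not a supported tool. The SAMPLE document is a hypothetical, cut-down stand-in built only from the XPath above (the real execution-servlet.xml contains many more beans), and the namespace declaration in it is an assumption that the helper tolerates rather than requires.

```python
import xml.etree.ElementTree as ET

# Hypothetical, cut-down stand-in for execution-servlet.xml.
# The xmlns attribute is an assumption; the code works with or without it.
SAMPLE = """<beans xmlns="http://www.springframework.org/schema/beans">
  <bean id="executionEngine">
    <property name="connectivityCheckIntervals">
      <list>
        <value>35000</value>
      </list>
    </property>
  </bean>
</beans>"""


def local(tag):
    """Strip a namespace prefix such as '{uri}bean' down to 'bean'."""
    return tag.rsplit("}", 1)[-1]


def connectivity_check_intervals(xml_text):
    """Return the configured timeouts, in milliseconds, as a list of ints."""
    root = ET.fromstring(xml_text)
    for bean in root.iter():
        if local(bean.tag) != "bean" or bean.get("id") != "executionEngine":
            continue
        for prop in bean:
            if (local(prop.tag) == "property"
                    and prop.get("name") == "connectivityCheckIntervals"):
                return [int(v.text.strip())
                        for v in prop.iter() if local(v.tag) == "value"]
    return []  # bean or property not found


print(connectivity_check_intervals(SAMPLE))  # [35000]
```

To check a real file, read its contents and pass the text to connectivity_check_intervals. The sketch only reads the values; making the change by hand in an editor, as described in the steps above, avoids any risk of a script reformatting the file.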

 

Note: 

If the connectivityCheckIntervals value needs to be adjusted, note that it is a list-type parameter, so several timeouts may be specified. Example:
  • Imagine there are 2 values here: 35000 and 60000.
  • With this configuration the connectivity check service sends the first ping message and waits 35 seconds for the response.
  • If no response is received within 35 seconds, it sends another ping message and waits 60 seconds for that response.
  • If no response is received within those 60 seconds, the agent goes offline.
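
The retry behavior in this example can be modeled with a short Python sketch. It illustrates the sequence described in the note and is not the product's implementation. The send_ping callable is hypothetical: it stands in for whatever sends the PING and reports whether a response arrived within the given timeout.

```python
def check_connectivity(send_ping, intervals_ms):
    """Try one ping per configured timeout; return True on the first response.

    send_ping is a hypothetical callable that takes a timeout in milliseconds
    and returns True if the agent answered within it.
    """
    for timeout_ms in intervals_ms:
        if send_ping(timeout_ms):
            return True
    return False  # every attempt timed out: the agent is treated as offline


def worst_case_seconds(intervals_ms):
    """Longest wait before an unresponsive agent is treated as offline."""
    return sum(intervals_ms) / 1000


print(worst_case_seconds([35000, 60000]))  # 95.0
```

With the two values from the example, an agent that never responds is treated as offline about 95 seconds after the first ping, so every value added to the list lengthens the time a failed connectivity check takes to surface.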

 

 

Additional Information

Troubleshooting

If the PING message sent by the NES is not received by the agent, the underlying problem is most likely a network issue. If additional analysis is needed, gather the following after the problem has been reproduced:

  • TCPDump/Wireshark trace started on the NES and NAG experiencing this problem. Having the traces started on both offers the best amount of detail. At a minimum, the network trace should be captured on the NES. 
  • The logs folder from the NES and NAG. This way we'll understand what the app believes has happened during the timeframe in question. 

 

When this type of error occurs, consider using the NES's JMX to manually PING the agent machine in question. This lets you reproduce message-sending problems without starting deployments. If you can reproduce the problem this way, you can enable network tracing, reproduce it again, and capture the data. To do this:

    • Open JMX for the Execution Server managing the agent in question.
    • From the JMX find: com.nolio.nimi.jmx:name=nimiJMX,type=NimiJMX bean/page.
    • Then, use the sendMessageTo method, which sends a message (p2 field) to an agent node_id (p1 field):
      • Enter the Agent's NodeID in the p1 field; and
      • Enter a message in the p2 field (ex: WhereAreYou).
    • Then, click invoke.

If the agent responds, the result will reflect that the message was sent. If the agent does not respond, the call will eventually return a timeout error.

 

Example Messages

See below for example messages that you should see exchanged between the NES and Agent upon a successful PING.

Note:

  1. These messages can be found in the nimi.log
  2. You can use the ID in the payload to track the message between nodes. 

 

Execution Server:

2020-12-07 15:09:50,397 [JobExecutorThread-6] DEBUG (com.nolio.nimi.appmsg.durability.DurableCommunicationApi:144) - Got new message: es_<nes_nodeId>_160693912061731:payload=[ID:7c6007d8441a5800_1e@es_<nes_nodeId>, from:es_<nes_nodeId>, to:PING@<agent_nodeId>- PING]

 

Agent Server:

2020-12-07 15:09:50,503 [New I/O server worker #1-2] DEBUG (com.nolio.nimi.appmsg.durability.DurableCommunicationApi:233) - Received shipping: es_<nes_nodeId>_160693912061731:payload=[ID:7c6007d8441a5800_1e@es_<nes_nodeId>, from:es_<nes_nodeId>, to:PING@<agent_nodeId>- PING]

 

Agent Server:

2020-12-07 15:09:50,506 [Communication Msg Processor-2] DEBUG (com.nolio.nimi.appmsg.durability.DurableCommunicationApi:155) - Got new message: <agent_nodeId>_160737107556804:payload=[ID:d3b1d9cb9d5b800_3@<agent_nodeId>, from:<agent_nodeId>, to:MESSAGE_RESPONSE_SERVICE@es_<nes_nodeId>- [Response for message: 7c6007d8441a5800_1e@es_<nes_nodeId>]]

 

Execution Server:

2020-12-07 15:09:50,509 [New I/O client worker #1-1] DEBUG (com.nolio.nimi.appmsg.durability.DurableCommunicationApi:222) - Received shipping: <agent_nodeId>_160737107556804:payload=[ID:d3b1d9cb9d5b800_3@<agent_nodeId>, from:<agent_nodeId>, to:MESSAGE_RESPONSE_SERVICE@es_<nes_nodeId>- [Response for message: 7c6007d8441a5800_1e@es_<nes_nodeId>]]
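
Tracking a message by its ID, as suggested in the note above, can be partly automated. The following Python sketch assumes that nimi.log DEBUG lines follow the payload layout shown in the examples above; the regular expressions and function names are illustrative and not part of the product. It lists PING message IDs for which no "Response for message" line was found.

```python
import re

# Matches the payload portion of the DEBUG lines shown above.
PAYLOAD = re.compile(
    r"payload=\[ID:(?P<id>[^,]+), from:(?P<src>[^,]+), "
    r"to:(?P<dst>\S+)- (?P<body>.*)\]$"
)
RESPONSE = re.compile(r"Response for message: (?P<ref>[^\]]+)")


def parse_line(line):
    """Return the payload fields of one log line, or None if it has none."""
    match = PAYLOAD.search(line.strip())
    if match is None:
        return None
    msg = match.groupdict()
    reply = RESPONSE.search(msg["body"])
    msg["in_reply_to"] = reply.group("ref") if reply else None
    return msg


def unanswered_pings(lines):
    """Return IDs of PING messages that have no matching response line."""
    pings, answered = set(), set()
    for line in lines:
        msg = parse_line(line)
        if msg is None:
            continue
        if msg["in_reply_to"]:
            answered.add(msg["in_reply_to"])
        elif msg["dst"].startswith("PING@"):
            pings.add(msg["id"])
    return sorted(pings - answered)


# Two of the Execution Server lines from the examples above.
LINES = [
    "2020-12-07 15:09:50,397 [JobExecutorThread-6] DEBUG (com.nolio.nimi.appmsg.durability.DurableCommunicationApi:144) - Got new message: es_<nes_nodeId>_160693912061731:payload=[ID:7c6007d8441a5800_1e@es_<nes_nodeId>, from:es_<nes_nodeId>, to:PING@<agent_nodeId>- PING]",
    "2020-12-07 15:09:50,509 [New I/O client worker #1-1] DEBUG (com.nolio.nimi.appmsg.durability.DurableCommunicationApi:222) - Received shipping: <agent_nodeId>_160737107556804:payload=[ID:d3b1d9cb9d5b800_3@<agent_nodeId>, from:<agent_nodeId>, to:MESSAGE_RESPONSE_SERVICE@es_<nes_nodeId>- [Response for message: 7c6007d8441a5800_1e@es_<nes_nodeId>]]",
]

print(unanswered_pings(LINES))      # []
print(unanswered_pings(LINES[:1]))  # ['7c6007d8441a5800_1e@es_<nes_nodeId>']
```

Run against the lines of the NES nimi.log, a non-empty result points to the PING IDs worth searching for in the agent's nimi.log to see whether the message arrived there at all.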