automated_deployment_engine (ADE) probe performing poorly in large environment



Article ID: 238604



Products

DX Unified Infrastructure Management (Nimsoft / UIM)

Issue/Introduction

Deploying probes en masse via the ADE probe in a large environment (hundreds to thousands of hubs/robots) results in a high failure rate.

The ADE logs show many errors, typically resembling one of the following:

 

java.io.IOException: (80) Session error, Unable to open a client session for <IP_ADDRESS:48XXX>: Connection refused:connect


(2) communication error, I/O error on nim session (C) com.nimsoft.nimbus.NimNamedClientSession(Socket[addr=/<IP_ADDRESS>,port=48xxx,localport=xxxxx]): Read timed out


(1) error, Received status (1) on response (for sendRcv) for cmd = 'inst_pkg'

(80) Session error, Unable to open a client session for <IP_ADDRESS>:48xxx: Connection refused: connect
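
To gauge which of these errors dominates before and after tuning, the log lines can be tallied by error type. The following is a minimal Python sketch, not part of UIM: the marker strings are copied from the sample errors above, while the function name tally_errors and the idea of feeding it lines read from the ADE log are illustrative assumptions.

```python
from collections import Counter

# Marker substrings copied from the sample errors in this article.
ERROR_MARKERS = (
    "(80) Session error",
    "(2) communication error",
    "Received status (1) on response",
)


def tally_errors(lines):
    """Count how many lines contain each known error marker.

    `lines` is any iterable of strings, e.g. an open log file.
    Hypothetical helper for illustration only.
    """
    counts = Counter()
    for line in lines:
        for marker in ERROR_MARKERS:
            if marker in line:
                counts[marker] += 1
                break  # count each line at most once
    return counts
```

Passing an open log file (or any list of lines) to tally_errors returns a Counter keyed by marker, which makes it easy to compare failure counts between deployment runs.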


All the target systems are running and available, and the distsrv probe is able to deploy packages to the same systems without failure.

How can ADE be configured to achieve a higher deployment success rate?

Environment

Release : 20.x

Component : UIM - ADE

Resolution

Several settings can affect the performance and behavior of ADE.

The first is memory allocation. By default, the probe is configured with a small amount of RAM. For larger environments, it is recommended to increase this to 2 GB or 4 GB.

Secondly, the following configuration keys are available. They are not present by default but can be added to the .cfg as needed, e.g. using Raw Configure:

This one goes in the <automated_deployment_engine> section of the .cfg:

retry_send_file_count = X; default is 4, number of retries ADE will make when sending files during deployment

These two go in the <setup> section:

inst_execute_status_retries = X; default is 30, number of retries to obtain installation status before giving up
probeFileTransferBufferSize = X; similar to blocksize in distsrv, default is 1024, higher numbers may yield better performance
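
As an alternative to typing the keys in Raw Configure, they can be applied to a copy of the probe's .cfg text with a small script. This is a minimal Python sketch under stated assumptions: the helper set_cfg_key is hypothetical, it assumes the usual .cfg layout of <section> ... </section> blocks containing "key = value" lines, and the values shown are placeholders for X rather than recommendations.

```python
import re


def set_cfg_key(cfg_text, section, key, value):
    """Return cfg_text with `key = value` set directly inside <section>.

    Hypothetical helper: assumes <section> ... </section> blocks that
    hold `key = value` lines. An existing key is overwritten; a missing
    key is inserted right after the opening tag.
    """
    lines = cfg_text.splitlines()
    stripped = [ln.strip() for ln in lines]
    open_tag, close_tag = f"<{section}>", f"</{section}>"
    if open_tag not in stripped:
        raise ValueError(f"section {open_tag} not found")
    start = stripped.index(open_tag)
    key_re = re.compile(rf"{re.escape(key)}\s*=")
    new_line = f"   {key} = {value}"
    depth = 0  # nesting level of subsections inside the target section
    for i in range(start + 1, len(lines)):
        s = stripped[i]
        if s == close_tag and depth == 0:
            break
        if s.startswith("</"):
            depth -= 1
        elif s.startswith("<"):
            depth += 1
        elif depth == 0 and key_re.match(s):
            lines[i] = new_line  # key already present: overwrite it
            return "\n".join(lines) + "\n"
    lines.insert(start + 1, new_line)  # key absent: add it
    return "\n".join(lines) + "\n"


# Illustrative input only; a real ADE .cfg contains many more keys.
sample_cfg = (
    "<setup>\n"
    "   loglevel = 1\n"
    "</setup>\n"
    "<automated_deployment_engine>\n"
    "   numThreads = -1\n"
    "</automated_deployment_engine>\n"
)

# Placeholder values standing in for X.
updated = set_cfg_key(sample_cfg, "automated_deployment_engine",
                      "retry_send_file_count", 8)
updated = set_cfg_key(updated, "setup", "inst_execute_status_retries", 60)
updated = set_cfg_key(updated, "setup", "probeFileTransferBufferSize", 4096)
print(updated)
```

The sketch works on a string rather than a file so it can be tried safely before touching a live configuration.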

 

Finally, it may be helpful to limit the number of threads (simultaneous jobs) so that the memory is not consumed too quickly.

This is controlled by the "numThreads" parameter in the <automated_deployment_engine> section of the .cfg, which is normally set to -1.

Setting this to, e.g., 32 or 64 will reduce the number of simultaneous deployments. This can help ensure the probe does not consume too many resources or "step on its own toes" by deploying too many jobs at once.
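
The effect of capping the thread count can be illustrated with a generic worker pool. The Python sketch below is not ADE's implementation; peak_concurrency is a hypothetical helper that only demonstrates the general principle that a pool of N workers never runs more than N jobs at once, so resource use stays bounded however many jobs are queued.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def peak_concurrency(num_jobs, max_workers):
    """Run num_jobs dummy jobs on a bounded pool and return the highest
    number of jobs that were ever running at the same time."""
    active = 0
    peak = 0
    lock = threading.Lock()

    def job(_):
        nonlocal active, peak
        with lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.01)  # stand-in for the work of one deployment job
        with lock:
            active -= 1

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        list(pool.map(job, range(num_jobs)))  # wait for all jobs to finish
    return peak


# 200 queued jobs, but never more than 32 running at once.
print(peak_concurrency(200, 32))
```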