ZDU - explanation of agent transfers to new CPs and how to control reconnections



Article ID: 259197


Products

CA Automic Workload Automation - Automation Engine
CA Automic One Automation

Issue/Introduction

At step 4 of the zero downtime upgrade (ZDU), do the agents reconnect to a new (target) CP? If so, how is this done?

Also, is there a way to transfer the agents in batches, rather than reconnecting them one at a time or all at once from the step 4 list?

Environment

Release: 12.3, 21.0

Resolution

When an agent connects (or reconnects) to the system in step 4 of the ZDU, it connects to one of the target CPs (the version being upgraded to), or to a JCP when going from 21.0 to 21.0 or higher. Explanation:

Beginning with version 12.3, each CP has a ranking on a scale of 0 to 9999. The more connections a CP has, the higher its ranking; the fewer connections, the lower its ranking.
When an agent connects to the first CP, it receives a list of running CPs that it can try to connect to. The agent also learns the CPs' rankings and connects to the CP with the lowest ranking.

At step 4 of the ZDU, the ranking of the base (upgrade-from) CPs is set to 9999, whereas the new (target) CPs have a ranking below 9999. When agents reconnect to the system, they connect to a CP with a ranking lower than 9999.
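The selection rule described above can be illustrated with a minimal sketch. This is not the agent's actual implementation; the function name `choose_cp`, the tie-breaking behavior, and the CP names `AUT1230#CP002` and `AUT1230#CP004` are assumptions made for illustration only.

```python
from typing import Dict, Optional

# Ranking the base CPs report during ZDU step 4, per the explanation above.
BLOCKED_RANKING = 9999


def choose_cp(rankings: Dict[str, int]) -> Optional[str]:
    """Return the CP name with the lowest ranking, or None if no CP is listed."""
    if not rankings:
        return None
    # min() keeps the first entry on ties, so dict order acts as the tie-breaker
    # here (an assumption; the real tie-breaking rule is not documented above).
    return min(rankings, key=lambda name: rankings[name])


# Rankings as they might look in step 4: base CPs at 9999, target CPs lower.
step4_rankings = {
    "AUT1230#CP001": BLOCKED_RANKING,  # base (upgrade-from) CP
    "AUT1230#CP002": BLOCKED_RANKING,  # base CP, hypothetical name
    "AUT1230#CP003": 1,                # target CP
    "AUT1230#CP004": 3,                # target CP, hypothetical name
}

print(choose_cp(step4_rankings))  # AUT1230#CP003
```

Because every base CP sits at 9999, any target CP with a lower ranking wins the comparison, which is why reconnecting agents move to the target side.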

The agent log shows the logic behind the reconnection:

Original connection of agent WIN01 to CP1:
20230125/000624.976 - U02000011 Connection to Server 'IP(SERVER):8843' initiated.
20230125/000625.007 - U02000004 Connection to Server 'AUT1230#CP001' successfully created.
20230125/000625.007 - U02000354 CP Server 'AUT1230#CP001' reports ranking '1'.
20230125/000625.007 - U02000073 Connection to system 'AUT1230' via CP Server 'AUT1230#CP001' successfully established.


Reconnect after server processes started:

20230125/012134.316 - U02000010 Connection to Server 'AUT1230#CP001' terminated.
20230125/012134.316 - U02000072 Connection to system 'AUT1230' initiated.
20230125/012134.321 - U02000011 Connection to Server 'IP(SERVER):8843' initiated.
20230125/012134.344 - U02000004 Connection to Server 'AUT1230#CP001' successfully created.
20230125/012134.344 - U02000354 CP Server 'AUT1230#CP001' reports ranking '9999'.
20230125/012134.344 - U02000076 Connection to CP Server 'AUT1230#CP001' closed.
20230125/012134.344 - U02000011 Connection to Server 'IP(SERVER):8843' initiated.
20230125/012134.352 - U02000004 Connection to Server 'AUT1230#CP003' successfully created.
20230125/012134.352 - U02000354 CP Server 'AUT1230#CP003' reports ranking '1'.
20230125/012134.352 - U02000073 Connection to system 'AUT1230' via CP Server 'AUT1230#CP003' successfully established.

So at this point in the ZDU, it appears that the base CPs raise their rankings to 9999, and any new connection goes to a different CP, because agents connect to the CP with the lowest ranking.
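To check many agent logs for this pattern, the ranking and connection messages can be extracted with a short script. The following is a rough sketch that assumes the message numbers and wording match the log excerpt above (U02000354 for the reported ranking, U02000073 for the established connection); the function name `summarize` is hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Patterns based on the message texts shown in the agent log excerpt above.
RANKING_RE = re.compile(r"U02000354 CP Server '([^']+)' reports ranking '(\d+)'")
ESTABLISHED_RE = re.compile(
    r"U02000073 Connection to system '[^']+' via CP Server '([^']+)'"
)


def summarize(log_lines: List[str]) -> Tuple[List[Tuple[str, int]], Optional[str]]:
    """Return the (CP, ranking) pairs seen and the CP of the last established connection."""
    rankings: List[Tuple[str, int]] = []
    connected_to: Optional[str] = None
    for line in log_lines:
        match = RANKING_RE.search(line)
        if match:
            rankings.append((match.group(1), int(match.group(2))))
            continue
        match = ESTABLISHED_RE.search(line)
        if match:
            connected_to = match.group(1)
    return rankings, connected_to


sample = [
    "20230125/012134.344 - U02000354 CP Server 'AUT1230#CP001' reports ranking '9999'.",
    "20230125/012134.344 - U02000076 Connection to CP Server 'AUT1230#CP001' closed.",
    "20230125/012134.352 - U02000354 CP Server 'AUT1230#CP003' reports ranking '1'.",
    "20230125/012134.352 - U02000073 Connection to system 'AUT1230' via CP Server "
    "'AUT1230#CP003' successfully established.",
]

rankings, connected_to = summarize(sample)
print(rankings)      # [('AUT1230#CP001', 9999), ('AUT1230#CP003', 1)]
print(connected_to)  # AUT1230#CP003
```

An agent whose last established connection is still on a CP that reported 9999 would be worth a closer look.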

Regarding reconnecting in a controlled manner in an active-active setup:

  1. On node 1, stop one CP and wait for all agents to reconnect to other CPs.
  2. On node 1, stop the next CP and wait for all agents to reconnect to other CPs.
    Continue in the same way for the remaining CPs on node 1.
  3. Upgrade the node 1 CPs.
  4. At this point, go to step 4 of the ZDU, where the reconnections need to take place.
  5. Start the node 1 AE processes.
  6. Repeat steps 1 through 3 for node 2, then step 5 on node 2.
  7. Repeat on node 3.
  8. Repeat on node 4. This may be tricky: it may require disconnecting individual agents, or moving the agents in a chunk on node 4 instead, so that the system still has base-level CPs in case a rollback is needed.
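The "wait for all agents to reconnect" part of the steps above could be scripted as a simple polling loop. This is only a sketch: `get_disconnected_agents` is a hypothetical callable that you would have to supply yourself (how the list of not-yet-reconnected agents is obtained is not covered here), and the agent name `UNIX01` is made up for the demonstration.

```python
import time
from typing import Callable, Set


def wait_for_reconnect(
    get_disconnected_agents: Callable[[], Set[str]],
    timeout_s: float = 600.0,
    poll_interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until no agents are reported as disconnected.

    Returns True once the set is empty, or False if the timeout expires first.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if not get_disconnected_agents():
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(poll_interval_s)


# Simulated polling results: two agents pending, then one, then none.
pending = iter([{"WIN01", "UNIX01"}, {"UNIX01"}, set()])
ok = wait_for_reconnect(
    lambda: next(pending),
    timeout_s=10,
    poll_interval_s=0,
    sleep=lambda seconds: None,  # skip real sleeping in the demonstration
)
print(ok)  # True
```

Stopping the next CP only after such a check returns True keeps each batch of reconnections separate from the next.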