EEM Failover out of sync

Article ID: 201997

Products

CA Workload Automation AE - Scheduler (AutoSys) CA Process Automation Base

Issue/Introduction

We have an EEM replication setup and the primary EEM server has gotten out of sync with the secondaries.

Although the secondary servers are in sync with each other, how do we get the primary EEM server back in sync with the secondaries? 

This document will go over how to restore synchronization with the primary EEM server so it can be brought back online. 

 

Cause

There are several reasons why EEM servers can get out of sync, such as communication issues or the DXgrid database becoming full.

This document does not go into those details. It walks you through resolving the sync problem after the root cause has been fixed.

This document applies when your secondary EEM server(s) have the information you expect but the primary no longer does, for example because of policy or user group differences. Whatever the problem, the secondary is the source of truth in this case.

When the primary is the source of truth, then you only need to run the sync command via the eiamclustersetup.jar. Please see the EEM documentation for details on running this.

Environment

Windows or Linux
EEM 12.5/12.6 current releases. 

Resolution

The assumptions here are:

The secondary server(s) are the source of truth.
The primary server is not running.
Failover/replication has been set up and running successfully in the past.

The first method will allow for the secondary server(s) to remain online.
This method assumes that no changes are made on the secondary server(s) while it is performed: no policy updates, no user group changes, no changes at all.

On the secondary server (only one of them if there is more than one secondary), navigate to

Windows Command Line - C:\Program Files\CA\Directory\dxserver
Linux - $DXHOME (by default /opt/CA/Directory)

Run the command:

dxserver onlinebackup itechpoz

This creates an itechpoz.zdb file in the $DXHOME/data/itechpoz (C:\Program Files\CA\Directory\dxserver\data\itechpoz) folder.

Do not change directories - remain in the DXHOME location. 

Next, run the following command:

dxdumpdb -f itechpoz.ldif -z itechpoz

There will now be an itechpoz.ldif file located in DXHOME.
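
Before copying the file, a quick sanity check can confirm that the export is not empty. The following is a minimal sketch, not part of the official procedure. The function name check_ldif is hypothetical, and the check relies only on the fact that each entry in an LDIF file begins with a "dn:" line.

```shell
# Hypothetical pre-copy check: report how many entries an LDIF export holds
# and fail if the file is missing, empty, or contains no entries.
check_ldif() {
    file="$1"
    if [ ! -s "$file" ]; then
        echo "ERROR: $file is missing or empty" >&2
        return 1
    fi
    # Each LDIF entry starts with a "dn:" line. grep -c exits non-zero
    # when the count is 0, so "|| true" keeps the printed "0" usable.
    entries=$(grep -c '^dn:' "$file" || true)
    if [ "$entries" -eq 0 ]; then
        echo "ERROR: $file contains no entries" >&2
        return 1
    fi
    echo "$file: $entries entries"
}
```

From DXHOME this could be run as check_ldif itechpoz.ldif. Running the same check on the primary after the copy gives a simple way to confirm that the file arrived intact.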

Copy itechpoz.ldif to the DXHOME location on the primary EEM server.

From the DXHOME location on the primary, run the command:

dxloaddb itechpoz.ldif itechpoz

Start the primary EEM server. On Linux, run:

su - dsa -c "dxserver start all"
/opt/CA/SharedComponents/iTechnology/S99igateway start

On Windows, start the CA Directory - itechpoz service first, then start the CA iTechnology iGateway service.

Log into the primary EEM server as EiamAdmin.

Verify that everything you expect now appears on the primary.
Test that replication is still functional by following the steps in KB Article 37336.

========

If you cannot guarantee that no changes will be made on the secondary server(s), use the second method. The steps are almost the same as above, but all EEM servers must be shut down first.
The other difference is that dxdumpdb is run without the "-z" switch used above. That switch is what allows the dump to be taken from an online backup, which is not done in this method.
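
The difference between the two methods can be sketched as a small dry-run helper that only prints the commands and does not run them. The export commands are the ones given in this article. The stop commands are an assumption: they are the mirror image of the start commands shown in this article, with iGateway stopped before the DSA, and they apply to Linux only. The function names are hypothetical.

```shell
#!/bin/sh
# Dry-run helper: prints the commands for each method instead of running them.
DSA_NAME="itechpoz"

# Linux shutdown sequence for one EEM server.
# ASSUMPTION: reverse of the article's start sequence (iGateway, then the DSA).
print_stop_commands() {
    echo '/opt/CA/SharedComponents/iTechnology/S99igateway stop'
    echo 'su - dsa -c "dxserver stop all"'
}

# Export commands to run from DXHOME on one secondary.
#   online  = secondary stays up; the dump is taken from the online backup (-z)
#   offline = all EEM servers are stopped; no online backup, no -z switch
print_export_commands() {
    case "$1" in
        online)
            echo "dxserver onlinebackup $DSA_NAME"
            echo "dxdumpdb -f $DSA_NAME.ldif -z $DSA_NAME"
            ;;
        offline)
            echo "dxdumpdb -f $DSA_NAME.ldif $DSA_NAME"
            ;;
        *)
            echo "usage: print_export_commands online|offline" >&2
            return 1
            ;;
    esac
}
```

Calling print_stop_commands followed by print_export_commands offline lists the sequence for the second method. Printing rather than executing lets the commands be reviewed before they are run on a production server.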

On the secondary server (only one of them if there is more than one secondary), navigate to

Windows Command Line - C:\Program Files\CA\Directory\dxserver
Linux - $DXHOME (by default /opt/CA/Directory)

Run the command:

dxdumpdb -f itechpoz.ldif itechpoz

There will now be an itechpoz.ldif file located in DXHOME.

Copy itechpoz.ldif to the DXHOME location on the primary EEM server.

From the DXHOME location on the primary, run the command:

dxloaddb itechpoz.ldif itechpoz

Start the primary EEM server. On Linux, run:

su - dsa -c "dxserver start all"
/opt/CA/SharedComponents/iTechnology/S99igateway start

On Windows, start the CA Directory - itechpoz service first, then start the CA iTechnology iGateway service.

Log into the primary EEM server as EiamAdmin.

Verify that everything you expect now appears on the primary.
Test that replication is still functional by following the steps in KB Article 37336.

Additional Information

On Linux, you may need to run the dxdumpdb and dxloaddb commands as the dsa user:

su - dsa -c "dxdumpdb -f itechpoz.ldif itechpoz"
su - dsa -c "dxloaddb -f itechpoz.ldif itechpoz"