How does Enterprise Manager (MOM) failover work internally, how can it be tested on a single host, and what are the differences between PRIMARY-PRIMARY and PRIMARY-SECONDARY?

Article ID: 143526

Products

CA Application Performance Management (APM / Wily / Introscope)

Issue/Introduction

How does Enterprise Manager (MOM) failover work internally, and how can it be tested on a single host?

Environment

APM Enterprise Manager 10.x 

Resolution

1. A fully implemented failover across multiple hosts typically shares a single complete EM installation on highly available (HA) Network Attached Storage (NAS), e.g. NFS or SMB.
The key component of the failover process is the set of lock files (.lck) located in the directory EM_HOME/config/internal/server:

  • When the first EM (EM1 on Host1) starts it acquires an exclusive lock on file primary_em.lck (using Java API method FileChannel.lock)
  • When the second EM (EM2 on Host2) starts, it:
    • Acquires an exclusive lock on file secondary_em.lck
    • Runs in a wait mode, regularly checking the access status of primary_em.lck
  • When EM1 goes down, EM2:
    • Immediately acquires the exclusive lock on primary_em.lck
    • Relinquishes the lock on secondary_em.lck
    • Completes its start-up.
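
The lock hand-over described above can be sketched in Java as follows. This is an illustration only and not the EM's actual code: the class and method names (FailoverLockSketch, tryAcquire, startUp) are hypothetical, and it polls with the non-blocking FileChannel.tryLock rather than the blocking FileChannel.lock so that the "wait mode" is visible as a loop. The order (secondary lock first, then primary, then release secondary) follows the start-up sequence the EM logs.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class FailoverLockSketch {

    /** Non-blocking attempt at an exclusive lock; returns null if the file is already locked. */
    static FileLock tryAcquire(Path lockFile) throws IOException {
        FileChannel channel = FileChannel.open(lockFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        FileLock lock = null;
        try {
            lock = channel.tryLock(); // null when another process holds the lock
        } catch (OverlappingFileLockException e) {
            // Already locked inside this same JVM (only relevant for single-JVM demos).
        }
        if (lock == null) {
            channel.close();
        }
        return lock;
    }

    /** Releases the lock and closes the channel it was acquired on. */
    static void release(FileLock lock) throws IOException {
        FileChannel channel = lock.channel();
        lock.release();
        channel.close();
    }

    /** Polls until the given lock file can be locked exclusively. */
    static FileLock acquireByPolling(Path lockFile, long pollMillis)
            throws IOException, InterruptedException {
        FileLock lock;
        while ((lock = tryAcquire(lockFile)) == null) {
            Thread.sleep(pollMillis); // "wait mode": check again later
        }
        return lock;
    }

    /** Secondary lock first, then primary, then give the secondary lock back. */
    static FileLock startUp(Path serverDir, long pollMillis)
            throws IOException, InterruptedException {
        FileLock secondary = acquireByPolling(serverDir.resolve("secondary_em.lck"), pollMillis);
        FileLock primary = acquireByPolling(serverDir.resolve("primary_em.lck"), pollMillis);
        release(secondary); // frees the standby slot for the next EM
        return primary;     // the caller would now complete its start-up
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("em-failover-demo"); // stand-in for EM_HOME/config/internal/server
        FileLock primary = startUp(dir, 100);
        System.out.println("Holding primary lock: " + primary.isValid());
        release(primary);
    }
}
```

Running two copies of this program against the same directory should leave the second copy looping until the first exits. Note that Java file locks are held on behalf of the whole JVM, and the Java documentation warns that on some systems closing any channel on a file releases that JVM's locks on it, so the same-JVM branch above is suitable for demonstration only.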

2. For a test on the same host, use a single installation.

a. Edit the Enterprise Manager properties file to have these values:

introscope.enterprisemanager.failover.enable=true
introscope.enterprisemanager.failover.primary=localhost

b. Start two copies of the EM executable, i.e. Introscope_Enterprise_Manager (Unix/Linux) or Introscope_Enterprise_Manager.exe (Windows). Because both copies run from the same installation, they write to the same IntroscopeEnterpriseManager.log file, which will show something like:

===========================================================================================

EM1 started

11/11/15 05:08:08.550 PM EST [INFO] [main] [Manager.HotFailover] The Introscope Enterprise Manager is configured as a Primary EM
11/11/15 05:08:08.554 PM EST [INFO] [main] [Manager.HotFailover] Acquiring secondary lock...
11/11/15 05:08:08.557 PM EST [INFO] [main] [Manager.HotFailover] Acquired secondary lock
11/11/15 05:08:08.560 PM EST [INFO] [main] [Manager.HotFailover] Acquiring primary lock...
11/11/15 05:08:08.563 PM EST [INFO] [main] [Manager.HotFailover] Acquired primary lock
11/11/15 05:08:08.566 PM EST [INFO] [main] [Manager.HotFailover] Released secondary lock
11/11/15 05:08:08.567 PM EST [INFO] [main] [Manager.HotFailover] Proceeding with startup
...

EM2 started

11/11/15 05:12:01.366 PM EST [INFO] [main] [Manager.HotFailover] The Introscope Enterprise Manager is configured as a Primary EM
11/11/15 05:12:01.370 PM EST [INFO] [main] [Manager.HotFailover] Acquiring secondary lock...
11/11/15 05:12:01.373 PM EST [INFO] [main] [Manager.HotFailover] Acquired secondary lock
11/11/15 05:12:01.375 PM EST [INFO] [main] [Manager.HotFailover] Acquiring primary lock...
...

EM2 waits until EM1 goes down, and then you see:

11/11/15 05:14:22.076 PM EST [INFO] [main] [Manager.HotFailover] Acquired primary lock
11/11/15 05:14:22.098 PM EST [INFO] [main] [Manager.HotFailover] Released secondary lock
11/11/15 05:14:22.105 PM EST [INFO] [main] [Manager.HotFailover] Proceeding with startup

===========================================================================================
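
When repeating this test, it can help to check the shared log programmatically for an EM that is still waiting on the primary lock. The sketch below is a hypothetical helper, not part of the product: the class and method names are invented, and it assumes the [Manager.HotFailover] message texts look exactly like the sample log above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HotFailoverLogSketch {

    // Captures the message text that follows the [Manager.HotFailover] tag.
    private static final Pattern HOT_FAILOVER =
            Pattern.compile("\\[Manager\\.HotFailover\\]\\s+(.*)$");

    /** Returns the HotFailover messages in log order, ignoring all other lines. */
    static List<String> messages(List<String> logLines) {
        List<String> result = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = HOT_FAILOVER.matcher(line);
            if (m.find()) {
                result.add(m.group(1).trim());
            }
        }
        return result;
    }

    /** True if some EM has requested the primary lock but has not yet been granted it. */
    static boolean standbyWaiting(List<String> logLines) {
        int pending = 0;
        for (String msg : messages(logLines)) {
            if (msg.equals("Acquiring primary lock...")) {
                pending++;
            } else if (msg.equals("Acquired primary lock")) {
                pending--;
            }
        }
        return pending > 0;
    }
}
```

Fed the lines of IntroscopeEnterpriseManager.log (for example via java.nio.file.Files.readAllLines), standbyWaiting should report true while EM2 is on standby and false once EM2 has taken over.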

NOTE: This scenario is PRIMARY-PRIMARY, i.e. EM1 (localhost) will not retake control when it later restarts, because EM2 (localhost) also acts as a PRIMARY.

3. For a multi-host implementation of EM1 (Host1) and EM2 (Host2) across a NAS, you can choose one of two options:

  • PRIMARY-PRIMARY (introscope.enterprisemanager.failover.primary=Host1,Host2): After EM1 goes down, EM2 takes control, and when EM1 restarts, EM2 still retains control. If EM2 then goes down, EM1 takes control and likewise retains it when EM2 restarts.
  • PRIMARY-SECONDARY (introscope.enterprisemanager.failover.primary=Host1 and introscope.enterprisemanager.failover.secondary=Host2): After EM1 goes down, EM2 takes control, but when EM1 restarts, EM1 regains control.
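
The difference between the two options can be illustrated with a small simulation. This is a toy model of the behaviour described above, not the EM's implementation: the class name, the Mode enum, and the "EM1 down" / "EM1 up" event strings are all made up for the example, and it assumes EM1 started first and both EMs are initially running.

```java
import java.util.List;

class FailoverModeSketch {

    enum Mode { PRIMARY_PRIMARY, PRIMARY_SECONDARY }

    /**
     * Replays events of the form "EM1 down", "EM1 up", "EM2 down", "EM2 up"
     * and returns which EM is in control afterwards ("EM1", "EM2" or "none").
     */
    static String controller(Mode mode, List<String> events) {
        boolean em1Up = true;
        boolean em2Up = true;
        String active = "EM1"; // EM1 started first, so it holds control initially

        for (String event : events) {
            String[] parts = event.split(" ");
            boolean up = parts[1].equals("up");
            if (parts[0].equals("EM1")) {
                em1Up = up;
            } else {
                em2Up = up;
            }

            if (active.equals("EM1") && !em1Up) {
                active = em2Up ? "EM2" : "none";   // EM2 takes over from EM1
            } else if (active.equals("EM2") && !em2Up) {
                active = em1Up ? "EM1" : "none";   // EM1 takes over from EM2
            } else if (active.equals("none")) {
                active = em1Up ? "EM1" : (em2Up ? "EM2" : "none");
            }

            // PRIMARY-SECONDARY only: a running EM1 always regains control.
            if (mode == Mode.PRIMARY_SECONDARY && em1Up) {
                active = "EM1";
            }
        }
        return active;
    }
}
```

For the event sequence "EM1 down", "EM1 up", the model leaves EM2 in control under PRIMARY_PRIMARY and returns control to EM1 under PRIMARY_SECONDARY, matching the two descriptions above.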