How Enterprise Manager/MOM failover works internally and how to test it on a single host
APM Enterprise Manager 10.x
1. A fully implemented failover across multiple hosts typically shares a single complete EM installation on highly available (HA) Network Attached Storage (NAS), e.g. NFS or SMB.
The key component of the failover process is the set of lock files (.lck), which are located in the directory EM_HOME/config/internal/server.
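The role of the locks can be illustrated with a small in-memory simulation. This is only a sketch: it uses Python threading locks as stand-ins for the .lck files, and the class and function names (SimulatedEM, simulate) are hypothetical. The lock order (secondary lock, then primary lock, then release of the secondary lock) is taken from the HotFailover log messages shown in step 2; how the EM implements its file locks internally is not described in this article.

```python
import threading

# In-memory stand-ins for the two lock files (assumption: the real EM
# uses .lck files under EM_HOME/config/internal/server instead).
primary_lock = threading.Lock()
secondary_lock = threading.Lock()

events = []                      # ordered record of what each simulated EM did
events_guard = threading.Lock()  # protects the shared event list


def record(name, message):
    with events_guard:
        events.append((name, message))


class SimulatedEM:
    """Hypothetical model of an EM configured as a primary."""

    def __init__(self, name):
        self.name = name
        self.holds_secondary = threading.Event()
        self.active = threading.Event()
        self.stop = threading.Event()
        self.thread = threading.Thread(target=self._run)

    def _run(self):
        # Lock order as seen in the log: secondary first, then primary.
        with secondary_lock:
            record(self.name, "Acquired secondary lock")
            self.holds_secondary.set()
            primary_lock.acquire()  # blocks while another EM is active
            record(self.name, "Acquired primary lock")
        record(self.name, "Released secondary lock")
        record(self.name, "Proceeding with startup")
        self.active.set()
        self.stop.wait()            # this EM is the active instance until stopped
        record(self.name, "Stopped")
        primary_lock.release()      # going down frees the primary lock


def simulate():
    """Start EM1, then EM2, stop EM1, and return what happened."""
    events.clear()
    em1, em2 = SimulatedEM("EM1"), SimulatedEM("EM2")
    em1.thread.start()
    em1.active.wait()
    em2.thread.start()
    em2.holds_secondary.wait()
    em2_waiting = not em2.active.is_set()  # EM2 is blocked on the primary lock
    em1.stop.set()                         # EM1 goes down
    em2.active.wait()                      # EM2 takes over
    em2.stop.set()
    em1.thread.join()
    em2.thread.join()
    return em2_waiting, list(events)


if __name__ == "__main__":
    waiting, log = simulate()
    print("EM2 waited for EM1:", waiting)
    for name, message in log:
        print(name, message)
```

In this sketch the second instance sits on the primary lock until the first one stops, which mirrors the gap between "Acquiring primary lock..." and "Acquired primary lock" for EM2 in the log.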
2. To test failover on a single host, use a single installation.
a. Edit the Enterprise Manager properties file to have these values:
introscope.enterprisemanager.failover.enable=true
introscope.enterprisemanager.failover.primary=localhost
b. Start two copies of the EM executable, i.e. Introscope_Enterprise_Manager (Unix/Linux) or Introscope_Enterprise_Manager.exe (Windows). The shared IntroscopeEnterpriseManager.log file will show something like:
===========================================================================================
EM1 started
11/11/15 05:08:08.550 PM EST [INFO] [main] [Manager.HotFailover] The Introscope Enterprise Manager is configured as a Primary EM
11/11/15 05:08:08.554 PM EST [INFO] [main] [Manager.HotFailover] Acquiring secondary lock...
11/11/15 05:08:08.557 PM EST [INFO] [main] [Manager.HotFailover] Acquired secondary lock
11/11/15 05:08:08.560 PM EST [INFO] [main] [Manager.HotFailover] Acquiring primary lock...
11/11/15 05:08:08.563 PM EST [INFO] [main] [Manager.HotFailover] Acquired primary lock
11/11/15 05:08:08.566 PM EST [INFO] [main] [Manager.HotFailover] Released secondary lock
11/11/15 05:08:08.567 PM EST [INFO] [main] [Manager.HotFailover] Proceeding with startup
...
EM2 started
11/11/15 05:12:01.366 PM EST [INFO] [main] [Manager.HotFailover] The Introscope Enterprise Manager is configured as a Primary EM
11/11/15 05:12:01.370 PM EST [INFO] [main] [Manager.HotFailover] Acquiring secondary lock...
11/11/15 05:12:01.373 PM EST [INFO] [main] [Manager.HotFailover] Acquired secondary lock
11/11/15 05:12:01.375 PM EST [INFO] [main] [Manager.HotFailover] Acquiring primary lock...
...
EM2 waits until EM1 goes down, and then you see:
11/11/15 05:14:22.076 PM EST [INFO] [main] [Manager.HotFailover] Acquired primary lock
11/11/15 05:14:22.098 PM EST [INFO] [main] [Manager.HotFailover] Released secondary lock
11/11/15 05:14:22.105 PM EST [INFO] [main] [Manager.HotFailover] Proceeding with startup
===========================================================================================
NOTE: This scenario is PRIMARY-PRIMARY, i.e. EM1 (localhost) will not retake control when it is later restarted, because EM2 (localhost) also acts as a PRIMARY.
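When running this test, it can help to pull the HotFailover messages out of the shared log and measure how long the second EM waited for the primary lock. The following is a minimal sketch, not a supported tool: the function names are hypothetical, the regular expression is based only on the log lines shown above, and the date is assumed to be in month/day/year order. The time zone token (EST) is ignored.

```python
import re
from datetime import datetime

# Pattern derived from the sample lines above (assumption: other EM
# versions or locales may format the timestamp differently).
LOG_LINE = re.compile(
    r"^(?P<ts>\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{3} [AP]M) \S+ "
    r"\[\w+\] \[[^\]]+\] \[Manager\.HotFailover\] (?P<msg>.*)$"
)


def parse_failover_events(lines):
    """Return (timestamp, message) pairs for HotFailover log lines."""
    events = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match:
            ts = datetime.strptime(match.group("ts"), "%m/%d/%y %I:%M:%S.%f %p")
            events.append((ts, match.group("msg")))
    return events


def primary_lock_waits(events):
    """Seconds between each 'Acquiring primary lock...' and the next 'Acquired primary lock'."""
    waits = []
    pending = None
    for ts, msg in events:
        if msg.startswith("Acquiring primary lock"):
            pending = ts
        elif msg == "Acquired primary lock" and pending is not None:
            waits.append((ts - pending).total_seconds())
            pending = None
    return waits


# Four lines copied from the log excerpt above: EM1 first, then EM2.
SAMPLE = [
    "11/11/15 05:08:08.560 PM EST [INFO] [main] [Manager.HotFailover] Acquiring primary lock...",
    "11/11/15 05:08:08.563 PM EST [INFO] [main] [Manager.HotFailover] Acquired primary lock",
    "11/11/15 05:12:01.375 PM EST [INFO] [main] [Manager.HotFailover] Acquiring primary lock...",
    "11/11/15 05:14:22.076 PM EST [INFO] [main] [Manager.HotFailover] Acquired primary lock",
]

if __name__ == "__main__":
    for seconds in primary_lock_waits(parse_failover_events(SAMPLE)):
        print(f"waited {seconds:.3f} s for the primary lock")
```

For the sample lines, the first wait is a few milliseconds (EM1 starting with no competitor) and the second is about 140.7 seconds (EM2 waiting until EM1 went down). Pairing messages by order works for this excerpt; a log in which two instances request the primary lock at nearly the same time would need a more careful pairing.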
3. For a multi-host implementation of EM1 (Host1) and EM2 (Host2) across a NAS, you can choose one of two options:
PRIMARY-PRIMARY (introscope.enterprisemanager.failover.primary=Host1,Host2):
After EM1 goes down, EM2 takes control, and when EM1 restarts, EM2 still retains control. If EM2 then goes down, EM1 takes control and again retains it when EM2 restarts.
PRIMARY-SECONDARY (introscope.enterprisemanager.failover.primary=Host1 and introscope.enterprisemanager.failover.secondary=Host2):
After EM1 goes down, EM2 takes control, but when EM1 restarts, EM1 regains control.
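The difference between the two options can be made concrete with a small state model. This is a sketch of the behaviour described above, not of the EM's actual implementation: the function simulate_control and the mode strings are hypothetical, and EM1 is assumed to be the primary in the PRIMARY-SECONDARY case.

```python
def simulate_control(mode, events):
    """Return which EM is in control after each event.

    mode   -- "primary-primary" or "primary-secondary" (hypothetical labels)
    events -- sequence of (em, change), em in {"EM1", "EM2"},
              change in {"up", "down"}
    """
    if mode not in ("primary-primary", "primary-secondary"):
        raise ValueError(f"unknown mode: {mode}")
    running = set()
    active = None
    history = []
    for em, change in events:
        if change == "up":
            running.add(em)
            if active is None:
                active = em          # nothing is in control, so this EM takes over
            elif mode == "primary-secondary" and em == "EM1":
                active = "EM1"       # the primary regains control from the secondary
        elif change == "down":
            running.discard(em)
            if active == em:
                # Control passes to the other EM if it is running.
                active = sorted(running)[0] if running else None
        else:
            raise ValueError(f"unknown change: {change}")
        history.append(active)
    return history


SCENARIO = [
    ("EM1", "up"), ("EM2", "up"),
    ("EM1", "down"), ("EM1", "up"),
    ("EM2", "down"), ("EM2", "up"),
]

if __name__ == "__main__":
    for mode in ("primary-primary", "primary-secondary"):
        print(mode, simulate_control(mode, SCENARIO))
```

In the sample scenario the two modes differ only at the fourth event, when EM1 comes back: with PRIMARY-PRIMARY, EM2 stays in control, and with PRIMARY-SECONDARY, EM1 takes control again.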