What are the steps to implement failover and High Availability (HA) MOMs?


Article ID: 10226



CA Application Performance Management Agent (APM / Wily / Introscope) INTROSCOPE


This document explains how to configure APM 10.x with two highly available (HA) MOMs. The procedure is the same on Windows and Linux. The commands shown here are for Windows, so substitute the equivalent OS commands on Linux.

A full failover setup across multiple hosts typically shares a single complete EM installation on highly available network-attached storage (NAS), for example over NFS or SMB.

The key components of the failover process are the lock files (.lck), which are located in the directory EM_HOME/config/internal/server:

  • When the first EM (EM1 on Host1) starts, it acquires an exclusive lock on the file primary_em.lck (using the Java API method FileChannel.lock).
  • When the second EM (EM2 on Host2) starts, it:
    • Acquires an exclusive lock on the file secondary_em.lck
    • Runs in a wait mode, regularly checking the access status of primary_em.lck
  • When EM1 goes down, EM2:
    • Immediately acquires the exclusive lock on primary_em.lck
    • Relinquishes the lock on secondary_em.lck
    • Completes its start-up.
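
As a rough illustration of the locking described above, the following Java sketch takes an exclusive lock on a file named primary_em.lck. It is not the Enterprise Manager's own code: the class name, the helper method, and the temporary directory standing in for EM_HOME/config/internal/server are all hypothetical, and it uses the non-blocking FileChannel.tryLock instead of FileChannel.lock so that a caller could poll for the lock.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Illustrative sketch only; not the Enterprise Manager implementation.
final class LockFileSketch {

    // Tries to take an exclusive lock on the lock file without blocking.
    // Returns the lock, or null if another process currently holds it.
    static FileLock tryExclusiveLock(Path lockFile) throws IOException {
        FileChannel channel = FileChannel.open(lockFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        FileLock lock = channel.tryLock(); // exclusive lock on the whole file
        if (lock == null) {
            channel.close(); // lock is held elsewhere; give up for now
        }
        return lock;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical stand-in for EM_HOME/config/internal/server.
        Path dir = Files.createTempDirectory("em-locks");
        FileLock lock = tryExclusiveLock(dir.resolve("primary_em.lck"));
        if (lock == null) {
            System.out.println("primary_em.lck is locked by another process");
            return;
        }
        System.out.println("holding exclusive lock: " + !lock.isShared());
        // Releasing the lock (or the process exiting) frees it for a waiter.
        lock.release();
        lock.channel().close();
    }
}
```

Java file locks are held on behalf of the whole JVM, so contention between a primary and a secondary can only be observed by running two separate processes against the same file.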


Introscope 10.x


Configure the primary MOM 

1) Install the primary MOM as you normally would on the server’s local hard drive. Skip this step if you already have an existing installation. 

2) Check all log files to make sure there are no errors and the EM service is running. 

3) Stop the EM service. 

4) Copy the following directories from the APM installation directory (<EM_HOME>) to a shared network location that both MOMs can access (this should be high-performance SAN or NAS storage).

      a. config 

      b. data 

      c. threaddumps (may not exist if no thread dump has ever been taken). 

      d. traces 

      e. cem 

      f. scripts 

      g. ext 

      h. ws-plugins 

      i. webapps

5) Delete the above directories from the local <EM_HOME> directory. They will now live only on the shared network storage. 

6) Create symlinks in the local <EM_HOME> directory that point to the directories on the shared network storage. 

      a. Open an elevated command prompt (right-click CMD, run as administrator). 

      b. CD to the local <EM_HOME> directory.

      c. Create the symlinks. NOTE: A mapped drive letter will NOT work; you must symlink directly to a UNC path. Enclose the path in quotation marks if it contains spaces. The MKLINK syntax is: MKLINK /D Link Target

            i. MKLINK /D config \\server\share\config 

            ii. MKLINK /D data \\server\share\data 

            iii. MKLINK /D traces \\server\share\traces 

            iv. MKLINK /D cem \\server\share\cem 

            v. MKLINK /D scripts \\server\share\scripts 

            vi. MKLINK /D ext \\server\share\ext 

            vii. MKLINK /D ws-plugins \\server\share\ws-plugins 

            viii. MKLINK /D webapps \\server\share\webapps 

7) Edit IntroscopeEnterpriseManager.properties as follows: 

      a. introscope.enterprisemanager.failover.enable=true 

      b. introscope.enterprisemanager.failover.primary=x.x.x.x (primary MOM name or IP address) 

      c. introscope.enterprisemanager.failover.secondary=x.x.x.x (secondary MOM name or IP address)
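
A quick way to confirm that the three failover properties above are set is to load the file and check them. The Java sketch below is a minimal example under stated assumptions: the class and method names are hypothetical, and the path to IntroscopeEnterpriseManager.properties must be passed as an argument because its location depends on your installation.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Properties;

// Hypothetical helper; checks only the three properties named in step 7.
final class FailoverPropsCheck {

    static final String PREFIX = "introscope.enterprisemanager.failover.";

    // True when failover is enabled and both MOM addresses are non-empty.
    static boolean failoverConfigured(Properties p) {
        String enable = p.getProperty(PREFIX + "enable", "").trim();
        String primary = p.getProperty(PREFIX + "primary", "").trim();
        String secondary = p.getProperty(PREFIX + "secondary", "").trim();
        return "true".equalsIgnoreCase(enable)
                && !primary.isEmpty()
                && !secondary.isEmpty();
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: FailoverPropsCheck <path to properties file>");
            return;
        }
        Properties props = new Properties();
        // ISO-8859-1 is the traditional encoding of Java properties files.
        try (Reader in = Files.newBufferedReader(Paths.get(args[0]),
                StandardCharsets.ISO_8859_1)) {
            props.load(in);
        }
        System.out.println("failover configured: " + failoverConfigured(props));
    }
}
```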


Configure the secondary MOM 

The secondary MOM is configured just like the primary MOM, except that the directories do not need to be copied to the shared location because they are already there. 

      1) Stop the EM service.

      2) Back up the 8 directories discussed in the primary MOM configuration. 

      3) Delete the 8 directories from the local <EM_HOME> directory. 

      4) Follow the directions above to create 8 symlinks in the <EM_HOME> directory. 

At this point the EM service is stopped on both MOMs, both MOMs point to the same shared directories, and the MOMs are configured as a highly available pair.
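
Before testing failover, it can help to confirm on each host that the eight directories in the local <EM_HOME> really are symbolic links and not leftover local copies. The following Java sketch is one way to do that; the class and method names are hypothetical, and it assumes the JVM reports MKLINK /D links as symbolic links through Files.isSymbolicLink.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; lists which of the eight directories are not symlinks.
final class SymlinkCheck {

    // The eight directories symlinked in the procedure above.
    static final List<String> SHARED_DIRS = Arrays.asList(
            "config", "data", "traces", "cem",
            "scripts", "ext", "ws-plugins", "webapps");

    // Returns the names under emHome that are missing or are not symlinks.
    static List<String> notSymlinked(Path emHome) {
        List<String> problems = new ArrayList<>();
        for (String name : SHARED_DIRS) {
            if (!Files.isSymbolicLink(emHome.resolve(name))) {
                problems.add(name);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // Pass the local EM_HOME as the first argument; defaults to the
        // current directory.
        Path emHome = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println("not symlinked: " + notSymlinked(emHome));
    }
}
```

An empty list means all eight entries are symlinks; it does not verify that each link targets the intended share.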


Test Failover (primary to secondary) 

Both MOMs should be stopped at this point. 

      1) Start the service on the primary MOM. 

      2) Look in IntroscopeEnterpriseManager.log for lines starting with Manager.HotFailover. You should see three messages: acquired primary lock, released secondary lock, and proceeding with startup.

      3) Wait for the EM to completely start. 

      4) Start the service on the secondary MOM. 

      5) The secondary MOM log stops at “Acquiring primary lock” and the MOM waits indefinitely for the primary MOM to go down. 

      6) Stop the service on the primary MOM. 

      7) Wait for a message on the secondary MOM that says “Acquired primary lock.” 

      8) At this point the secondary MOM has taken over primary duties.
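
To follow the lock messages during this test without reading the whole log, the Manager.HotFailover lines can be filtered out. The Java sketch below is a minimal example: the class and method names are hypothetical, the log path must be supplied as an argument, and it matches lines that contain the marker anywhere (an assumption, since log lines may begin with a timestamp or level).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper; prints only the failover-related log lines.
final class HotFailoverLogFilter {

    static final String MARKER = "Manager.HotFailover";

    // Keeps the lines that mention the failover logger, in original order.
    static List<String> hotFailoverLines(List<String> logLines) {
        return logLines.stream()
                .filter(line -> line.contains(MARKER))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: HotFailoverLogFilter <path to IntroscopeEnterpriseManager.log>");
            return;
        }
        // ISO-8859-1 decodes any byte sequence, so unusual characters in
        // the log cannot abort the read.
        List<String> lines = Files.readAllLines(Paths.get(args[0]),
                StandardCharsets.ISO_8859_1);
        hotFailoverLines(lines).forEach(System.out::println);
    }
}
```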


Test Failback (secondary to primary) 

      1) Start the service on the primary MOM. 

      2) Wait for a log message that says acquired primary lock. 

      3) The secondary MOM log file should say “orderly shutdown complete” and its service should stop. This is by design. 

      4) Manually restart the service on the secondary MOM. 

      5) Log file should again stop at “Acquiring primary lock.”


WebView Configuration 

WebView's config file lets you point to only one MOM. For WebView to fail over to the secondary MOM when the primary MOM goes down, and to fail back afterwards, a DNS change is required. Create two identically named 'A' records in DNS, each with the IP address of one of the MOMs. For example, two 'A' records both named 'webviewlogical':

webviewlogical   A  (ip of primary mom) 

webviewlogical   A  (ip of secondary mom) 

Now use “webviewlogical” as the name of the MOM in the WebView config file. 
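
To check from the WebView host that the logical name returns both MOM addresses, the name can be resolved with the standard Java resolver. This is a minimal sketch with hypothetical class and method names; "webviewlogical" is the example record name used above, and results may be affected by DNS caching on the host or in the JVM.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; lists every address a host name resolves to.
final class LogicalNameLookup {

    // Returns all IP addresses the resolver reports for the given name.
    static List<String> resolveAll(String hostName) throws UnknownHostException {
        List<String> addresses = new ArrayList<>();
        for (InetAddress address : InetAddress.getAllByName(hostName)) {
            addresses.add(address.getHostAddress());
        }
        return addresses;
    }

    public static void main(String[] args) throws UnknownHostException {
        // Defaults to the example record name from this article.
        String name = args.length > 0 ? args[0] : "webviewlogical";
        // With both 'A' records in place, two addresses should be listed.
        System.out.println(name + " -> " + resolveAll(name));
    }
}
```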

Additional Information

For further information please check:

https://communities.ca.com/people/JMertin/blog/2017/02/07/how-to-configure-mom-fail-over  -- How to configure MoM fail-over

https://communities.ca.com/docs/DOC-231168340  -- APM 10.x with HA MOMs.pdf

https://www.ca.com/us/services-support/ca-support/ca-support-online/knowledge-base-articles.tec1282305.html  -- In a MOM failover configuration, especially supported Windows platforms, how must the filesystem links to shared directories be established?