
Why are both the primary and secondary MOM EMs accessing the APM database simultaneously?

Article ID: 42469

Products

CA Application Performance Management Agent (APM / Wily / Introscope) INTROSCOPE

Issue/Introduction

Issue:
Both the primary and secondary MOM EMs are accessing the APM database at the same time, which causes deadlock errors in the MOM EM logs:

[ERROR] [QuartzScheduler_Worker-2] [org.quartz.core.JobRunShell] Job DEFAULT.jobDetailBean threw an unhandled Exception:
org.springframework.scheduling.quartz.JobMethodInvocationFailedException: Invocation of method 'pruneData' on target class [class com.wily.apm.model.pruning.DataPruner] failed; nested exception is org.springframework.jdbc.UncategorizedSQLException: CallableStatementCallback; uncategorized SQLException for SQL [{? = call PRUNE_APM_DATA(?, ?)}]; SQL state [72000]; error code [20999]; ORA-20999: An error - -60 Error Msg - ORA-00060: deadlock detected while waiting for resource
ORA-06512: at "APM.PRUNE_APM_DATA", line 71

Environment:
This issue was observed with APM 9.7 primary and secondary MOM EMs. It may also occur in APM 10.0, 10.1, and 10.2.

Cause:
Under certain conditions, when there are network problems accessing the EM shared resources (for example, EM_HOME/config/internal/server/primary_em.lck and/or EM_HOME/config/internal/server/secondary_em.lck), the secondary MOM can acquire the primary lock (primary_em.lck) and start fully.

At the same time, the primary MOM is still up and running, so both MOMs access the APM database simultaneously.

 

Resolution:
After any network problem affecting access to the EM shared resources, watch the output when you start the secondary MOM. It should display the following messages:


4/20/16 09:00:07.447 AM EDT [INFO] [main] [Manager] Starting Introscope Enterprise Manager...
4/20/16 09:00:07.914 AM EDT [INFO] [main] [Manager.HotFailover] The Introscope Enterprise Manager is configured as a Secondary EM
4/20/16 09:00:07.916 AM EDT [INFO] [main] [Manager.HotFailover] Acquiring primary lock...
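
To check this from a saved copy of the secondary MOM log rather than by eye, a small script can look for the two HotFailover messages quoted above and report anything logged after the last "Acquiring primary lock..." line. This is a minimal sketch: the function name check_secondary_startup is hypothetical, it matches only the message text shown in this article, and treating later log lines as a sign that startup continued is an assumption, since the log may also contain unrelated entries.

```python
# Message text as quoted in this article; matched as plain substrings.
SECONDARY_MSG = "is configured as a Secondary EM"
ACQUIRE_MSG = "Acquiring primary lock..."


def check_secondary_startup(log_lines):
    """Summarize a secondary MOM startup log (hypothetical helper).

    Returns a tuple (is_secondary, waiting_for_lock, later_lines):
      is_secondary     -- the "configured as a Secondary EM" message was seen
      waiting_for_lock -- an "Acquiring primary lock..." message was seen
      later_lines      -- non-blank lines logged after the last such message
    """
    is_secondary = any(SECONDARY_MSG in line for line in log_lines)

    # Find the most recent lock-acquisition message.
    last_acquire = None
    for index, line in enumerate(log_lines):
        if ACQUIRE_MSG in line:
            last_acquire = index

    if last_acquire is None:
        return is_secondary, False, []

    later_lines = [line for line in log_lines[last_acquire + 1:] if line.strip()]
    return is_secondary, True, later_lines
```

If the primary MOM is still running and later_lines is not empty, compare those entries with the secondary MOM's normal startup sequence to see whether it went on to start fully.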

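At this point the secondary MOM enters a wait mode in which it regularly checks whether it can acquire primary_em.lck. Its startup should not proceed any further until it acquires primary_em.lck, which should only happen once the primary MOM no longer holds it. If the secondary MOM starts fully while the primary MOM is still running, you are seeing the condition described under Cause.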
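
The wait-and-poll behaviour described above can be illustrated with an exclusive file lock. This is a minimal sketch, not Introscope's implementation (the EM's actual locking code is not described in this article): it assumes a POSIX host, uses Python's fcntl.flock as a stand-in for however the EM locks primary_em.lck, and the function names try_acquire and wait_for_primary_lock are hypothetical. Lock behaviour on NFS or SMB depends on the client, server, and mount options, so a network problem can leave the lock in a state that does not match which EM is actually running.

```python
import fcntl
import time


def try_acquire(lock_path):
    """Make one non-blocking attempt to take an exclusive lock on lock_path.

    Returns the open file object on success; the lock is held for as long
    as that file stays open. Returns None if another holder has the lock.
    """
    handle = open(lock_path, "a")  # creates the lock file if it is missing
    try:
        fcntl.flock(handle.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
        return handle
    except BlockingIOError:
        handle.close()
        return None


def wait_for_primary_lock(lock_path, poll_seconds=5.0, max_attempts=None):
    """Secondary-style startup: poll until the primary lock can be taken.

    With max_attempts=None this waits indefinitely, like a secondary that
    stays in wait mode while the primary holds the lock. Returns the lock
    handle, or None if every one of max_attempts polls fails.
    """
    attempts = 0
    while max_attempts is None or attempts < max_attempts:
        handle = try_acquire(lock_path)
        if handle is not None:
            return handle  # only now may startup continue as primary
        attempts += 1
        time.sleep(poll_seconds)
    return None
```

The lock alone cannot tell whether the primary process is alive: if the storage side releases the primary's lock while that process keeps running, try_acquire succeeds for the secondary anyway.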

Note: A typical multi-host MOM failover setup shares a single, complete EM installation on a highly available (HA) Network Attached Storage (NAS) device, accessed over a protocol such as NFS or SMB.

It is strongly recommended that the EM shared resources, such as the shared disk and the EM installation location, be highly available (HA).

Environment

Release: CEMUGD00200-9.7-Introscope to CA Application-Performance Management-Upgrade Main