Problem:
MOM Hot Failover does not work, and the following error appears in the Primary EM IntroscopeEnterpriseManager.log:
[ERROR] [Smartstor/Superstor Spool] [Manager] Can't write timeslice ######## to spool: java.io.SyncFailedException: sync failed
Environment:
This issue was observed on APM 9.7.1.16 but can occur on other versions, because the root cause is on the OS (UNIX/Linux) side.
The Primary and Secondary Introscope installations are on a shared disk accessed through a Network Attached Storage (NAS) protocol (NFS in this case).
Cause:
File locking is not enabled on the NFS file system.
The Smartstor Data directory is installed and configured to run on an NFS shared drive.
When the Enterprise Managers start on the Primary and Secondary, both MOM instances acquire the "Primary Lock file", which causes the problem.
The EM logs show the details:
Primary MOM IntroscopeEnterpriseManager.log:
[INFO] [main] [Manager.HotFailover] The Introscope Enterprise Manager is configured as a Primary EM
[INFO] [main] [Manager.HotFailover] Acquiring secondary lock...
[INFO] [main] [Manager.HotFailover] Acquired secondary lock
[INFO] [main] [Manager.HotFailover] Acquiring primary lock...
[INFO] [main] [Manager.HotFailover] Acquired primary lock
[INFO] [main] [Manager.HotFailover] Released secondary lock
[INFO] [main] [Manager.HotFailover] Proceeding with startup
Secondary MOM IntroscopeEnterpriseManager.log:
[INFO] [main] [Manager.HotFailover] The Introscope Enterprise Manager is configured as a Secondary EM
[INFO] [main] [Manager.HotFailover] Acquiring primary lock...
[INFO] [main] [Manager.HotFailover] Acquired primary lock
[INFO] [main] [Manager.HotFailover] Trying to acquire secondary lock
[INFO] [main] [Manager.HotFailover] Acquired secondary lock
[INFO] [main] [Manager.HotFailover] Released secondary lock
[INFO] [main] [Manager.HotFailover] Proceeding with startup
After the Primary MOM acquires the Primary lock, the Secondary MOM should block when it tries to acquire the same lock.
Instead, the Secondary MOM also acquired the Primary lock and proceeded with startup.
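The comparison of the two logs can be sketched as a small Python check. This is only an illustration: the function names are hypothetical, and the pattern is taken from the log excerpts in this article rather than from any documented log format. The result is meaningful only for startup excerpts captured while both EMs are running, since a Secondary that takes over after a real Primary outage also logs this message.

```python
import re

# Message text as it appears in the HotFailover excerpts in this article.
PRIMARY_LOCK_ACQUIRED = re.compile(
    r"\[Manager\.HotFailover\]\s+Acquired primary lock"
)


def acquired_primary_lock(log_text):
    """Return True if any line reports that the primary lock was acquired."""
    return any(
        PRIMARY_LOCK_ACQUIRED.search(line) for line in log_text.splitlines()
    )


def both_acquired_primary_lock(primary_log_text, secondary_log_text):
    """Return True if both EM logs report holding the primary lock."""
    return acquired_primary_lock(primary_log_text) and acquired_primary_lock(
        secondary_log_text
    )
```

If the function returns True for logs taken from the same startup window, the lock is not being enforced between the two hosts.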
The Primary and Secondary locks are simply two file locks, held on the files "primary_em.lck" and "secondary_em.lck" under <EM_Home>\config\internal\server.
Because NFS failed to enforce the file lock, both MOM instances acquired the Primary lock file and started as the Primary MOM.
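One way to check whether the shared drive enforces locks between hosts is a small probe like the Python sketch below. It assumes the EM's locks behave like POSIX advisory locks, which this article does not state. The function name and the probe file are hypothetical. Use a scratch file on the same NFS mount, not the real .lck files.

```python
import errno
import fcntl
import sys
import time


def try_exclusive_lock(path):
    """Try to take a non-blocking exclusive POSIX lock on path.

    Returns the open file object (the lock lasts while it stays open),
    or None if another process already holds a conflicting lock.
    """
    handle = open(path, "a+")
    try:
        fcntl.lockf(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError as exc:
        handle.close()
        if exc.errno in (errno.EACCES, errno.EAGAIN):
            return None  # lock is held by another process
        raise  # e.g. ENOLCK: the lock request itself failed
    return handle


if __name__ == "__main__" and len(sys.argv) == 2:
    held = try_exclusive_lock(sys.argv[1])
    if held is None:
        print("Lock is held elsewhere: locking appears to be enforced.")
    else:
        print("Lock acquired; holding it for 60 seconds.")
        time.sleep(60)
        held.close()
```

Run the script on the Primary host against a scratch file on the shared drive, then run it on the Secondary host within 60 seconds. If the second run also reports "Lock acquired", the mount is not enforcing locks across hosts, which matches the behavior described above.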
Resolution:
Enable file locking on the NFS share. File locking is handled by the OS.
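As a starting point on a Linux client, the sketch below lists NFS mounts that carry the "nolock" mount option, which disables NFS file locking for that mount. This is an assumption about one possible cause; the article does not say how locking was disabled, and the lock services on the NFS server may also need to be checked. The function name is hypothetical, and the input is expected in the format of /proc/mounts.

```python
def nfs_mounts_without_locking(mounts_text):
    """Return mount points of NFS mounts whose options include 'nolock'.

    mounts_text is expected in /proc/mounts format:
    device mount_point fs_type options dump pass
    """
    flagged = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mount_point, fs_type, options = fields[:4]
        if fs_type in ("nfs", "nfs4") and "nolock" in options.split(","):
            flagged.append(mount_point)
    return flagged
```

To use it, read /proc/mounts on each EM host and pass the text to the function. Any mount point it returns that contains the EM installation or the Smartstor data directory is a candidate for remounting with locking enabled.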