We are running a Spectrum Fault Tolerant (FT) configuration with a primary and secondary SpectroSERVER (SS).
We receive an alarm on a Service model. While this alarm exists, the primary SS process is shut down and OneClick fails over to the secondary SS. The same alarm is present after failover to the secondary SS.
The primary SS is restarted and OneClick fails back to the primary SS.
When this happens, the previous alarm on the Service is cleared and a new alarm is asserted on the Service for the same issue.
Release : Any
Component : Spectrum Fault Tolerant Alarms
By design, the Fault Tolerant Alarm Group re-evaluator considers only Contact_lost alarms and the related cause codes listed below. It does not consider Service or process (RFC2790) alarms, among others.
For background: whenever the primary SS fails back, it contacts the secondary SS, takes a diff of the alarms on the secondary, and copies them to the primary.
The copied alarms are first marked as stale and are then re-evaluated by Spectrum Intelligence (FTAlarmGroupReeval).
Stale alarms with the following cause codes are converted to non-stale and kept as active alarms:
0x1040a (LINK_BAD)
0x10d00 (INFER_CONN_CONTACT_STATUS_LOST)
0x10f70 (DEV_MODULE_FAILED)
0x10f86 (DEV_MODULE_OFFLINE)
0x10f6b (DEV_MODULE_PULLED)
0x10f87 (MODULE_OFFLINE)
0x10f6d (MODULE_PULLED)
Alarms of other types (Service alarms, for example) are cleared, and are regenerated if the issue persists.
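The re-evaluation behavior described above can be modeled with a minimal Python sketch. This is a simplified illustration, not Spectrum code or any Spectrum API. The `Alarm` class, the `failback_sync` function, the assumption that the "diff" means alarms present on the secondary but not on the primary (matched by model name and cause code), and the Service cause code `0x99999` are all hypothetical. Only the seven preserved cause codes come from this article.

```python
from dataclasses import dataclass

# Cause codes from this article whose stale alarms are kept as active alarms.
PRESERVED_CAUSE_CODES = frozenset({
    0x1040A,  # LINK_BAD
    0x10D00,  # INFER_CONN_CONTACT_STATUS_LOST
    0x10F70,  # DEV_MODULE_FAILED
    0x10F86,  # DEV_MODULE_OFFLINE
    0x10F6B,  # DEV_MODULE_PULLED
    0x10F87,  # MODULE_OFFLINE
    0x10F6D,  # MODULE_PULLED
})


@dataclass
class Alarm:
    model: str          # hypothetical model name
    cause_code: int
    stale: bool = False


def failback_sync(primary, secondary):
    """Hypothetical model of alarm synchronization on failback.

    Returns (kept, cleared) for the alarms copied from the secondary.
    """
    primary_keys = {(a.model, a.cause_code) for a in primary}
    # Step 1 (assumed diff): copy alarms the primary lacks, marked stale.
    copied = [Alarm(a.model, a.cause_code, stale=True)
              for a in secondary
              if (a.model, a.cause_code) not in primary_keys]
    kept, cleared = [], []
    for alarm in copied:
        # Step 2: re-evaluate each stale alarm by cause code.
        if alarm.cause_code in PRESERVED_CAUSE_CODES:
            alarm.stale = False      # converted to non-stale, stays active
            kept.append(alarm)
        else:
            cleared.append(alarm)    # cleared; regenerated only if the issue persists
    return kept, cleared


if __name__ == "__main__":
    # 0x99999 stands in for a Service alarm cause code (hypothetical).
    secondary = [Alarm("router-1", 0x10D00), Alarm("web-service", 0x99999)]
    kept, cleared = failback_sync([], secondary)
    print([hex(a.cause_code) for a in kept])   # ['0x10d00']
    print([a.model for a in cleared])          # ['web-service']
```

In this model the Service alarm lands in the cleared list, which matches the symptom: the alarm seen on the secondary is cleared after failback, and a new alarm appears only when the Service condition is detected again.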
This is functioning as designed.
Please reference the "SpectroSERVER Alarm Synchronization" section of the documentation for more information.