Spectrum AlarmNotifier SANM Fault Tolerant

Article ID: 99723

Products

CA Spectrum DX NetOps

Issue/Introduction

AlarmNotifier is not configured to be fault tolerant out of the box. However, you can adjust the Set, Clear, and Update scripts on both the Primary and Secondary servers so that AlarmNotifier works in a fault-tolerant environment.

What this will do:

Whenever an alarm is generated, the precedence attribute 0x12c0a is read from the model in the database. If the value is 10 (Primary server precedence), the Primary AlarmNotifier sends the notification and the Secondary writes a line to the NOTIFIER.OUT log file saying the Primary is running.

If 0x12c0a is 20, the Secondary server's AlarmNotifier sends the mail and the Primary writes a line to the NOTIFIER.OUT file saying the Secondary is running.

Environment

Release: Any
Component: SPCAEM

Resolution

1. The first step is to make sure both the Primary and Secondary AlarmNotifiers are configured the same. If you have not yet set up AlarmNotifier, see Knowledge Article 21373, "How to configure Spectrum AlarmNotifier to send email notifications".

2. The next step is to add a new attribute to the .alarmrc:

Every alarm has an attribute holding the associated precedence. On a Primary SpectroSERVER this value is normally "10"; on the Secondary SpectroSERVER it is typically "20". Add this alarm attribute to the .alarmrc of the AlarmNotifier on both the Primary and Secondary servers, so that the attribute is forwarded to the scripts:

EXTRA_ATTRS_AS_ENVVARS=0x12c0a 
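
To confirm that the attribute actually reaches the scripts after editing the .alarmrc, one option is a temporary debug helper near the top of the SetScript. This is only a minimal sketch: the function name log_precedence_vars is hypothetical, and it assumes the variable is exported under the SANM_ prefix followed by the attribute ID, as in the examples in this article. The match ignores case so it finds either spelling of the name.

```shell
# Hypothetical debug helper: print the precedence variable if it was exported.
# grep -i matches both SANM_0x12c0a and SANM_0X12C0A.
log_precedence_vars() {
    found=$(env | grep -i '^SANM_0x12c0a=' || true)
    if [ -n "$found" ]; then
        echo "$found"
    else
        echo "SANM_0x12c0a not set - check EXTRA_ATTRS_AS_ENVVARS in .alarmrc"
    fi
}

log_precedence_vars
```

Remove the helper again once the variable shows up in the script output.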

Step 3 is optional - this will just output the precedence in the alarm:

3. Then, add these lines to the Set, Update, and Clear scripts for SANM on both Primary and Secondary. Be sure to add the Primary code to the Primary scripts and the Secondary code to the Secondary scripts.

On Primary add this code: 

if [[ "$SANM_0X12C0A" = "20" ]] 
then 
echo "SS Secondary is running" 
echo "Precedence = $SANM_0X12C0A" 
exit 0 
fi 


On Secondary add this code:

if [[ "$SANM_0X12C0A" = "10" ]] 
then 
echo "SS Primary is running" 
echo "Precedence = $SANM_0X12C0A" 
exit 0 
fi 


NOTE: On Windows machines, the attribute code is all UPPERCASE: $SANM_0X12C0A. On Linux machines it MUST be LOWERCASE: $SANM_0x12c0a. Thus, on Linux:

if [[ "$SANM_0x12c0a" = "20" ]] 
then 
echo "SS Secondary is running" 
echo "Precedence = $SANM_0x12c0a" 
exit 0 
fi 
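
If the same script text has to work on both platforms, one possible approach is to read whichever spelling of the variable is set and use that value in the check. This is a sketch rather than part of the stock scripts: get_precedence and PRECEDENCE are hypothetical names, and the example shows the Primary variant of the check.

```shell
# Hypothetical helper: return the precedence from whichever variable is set.
# The lowercase name (Linux) is tried first, then the uppercase name (Windows).
get_precedence() {
    echo "${SANM_0x12c0a:-${SANM_0X12C0A:-}}"
}

PRECEDENCE=$(get_precedence)

# Primary variant; compare against "10" in the Secondary's scripts.
if [ "$PRECEDENCE" = "20" ]
then
    echo "SS Secondary is running"
    echo "Precedence = $PRECEDENCE"
    exit 0
fi
```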


4. Prevent Duplicate emails:

When you start receiving alarms via email, you may start to see duplicate emails. This is because AlarmNotifier has no built-in check to verify which server, Primary or Secondary, is running. In addition, the AlarmNotifier running on each server connects to its respective Archive Manager. If your Secondary is configured for a warm or hot standby, its Archive Manager may be running. This is normal, but it will cause duplicate emails to be sent.

To work around this, you'll want to add another if/then statement to check Precedence. This gets added right before the $MAIL line in the scripts so the ability to send an email becomes part of the Precedence check. This will allow only emails which match the current active server to be sent. 

On EACH script, on both Primary and Secondary, add these lines. The example below is for the PRIMARY; change the precedence to "20" when adding these lines to the Secondary's scripts. This section is near the bottom of the script. Add the new if/then lines right after the "echo_info | tee -i /tmp/set_alarm.$PID" line, which is right above the "$MAIL" line. Also add a closing "fi" for the new if/then statement; without it the script will fail with a syntax error.

Keep in mind the case sensitivity across platforms mentioned above: $SANM_0x12c0a on Linux, $SANM_0X12C0A on Windows.

if [ "$RCVRS" -a "$RCVRS" != " " ]
then
echo " "
echo "*******************************************************************"
echo "Sending mail to $RECIPIENTS:"
echo ""
echo "($RCVRS)"
echo "*******************************************************************"
echo_info | tee -i /tmp/set_alarm.$PID

# New lines: send mail only when the alarm's precedence matches this server (10 = Primary)
if [ "$SANM_0X12C0A" = "10" ]
then
$MAIL -s "A $SEV alarm has occurred on $SERVER (Model Name=$MNAME)(Model Type=$MTYPE)" $RCVRS < /tmp/set_alarm.$PID
fi

# Remove the temporary file whether or not mail was sent
rm -f /tmp/set_alarm.$PID
else
echo " "
echo "*****************************************************"
echo "NO $RECIPIENTS assigned - no mail sent"
echo "*****************************************************"
echo_info
fi


Next, you can also add echo statements to indicate in the email which server is active. In the example below this would be added to the Secondary scripts only. If this is to be added to Primary scripts, be sure to change the text to "Primary is running". 

echo "Severity: " $SEV 
echo "ProbableCauseID: " $CAUSE 
echo "RepairPerson: " $REPAIRPERSON 
echo "AlarmStatus: $STATUS" 
echo "SpectroSERVER: " $SERVER 
echo "Landscape: " $LANDSCAPE 

echo "SS Secondary is running" 
echo "Precedence = $SANM_0X12C0A" 
echo "ModelHandle: " $MHANDLE 
echo "ModelTypeHandle: " $MTHANDLE 
echo "IPAddress: " $IPADDRESS 
echo "SecurityString: " $SECSTR 
echo "AlarmState: " $ALARMSTATE 
echo "Acknowledged: " $ACKD 
echo "UserClearable: " $CLEARABLE 


5. Lastly, save the files and restart both the Primary and Secondary AlarmNotifiers.

Additional Information


In the SetScript, one option could be:

if [[ "$SANM_0X12C0A" = "10" ]];then
$MAIL -s "A $SEV alarm has occurred on $SERVER (Model Name=$MNAME)(Model Type=$MTYPE)" $RCVRS < /tmp/set_alarm.$PID
else
echo "Alarm ${GLOBAL_ALARM_ID} was generated by the Secondary (${SANM_0X12C0A}) skipping...."
fi

In the Secondary AlarmNotifier's SetScript, change the "10" to "20" and, in the else branch, change 'Secondary' to 'Primary'. This way the notifier logs that the alarm was skipped specifically because it was generated by the other server (helpful for troubleshooting).
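
A variation on the same idea keeps the script text identical on both servers and isolates the difference in a single setting. The sketch below is an assumption-laden illustration, not the stock SetScript: MY_PRECEDENCE, ALARM_PRECEDENCE, and should_notify are hypothetical names, and the echo in the matching branch stands in for the existing $MAIL line.

```shell
# Hypothetical setting: the only line that differs between the two servers.
# 10 on the Primary, 20 on the Secondary.
MY_PRECEDENCE=10

# Hypothetical helper: succeeds when the alarm's precedence matches this server's.
should_notify() {
    [ "${1:-}" = "$MY_PRECEDENCE" ]
}

# Lowercase name on Linux, uppercase on Windows (see the NOTE in step 3).
ALARM_PRECEDENCE="${SANM_0x12c0a:-${SANM_0X12C0A:-}}"

if should_notify "$ALARM_PRECEDENCE"
then
    # Placeholder for the existing $MAIL command.
    echo "Precedence matches ($ALARM_PRECEDENCE): mail would be sent here"
else
    echo "Alarm ${GLOBAL_ALARM_ID:-unknown} has precedence '${ALARM_PRECEDENCE}', not ${MY_PRECEDENCE}; skipping"
fi
```

With this layout, deploying to the Secondary means changing MY_PRECEDENCE to 20 and nothing else, which reduces the chance of mixing up the Primary and Secondary variants.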