Fault tolerant SpectroSERVERs alarming

Products

CA Spectrum DX NetOps

Issue/Introduction

What are the options for optimal alarm clears in fault tolerant SpectroSERVERs?

Events are seen with Precedence 20. Why are the Events raised by the secondary SS? The secondary never took over so Events should all be raised against the primary SS only.

I see Events with the yellow Minor severity color, but no Severity value. The model the Event is raised against shows no Minor severity alarms.

Environment

All supported DX NetOps Spectrum releases

Cause

When this is observed, there was normally a temporary communication issue between OneClick and the primary SpectroSERVER.

Resolution

To resolve this, ensure the secondary SpectroSERVER has these entries in the $SPECROOT/SS/.vnmrc file:

is_secondary=TRUE
wait_active=yes

If they are missing or set incorrectly, edit the .vnmrc file to correct the configuration. Save the changes and restart the SS so that it reads the updated .vnmrc file.

Note:

The is_secondary entry should only be set on the secondary SpectroSERVER. It should not be used in a primary SpectroSERVER's .vnmrc configuration.
The wait_active entry is normally set only on the primary SpectroSERVER, but it is safe to set on the secondary SpectroSERVER as well.
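
The check described above can be sketched as a small script. This is only an illustration: the helper names (parse_vnmrc, check_secondary) are hypothetical, and it assumes the .vnmrc file holds one key=value entry per line, that lines starting with # can be ignored, and that values compare case-insensitively. None of these assumptions is confirmed by the product documentation.

```python
# Hypothetical helper: verify that a secondary SpectroSERVER's .vnmrc
# contains the two entries recommended in the Resolution section.
# Assumption: the file is made of simple key=value lines.

EXPECTED_SECONDARY = {"is_secondary": "TRUE", "wait_active": "yes"}


def parse_vnmrc(text):
    """Return a dict of key/value pairs found in .vnmrc-style text."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Assumption: blank lines, '#' lines and lines without '=' carry no setting.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def check_secondary(text):
    """Return a list of human-readable problems; an empty list means OK."""
    settings = parse_vnmrc(text)
    problems = []
    for key, expected in EXPECTED_SECONDARY.items():
        actual = settings.get(key)
        if actual is None:
            problems.append(f"{key} is missing (expected {expected})")
        elif actual.lower() != expected.lower():
            # Assumption: values are compared case-insensitively.
            problems.append(f"{key}={actual} (expected {expected})")
    return problems


if __name__ == "__main__":
    sample = "is_secondary=TRUE\nwait_active=no\n"
    for problem in check_secondary(sample):
        print(problem)
```

To check a real file, read the contents of $SPECROOT/SS/.vnmrc on the secondary SpectroSERVER and pass the text to check_secondary. The script only reports problems; the file still has to be edited by hand and the SS restarted.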

Additional Information

is_secondary

This setting lets the secondary SpectroSERVER drop events unless DX NetOps Spectrum determines that the secondary SpectroSERVER has taken over as the primary SpectroSERVER.
Should only be set (is_secondary=TRUE) in the .vnmrc file on a secondary SpectroSERVER.

wait_active

Determines whether the server accepts connections as soon as all models are loaded or waits until all models are active.
If set to Yes, a Control Panel message displays a running percentage of models that were activated during SpectroSERVER startup.
The wait_active parameter is normally set to yes only on the primary SS. This helps avoid missed alarms, but the SpectroSERVER may take longer to start accepting connections.

Additional details about these settings

Add the following line to the .vnmrc file on the secondary SpectroSERVER to limit the potential for false events or alarms:
- is_secondary=TRUE
When we restart the primary SpectroSERVER, connections are accepted when all models are loaded, but before all models are activated. The models can take some time to activate.
- Because the secondary SpectroSERVER stops polling when the primary SpectroSERVER is restarted, a gap in your network management coverage can result.
- To avoid this situation, edit the .vnmrc file on the primary SpectroSERVER so that the wait_active resource is set to 'yes'.
  - This parameter causes the server to wait until all of the models are activated before accepting any connections.
  - The message area in the DX NetOps Spectrum Control Panel also dynamically displays the percentage of models that are activated.
- The SpectroSERVER can appear to take longer to come up. However, when all the models are activated, the SpectroSERVER is ready to manage the network.
  - The reason for this potential delay is that, when wait_active is set to no, OneClick users are swapped back to the primary SS after a primary SS restart before model activation is completed. With wait_active set to yes, the swap happens only after activation completes.
  - Normally, the secondary SS runs in a Warm Standby state. This means when the secondary SS is started, it goes through model activation but does not start polling the models until it loses contact with the primary SS.
  - Setting secondary_polling to yes on the secondary SS puts the secondary SS in a Hot Standby state. This means that after model activation, it actively polls the models in the same way as the primary SS.
The wait_active and secondary_polling parameters have nothing to do with the alarm sync process.
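
The effect of wait_active and secondary_polling described above can be summarized in a toy model. This is a sketch for reasoning about the behavior, not product code: the function names and boolean arguments are hypothetical and simply restate the rules given in this article.

```python
# Toy model of the startup and standby rules described in this article.
# All names here are illustrative assumptions, not SpectroSERVER APIs.


def accepts_connections(models_loaded, models_active, wait_active):
    """Primary SS: does it accept connections yet?

    wait_active=no  -> connections are accepted once all models are loaded.
    wait_active=yes -> connections are accepted only once all models are active.
    """
    if wait_active:
        return models_loaded and models_active
    return models_loaded


def secondary_is_polling(primary_reachable, secondary_polling):
    """Secondary SS: is it polling the models?

    Warm Standby (secondary_polling=no): polls only after losing the primary.
    Hot Standby  (secondary_polling=yes): polls the same as the primary.
    """
    return secondary_polling or not primary_reachable


if __name__ == "__main__":
    # Primary restarted, models loaded but not yet activated, wait_active=no:
    # connections come back before activation completes, and a Warm Standby
    # secondary that sees the primary again is not polling.
    print(accepts_connections(True, False, wait_active=False))
    print(secondary_is_polling(primary_reachable=True, secondary_polling=False))
```

The last two calls correspond to the coverage gap described above: the primary accepts connections before activation completes while a Warm Standby secondary is not polling.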