We are seeing the "A TRAP STORM HAS BEEN DETECTED" alarm generated frequently on some devices.
How does Spectrum determine a trap storm has been detected?
How do you configure the Spectrum trap storm settings?
How do you allow for more traps than the default configuration before trap storm functionality is triggered?
All supported Network Observability DX NetOps Spectrum releases
The CA Spectrum Trap Management Subview section of the VNM Attributes in the Information Tab documentation topic explains how Spectrum Trap Storm detection works.
It does not, however, explain the underlying logic the code uses to decide whether to trigger the trap storm functionality.
Trap storm detection can be enabled at the SpectroSERVER level or for a specific modeled device.
At either level, the behavior is controlled by the following attributes.
Attribute Name: traps_per_sec_storm_threshold
Attribute ID: 0x122db
Attribute Definition: Defines the rate at which traps are received per second from a managed or unmanaged device. When this rate is sustained for the amount of time that is specified by the TrapStormLength attribute, the SpectroSERVER stops the processing of traps from that unmanaged or managed device.
Default Value: 20 traps per second
When trap storm detection is enabled with the default configuration (20 traps per second, over the default 5 second TrapStormLength window), Spectrum triggers the functionality when it receives 100 or more traps from a device within a 5 second period.
When the traps received from a device reach the configured thresholds, the SpectroSERVER identifies this rate as a trap storm and stops handling traps from that device; traps from other devices are not blocked. Trap storm detection is tracked per IP address of each unmanaged or managed device (trap source) that sends traps to the SpectroSERVER. As a result, you can configure each device to send traps to the SpectroSERVER at an appropriate rate.
An important point to keep in mind is the word "rate" in the details above. The underlying formula Spectrum uses to determine whether there is a trap storm is as follows:
in_storm = ( sum/TrapStormLength >= trap_storm_size ) ? TRUE : FALSE;
Here "sum" is the number of traps received within the rolling time window, and trap_storm_size corresponds to the traps_per_sec_storm_threshold value. Using this formula with the default values for traps_per_sec_storm_threshold (20) and TrapStormLength (5 seconds), if a device received 100 traps (sum) in 3 seconds, the calculation would be as follows:
100/5 >= 20
In the above scenario, even though the 100 traps arrived within only a 3 second period, the formula averages them over the full 5 second TrapStormLength window. Because 100/5 = 20 meets the threshold of 20 traps per second, Spectrum detects a trap storm, asserts the alarm, and stops processing traps from that device until the rate falls below the configured thresholds.
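The formula above can be sketched in Python. This is a hypothetical illustration, not Spectrum's actual code; the attribute names are mapped to plain function parameters:

```python
def is_trap_storm(trap_sum: int, trap_storm_length: int, trap_storm_size: int) -> bool:
    """Mirror of the documented check: average the traps received in the
    rolling window over the full TrapStormLength, then compare against
    traps_per_sec_storm_threshold (trap_storm_size)."""
    return trap_sum / trap_storm_length >= trap_storm_size

# 100 traps received (even if they all arrived within 3 seconds),
# with the default TrapStormLength = 5 and threshold = 20:
print(is_trap_storm(100, 5, 20))  # 100/5 = 20   >= 20 -> True
print(is_trap_storm(99, 5, 20))   # 99/5  = 19.8 <  20 -> False
```

Note that the division is over TrapStormLength, not over the time the traps actually took to arrive, which is why a fast burst still counts as a storm.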
When 100 or more traps arrive in the specified time frame, it can be any distribution of traps per second across the TrapStormLength window. Using the default 5 seconds, the window breaks down like this:

Second1   Second2   Second3   Second4   Second5
      1        96         1         1         1     (100 traps - triggers the functionality)
     54        14        17        10         5     (100 traps - triggers the functionality)
     96         1         1         1         0     (99 traps - does NOT trigger the functionality)

The last series does not trigger the functionality because only 99 traps (arriving over the first 4 seconds) fell within the 5 second rolling window.
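The series above can be checked with a small rolling-window simulation. This is a hypothetical sketch, not Spectrum code; it slides the TrapStormLength window across per-second trap counts and applies the same sum/TrapStormLength formula:

```python
def storm_detected(per_second_counts, window=5, threshold=20):
    """Return True if any `window`-second span of the per-second counts
    averages at or above `threshold` traps per second
    (i.e. sum/window >= threshold)."""
    for start in range(max(1, len(per_second_counts) - window + 1)):
        window_sum = sum(per_second_counts[start:start + window])
        if window_sum / window >= threshold:
            return True
    return False

print(storm_detected([1, 96, 1, 1, 1]))     # 100 traps -> True
print(storm_detected([54, 14, 17, 10, 5]))  # 100 traps -> True
print(storm_detected([96, 1, 1, 1, 0]))     # 99 traps  -> False
```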
If the feature is enabled and the environment contains devices that legitimately send traps at rates above the default configuration, adjust these attributes so that trap storm detection does not limit the ability to receive traps.
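As a rough sizing aid when adjusting the attributes, the following sketch (a hypothetical helper, not part of Spectrum) computes the smallest traps_per_sec_storm_threshold that would keep a device's observed peak burst below the trigger point for a given TrapStormLength window:

```python
import math

def min_threshold(peak_traps_in_window: int, trap_storm_length: int = 5) -> int:
    """Smallest integer threshold T such that
    peak_traps_in_window / trap_storm_length < T, i.e. the burst no
    longer satisfies sum/TrapStormLength >= T."""
    return math.floor(peak_traps_in_window / trap_storm_length) + 1

# A device legitimately bursting 150 traps within a 5 second window
# would need traps_per_sec_storm_threshold raised to at least 31:
print(min_threshold(150))  # -> 31
```

Raising TrapStormLength instead of (or as well as) the per-second threshold has the same effect of lowering the computed average, so either attribute can be tuned depending on whether the legitimate traffic is bursty or sustained.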