How to monitor and send an alert when an interface port is flapping?

Products

Unified Infrastructure Management for Mainframe
DX Unified Infrastructure Management (Nimsoft / UIM)
CA Unified Infrastructure Management SaaS (Nimsoft / UIM)

Issue/Introduction

How to monitor port flapping (linkUpDown status) for network devices.

Resolution

1. Consider reviewing ways to prevent port flapping, for example:

Configure the Link Flap Prevention Settings on a Switch through the CLI

2. The probes to use are either sysloggtw (for syslog monitoring) or snmptd (for SNMP trap monitoring). Customers can review both probes and decide which one makes more sense for monitoring port flapping in their network environment.

a. sysloggtw

The sysloggtw probe acts as a gateway from the Syslog "world" into Nimsoft. Most network devices, such as routers, switches, and bridges, report events using SNMP as well as the well-known syslog format. The sysloggtw probe listens on port 514/udp when running in receive mode. All incoming syslog messages are handled according to the defined receive mode.

You may combine sysloggtw with logmon to post-process incoming syslog messages. Some devices, e.g., Cisco routers, may add an index to each message. Use logmon to reformat the text and set the severity levels instead of having sysloggtw determine the alarm level from the syslog priority.

You need to know the log entry you're looking for when the interface is flapping. A device admin should be able to examine the syslog on a device where the interface/port/link is flapping and tell you.
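As a rough illustration of what such a log entry might look like and how it could be matched, the sketch below assumes Cisco IOS-style messages such as "%LINK-3-UPDOWN: Interface GigabitEthernet0/1, changed state to down". The message format, the regular expression, and the function name parse_link_event are assumptions made for illustration only; confirm the actual text with the device admin. This is plain Python, not sysloggtw or logmon configuration.

```python
import re

# Assumed Cisco IOS-style link messages; verify against the real device syslog.
LINK_EVENT = re.compile(
    r"%(?:LINK-3|LINEPROTO-5)-UPDOWN: "
    r"(?:Line protocol on )?Interface (?P<interface>\S+), "
    r"changed state to (?P<state>up|down)"
)


def parse_link_event(line):
    """Return (interface, state) for a link up/down message, else None."""
    match = LINK_EVENT.search(line)
    if match is None:
        return None
    return match.group("interface"), match.group("state")


if __name__ == "__main__":
    sample = "%LINK-3-UPDOWN: Interface GigabitEthernet0/1, changed state to down"
    print(parse_link_event(sample))
```

A pattern of this kind is what you would carry over into a logmon watcher once the real message text is known.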

b. snmptd (Nimsoft SNMP-TRAP Daemon)

The snmptd probe acts as a gateway from the SNMP "world" into Nimsoft. Most network devices, such as routers, switches, and bridges, are SNMP-driven. The devices report error conditions as SNMP traps, normally sent to a designated UDP port (162) somewhere in the network (usually to a management station such as HP's OpenView Network Node Manager or equivalent). The snmptd probe listens on port 162 (the default, which is reconfigurable) and converts the incoming traps according to profiles. For port flapping, the relevant traps are the linkUp/linkDown notifications.

3. To send traps, the device needs SNMP and traps enabled.

Cisco IOS SNMP Traps Supported and How to Configure
Configure Supported Cisco IOS SNMP Traps

How to Support and Configure Cisco Catalyst OS SNMP Traps

Cisco Catalyst Switch example:

How Do I Enable Traps on Individual Ports, Such as linkUp/linkDown?
Issue the set port trap command in order to enable or disable the operation of the standard SNMP link trap for a port or range of ports. By default, all port traps are disabled.

Note: The Network Analysis Module (NAM) does not support this command.

Syntax
set port trap mod/port {enable | disable}

Syntax Description
mod/port - Number of the module and the port on the module.

enable - Keyword to activate the SNMP link trap.

disable - Keyword to deactivate the SNMP link trap.

If you enable the traps, the corresponding traps that are generated are linkUp (.X.X.X.X.X.X.XX.X.X) and linkDown (.X.X.X.X.X.X.XX.X.X). These traps are from the IF-MIB.

Example
This example shows how to enable the SNMP link trap for module 1, port 2:

Console> (enable) set port trap 1/2 enable
Port 1/2 up/down trap enabled.
Console> (enable)

Additional Information

The standard linkUp/Down traps defined in the IF-MIB only send three varbinds by default:

linkDown NOTIFICATION-TYPE
    OBJECTS { ifIndex, ifAdminStatus, ifOperStatus }
    STATUS  current
    DESCRIPTION
        "A linkDown trap signifies that the SNMP entity, acting in
        an agent role, has detected that the ifOperStatus object for
        one of its communication links is about to enter the down
        state from some other state (but not from the notPresent
        state). This other state is indicated by the included value
        of ifOperStatus."
    ::= { snmpTraps 3 }

linkUp NOTIFICATION-TYPE
    OBJECTS { ifIndex, ifAdminStatus, ifOperStatus }
    STATUS  current
    DESCRIPTION
        "A linkUp trap signifies that the SNMP entity, acting in an
        agent role, has detected that the ifOperStatus object for
        one of its communication links left the down state and
        transitioned into some other state (but not into the
        notPresent state). This other state is indicated by the
        included value of ifOperStatus."
    ::= { snmpTraps 4 }

With the snmptd method, you would have to use nas Auto-Operator (AO) triggers to capture the alarms (up and down) and then run a Lua script on a set interval/schedule to evaluate them.

With snmptd, link up and link down are two separate trap OIDs, so you would create two separate alarms. A Lua script would be needed to evaluate whether there are down alarms that correspond to the up alarms. The number of each within the evaluation period would also factor in.
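The evaluation logic described above can be sketched as follows. This is a Python illustration of the pairing-and-counting idea only, not a nas Lua script and not the nas API; the function name find_flapping, the event tuple layout, and the threshold values are all hypothetical. A "flap" here is assumed to mean a down event followed by an up event on the same interface.

```python
from collections import defaultdict


def find_flapping(events, window_seconds, min_flaps):
    """Return interfaces with at least min_flaps down->up pairs in any window.

    events: iterable of (timestamp_seconds, interface, state) tuples,
    where state is "down" or "up".
    """
    flap_times = defaultdict(list)  # interface -> times a flap completed
    awaiting_up = {}                # interface -> True after a down event
    for timestamp, interface, state in sorted(events):
        if state == "down":
            awaiting_up[interface] = True
        elif state == "up" and awaiting_up.pop(interface, False):
            flap_times[interface].append(timestamp)

    flapping = set()
    for interface, times in flap_times.items():
        start = 0
        for end, t in enumerate(times):
            # Slide the left edge until the window spans <= window_seconds.
            while t - times[start] > window_seconds:
                start += 1
            if end - start + 1 >= min_flaps:
                flapping.add(interface)
                break
    return flapping
```

For example, three down/up pairs on one interface within a minute would be reported with window_seconds=60 and min_flaps=3, while a single long outage on another interface would not.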

SPECTRUM

Spectrum may be a better option if that is part of your licensing agreement.

Spectrum allows you to generate an alarm when the interface goes up and down frequently.

In that case, Spectrum generates a lot of bad link alarms, but each one is cleared when the interface comes back up. If that happens too quickly, the operator usually doesn't see, or won't care about, the alarms.

The BAD LINK DETECTED alarm can be generated in one of four ways:

1. Live Pipes/Links enabled
2. PollPortStatus enabled
3. Port Fault Correlation
4. Link Down trap

Live Pipes and PollPortStatus rely on proactive polling of the interface. How often Spectrum polls is controlled by the Polling_Interval attribute id 0x10071 configured on the interface model. By default it is 300 seconds (5 minutes).

Port Fault Correlation is controlled by Spectrum Fault Isolation/Correlation intelligence.

A port can be flapping but the above functionality may not detect it because it occurs between polls or does not cause the Fault Isolation/Correlation intelligence to kick in.
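To see why polling alone can miss a flap, consider the small sketch below. It is a simplified model, not Spectrum behavior: it assumes polls occur at exact multiples of the polling interval and that an outage is only noticed if a poll lands inside it. The function name and the outage times are hypothetical.

```python
def polls_that_see_down(outages, polling_interval, duration):
    """Return the poll times at which the interface would be observed down.

    outages: list of (down_at, up_at) second offsets; the interface is
    treated as down for down_at <= t < up_at.
    """
    seen = []
    for poll_time in range(0, duration + 1, polling_interval):
        if any(down_at <= poll_time < up_at for down_at, up_at in outages):
            seen.append(poll_time)
    return seen


if __name__ == "__main__":
    # Two 10-second outages that both fall between the 300-second polls.
    print(polls_that_see_down([(310, 320), (400, 410)], 300, 900))
```

With the default 300-second interval, both short outages in the usage example fall between polls, so no poll observes the interface down.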

The Link Down/Up traps can detect a flapping interface even if the BAD LINK DETECTED alarm is not asserted, because detection is based on the number of traps received within a specified amount of time.

Spectrum is already configured out of the box to detect a flapping interface based on the Link Down trap. This is defined in the $SPECROOT/SS/CsVendor/IETF/EventDisp file as follows:

0x00XXXXXX E 30 R { 1 } CA.EventSequence, \
"0x00XXXXXX 1:1,3-6:3-6", 60, "0x00XXXXXX "
# The following rule is to detect flapping interfaces
# When customizing it please be aware that events 0xXXXXXX and
# 0xXXXXXX should still be generated in appropriate pairs, as these
# are used internally in determining if device configuration should
# happen on link-up traps or not
# When we see at least 15 flaps in 5 mins, generate excessive flap
# event 0xXXXXXX. The excessive flap clear event 0xXXXXXX will
# afterwards be generated when not a single flap has been seen anymore
# for at least 10 minutes
0x00XXXXXX E 50 R CA.InterfaceFlapRule, 15, 300, 600, "0x00XXXXXX -:-", "0x00XXXXXX -:-"

The above can be modified to suit individual needs.
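When choosing values for the three numeric parameters (15, 300, 600 in the rule above), it can help to model the behavior that the rule's comments describe: raise an excessive-flap event after at least 15 flaps in 300 seconds, and clear it once no flap has been seen for 600 seconds. The Python sketch below is an illustrative model of that description only; it is not Spectrum's implementation, and the class and method names are hypothetical.

```python
from collections import deque


class FlapRuleSketch:
    """Illustrative model: N flaps in W seconds, clear after Q quiet seconds."""

    def __init__(self, flap_count=15, window_seconds=300, clear_seconds=600):
        self.flap_count = flap_count
        self.window_seconds = window_seconds
        self.clear_seconds = clear_seconds
        self.recent = deque()   # flap timestamps still inside the window
        self.last_flap = None
        self.excessive = False

    def on_flap(self, now):
        """Record one flap; return "raise" when the threshold is first met."""
        self.last_flap = now
        self.recent.append(now)
        while now - self.recent[0] > self.window_seconds:
            self.recent.popleft()
        if not self.excessive and len(self.recent) >= self.flap_count:
            self.excessive = True
            return "raise"
        return None

    def on_tick(self, now):
        """Return "clear" once no flap has been seen for clear_seconds."""
        if self.excessive and now - self.last_flap >= self.clear_seconds:
            self.excessive = False
            self.recent.clear()
            return "clear"
        return None
```

Feeding the model recorded flap timestamps from a known problem interface gives a rough sense of whether a lower flap count or a shorter window would have raised the event sooner.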