Resolution for Flapping Network Interface on the ProxySG, which led to Web Access Outage

Article ID: 224172

Updated On:

Products

ProxySG Software - SGOS

Issue/Introduction

Link flap means that an interface on a network switch (e.g., a Cisco switch) repeatedly goes up and down. The switch puts the interface into the errdisabled state if it flaps more than five times in 10 seconds. The common cause of link flap is a Layer 1 issue such as a bad cable, a duplex mismatch, or a bad Gigabit Interface Converter (GBIC) card.
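
The flap threshold described above (more than five flaps in 10 seconds) can be illustrated with a minimal Python sketch. The function name would_errdisable and the sliding-window counting are assumptions made for illustration only; how a given switch actually counts flaps may differ, so treat this as a model of the rule rather than the switch's implementation.

```python
from collections import deque


def would_errdisable(transition_times, max_flaps=5, window_seconds=10.0):
    """Return True if more than max_flaps link flaps fall inside any
    window_seconds-long window (hypothetical model of the threshold)."""
    recent = deque()
    for t in sorted(transition_times):
        recent.append(t)
        # Discard flaps that fall outside the window ending at the newest flap.
        while t - recent[0] >= window_seconds:
            recent.popleft()
        if len(recent) > max_flaps:
            return True
    return False


# Flap timestamps in seconds, e.g. taken from switch or ProxySG event logs.
print(would_errdisable([0, 1, 2, 3, 4, 5]))      # six flaps within 10 seconds
print(would_errdisable([0, 3, 6, 9, 12, 15]))    # six flaps spread over 15 seconds
```

Six flaps inside one 10-second window exceed the threshold, while the same number of flaps spread over a longer period does not.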

A collision occurs on your network when a frame sent on the physical network medium is prevented from reaching its destination. Typically, the frame encounters a signal from another host on the network, and the two signals combine into a useless signal.

Environment

All SGOS releases and all ProxySG appliance models.

Cause

There are many reasons the ProxySG interfaces may detect input errors, including bad cables, a defective interface on the switch, router, or ProxySG, and a duplex mismatch.
To determine the root cause, try simple troubleshooting actions such as the following:

  • Swapping cables
  • Swapping ports on the switch, router, SG, firewall, or other connected device
  • Hard coding duplex and speed settings on both the SG and interconnected devices
  • Swapping devices (e.g. changing the switch out with another)
  • Restarting SG and/or interconnected devices

If the root causes of the link flaps seen on the ProxySG network interface(s) are not resolved, the interface can end up in a down state on the ProxySG, which results in a web access outage.

Causes of Errdisabled State

The errdisable feature was first implemented to handle special collision situations in which the switch detected excessive or late collisions on a port. Excessive collisions occur when a frame is dropped because the switch encounters 16 collisions in a row. Late collisions occur after every device on the wire should have recognized that the wire was in use. Possible causes of these types of errors include:

  • A cable that is out of specification (either too long, the wrong type, or defective)
  • A bad network interface card (NIC) (with physical problems or driver problems)
  • A port duplex misconfiguration

A port duplex misconfiguration is a common cause of the errors because of failures to negotiate the speed and duplex properly between two directly connected devices (for example, a NIC that connects to a switch). Only half-duplex connections should ever have collisions in a LAN. Because of the carrier sense multiple access (CSMA) nature of Ethernet, collisions are normal for half duplex, as long as the collisions do not exceed a small percentage of traffic.

An interface can go into the errdisable state for various reasons, including:

  • Duplex mismatch
  • Port channel misconfiguration
  • BPDU guard violation
  • UniDirectional Link Detection (UDLD) condition
  • Late-collision detection
  • Link-flap detection
  • Security violation
  • Port Aggregation Protocol (PAgP) flap
  • Layer 2 protocol tunneling (L2PT) guard
  • DHCP snooping rate-limit
  • Incorrect GBIC / Small Form-Factor Pluggable (SFP) module or cable
  • Address Resolution Protocol (ARP) inspection
  • Inline power

Note: Error-disable detection is enabled for all of these reasons by default. In order to disable error-disable detection, use the no errdisable detect cause command. The show errdisable detect command displays the error-disable detection status.

Ref. Doc.: https://www.cisco.com/c/en/us/support/docs/lan-switching/spanning-tree-protocol/69980-errdisable-recovery.html

For non-Cisco switches, please refer to the vendor for similar guidance and switch commands.

Resolution

As a key troubleshooting step on the ProxySG side, run the recommended network diagnostics test on the network interfaces of the ProxySG appliance to determine whether any of the cards has a hardware defect. If the diagnostics result reports a hardware failure, the next step is to start an RMA process to replace the failing NIC on the ProxySG. This article references a case in which the flapping NIC on the ProxySG appliance had no hardware failure. All the network information and commands refer to the Cisco IOS platform.

On the network side, the switchport would have gone into the errdisable state due to excessive network collisions. If you have enabled errdisable recovery, you can determine the reason for the errdisable status by issuing the show errdisable recovery command (Ref.: Cisco IOS). Here is an example:

cat6knative#show errdisable recovery
ErrDisable Reason    Timer Status
-----------------    --------------
udld                 Enabled
bpduguard            Enabled
security-violatio    Enabled
channel-misconfig    Enabled
pagp-flap            Enabled
dtp-flap             Enabled
link-flap            Enabled
l2ptguard            Enabled
psecure-violation    Enabled
gbic-invalid         Enabled
dhcp-rate-limit      Enabled
mac-limit            Enabled
unicast-flood        Enabled
arp-inspection       Enabled

Timer interval: 300 seconds

Interfaces that will be enabled at the next timeout:

Interface      Errdisable reason      Time left(sec)
---------    ---------------------    --------------
  Fa2/4                bpduguard          273
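
When this output is collected from several switches, the table of interfaces awaiting recovery can be extracted with a short script. The following Python sketch is illustrative only: the function name parse_pending_recoveries is hypothetical, and the parser assumes the column layout shown in the sample above, which may vary between IOS releases.

```python
import re

# Tail of the sample "show errdisable recovery" output from this article.
SAMPLE_OUTPUT = """\
Timer interval: 300 seconds

Interfaces that will be enabled at the next timeout:

Interface      Errdisable reason      Time left(sec)
---------    ---------------------    --------------
  Fa2/4                bpduguard          273
"""

# One table row: interface name, errdisable reason, seconds until recovery.
ROW_PATTERN = re.compile(r"^\s*(\S+)\s+(\S+)\s+(\d+)\s*$")


def parse_pending_recoveries(output):
    """Return (interface, reason, seconds_left) tuples from the recovery table."""
    rows = []
    in_table = False
    for line in output.splitlines():
        # The table starts after its column header line.
        if line.strip().startswith("Interface") and "Errdisable reason" in line:
            in_table = True
            continue
        if not in_table:
            continue
        match = ROW_PATTERN.match(line)
        if match:  # the dashed separator line does not match and is skipped
            rows.append((match.group(1), match.group(2), int(match.group(3))))
    return rows


for interface, reason, seconds_left in parse_pending_recoveries(SAMPLE_OUTPUT):
    print(f"{interface}: errdisabled by {reason}, recovery in {seconds_left}s")
```

A reason of link-flap on the port facing the ProxySG would point to the Layer 1 checks listed under Cause, whereas a reason such as bpduguard indicates a switch configuration issue instead.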

To correct the root problem, refer to the relevant section of the Cisco reference document at the URL below to identify the root cause and to recover the switchport.

https://www.cisco.com/c/en/us/support/docs/lan-switching/spanning-tree-protocol/69980-errdisable-recovery.html

Specifically, in the technical case this article references, the switchport was recovered from the errdisabled state, and the cable connecting the flapping WAN-facing interface on the ProxySG to the upstream network device was replaced with a standard, recommended cable. This resolved the flapping network interface on the ProxySG appliance, and web access was fully and consistently restored.