OSPF router failure can create a network outage of up to 1 hour.

Article ID: 167835

Products

XOS

Issue/Introduction

In DBHA environments that use the OSPF dynamic routing protocol in RSW, an OSPF router failure can cause a network outage of up to 1 hour.
The problem happens in the following specific scenario:
  • OSPF is running on the VAP interface IP addresses (the circuit IP, not the VRRP address).
  • The OSPF routers (VAPs) act as Autonomous System Boundary Routers (ASBRs), redistributing OSPF external routing information.
  • The OSPF routers (VAPs) redistribute a given AS External route into the OSPF routing domain with exactly the same parameters (next hop, metric, metric type, prefix, and so on).

This is a common scenario for a DBHA configuration running OSPF and redistributing a specific route into the OSPF domain.

User-added image


Cause

Problem:

From an OSPF protocol perspective, the following behavior applies, assuming that the OSPF external routing information is redistributed with the same parameters. 

- Router A and Router B establish an OSPF adjacency with each other and with the other OSPF routers.
- Based on the database description exchange and the OSPF LSDB, Router A and Router B each try to originate a functionally equivalent type 5 AS External LSA.
- RFC 2328 states that the following rule applies in this situation:
- "If two routers, both reachable from one another, originate functionally equivalent AS-external-LSAs (i.e., same destination, cost and non-zero forwarding address), then the LSA originated by the router having the highest OSPF Router ID is used. The router having the lower OSPF Router ID can then flush its LSA."

- At that point, only one LSA remains in the LSDB: the one originated by the router with the highest Router ID. In the example above, Router B's LSA is kept.
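
The RFC 2328 rule quoted above can be illustrated with a minimal Python sketch. It is not RSW code. The `ExternalLsa` record, the Router IDs (192.0.2.1 for Router A, 192.0.2.2 for Router B), the metric, and the forwarding address are hypothetical values chosen for illustration; only the prefix 10.1.1.0/24 comes from this article.

```python
import ipaddress
from collections import namedtuple

# Hypothetical, simplified view of a type 5 AS External LSA.
ExternalLsa = namedtuple(
    "ExternalLsa", "advertising_router prefix metric forwarding_address"
)


def functionally_equivalent(a, b):
    """Same destination, same cost, and the same non-zero forwarding address."""
    return (
        a.prefix == b.prefix
        and a.metric == b.metric
        and a.forwarding_address == b.forwarding_address
        and a.forwarding_address != "0.0.0.0"
    )


def router_id_value(lsa):
    """Router IDs are compared as 32-bit numbers, not as strings."""
    return int(ipaddress.IPv4Address(lsa.advertising_router))


def surviving_lsas(a, b):
    """Return the LSAs that remain in the LSDB after the rule is applied."""
    if not functionally_equivalent(a, b):
        return [a, b]  # different LSAs: both are kept
    return [max(a, b, key=router_id_value)]  # highest Router ID wins


router_a = ExternalLsa("192.0.2.1", "10.1.1.0/24", 20, "198.51.100.1")
router_b = ExternalLsa("192.0.2.2", "10.1.1.0/24", 20, "198.51.100.1")
print(surviving_lsas(router_a, router_b))  # only Router B's LSA remains
```

With identical redistribution parameters, only the LSA from the router with the higher Router ID survives, which matches the situation described in this article.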

When Router B becomes unreachable (for example, because of an OSPF process problem or a hardware problem), the dead interval timer runs on all OSPF routers.
- Each OSPF router in the OSPF network marks Router B as unreachable.
- In this configuration, Router A does not age out Router B's AS External LSA (10.1.1.0/24). This prevents Router A from generating a new AS External LSA into the OSPF domain.
- Router B's AS External LSA is then kept in the LSDB until the MaxAge timer expires for that LSA. MaxAge is 3600 seconds (1 hour).
- This causes a network outage in the customer environment, because the AS External LSA for network 10.1.1.0/24 cannot be used in the SPF computation while it belongs to an unreachable router.
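
The "up to 1 hour" figure follows from the LSA age at the moment Router B fails. The sketch below is a rough estimate, not a measurement. It assumes that Router B refreshed its LSA normally until the failure, using the RFC 2328 LSRefreshTime of 1800 seconds, and it ignores the dead interval and flooding delays. The function name is hypothetical.

```python
MAX_AGE = 3600          # seconds; MaxAge as stated in this article
LS_REFRESH_TIME = 1800  # seconds; RFC 2328 constant (assumed to apply here)


def remaining_outage_seconds(lsa_age_at_failure):
    """Seconds until the stale LSA reaches MaxAge and leaves the LSDB."""
    if not 0 <= lsa_age_at_failure <= LS_REFRESH_TIME:
        raise ValueError("a normally refreshed LSA has an age of 0 to 1800 seconds")
    return MAX_AGE - lsa_age_at_failure


# Worst case: Router B fails immediately after refreshing its LSA.
print(remaining_outage_seconds(0))     # 3600
# Best case: Router B fails just before the next refresh.
print(remaining_outage_seconds(1800))  # 1800
```

Under these assumptions the outage lasts between roughly 30 minutes and 1 hour, depending on when the LSA was last refreshed.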

Resolution

Upgrade to RSW 7.1 or later. Note that this version requires XOS 8.5 or later.

Workaround

Configure each router to redistribute the AS External routes into OSPF with a slightly different metric.
With this configuration, each router generates a different LSA (because the metrics differ), and both LSAs are kept in the OSPF LSDB.
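
The effect of the workaround can be sketched with a small Python model. It is a simplification of the behavior described in this article, not RSW code. The `Redistribution` record, the integer Router IDs (1 for Router A, 2 for Router B), and the metrics 20 and 21 are hypothetical, and both routers are assumed to advertise the same non-zero forwarding address.

```python
from collections import namedtuple

# Hypothetical record: which router redistributes which prefix at which metric.
Redistribution = namedtuple("Redistribution", "router_id prefix metric")


def lsdb_after_tiebreak(redistributions):
    """Per (prefix, metric), keep only the LSA from the highest Router ID."""
    kept = {}
    for entry in redistributions:
        key = (entry.prefix, entry.metric)
        if key not in kept or entry.router_id > kept[key].router_id:
            kept[key] = entry
    return list(kept.values())


def usable_after_failure(redistributions, failed_router_id):
    """LSAs still usable for SPF once the failed router is unreachable."""
    return [
        entry
        for entry in lsdb_after_tiebreak(redistributions)
        if entry.router_id != failed_router_id
    ]


same_metric = [
    Redistribution(1, "10.1.1.0/24", 20),
    Redistribution(2, "10.1.1.0/24", 20),
]
different_metric = [
    Redistribution(1, "10.1.1.0/24", 20),
    Redistribution(2, "10.1.1.0/24", 21),
]

print(usable_after_failure(same_metric, failed_router_id=2))       # no usable LSA
print(usable_after_failure(different_metric, failed_router_id=2))  # Router A's LSA
```

With identical metrics, no usable LSA for 10.1.1.0/24 is left when Router B fails. With different metrics, Router A's LSA is already in the LSDB and remains usable.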
