VMware VeloCloud SD-WAN OSPF MD5 bad sequence cause OSPF flap

Article ID: 384575

Products

VMware VeloCloud SD-WAN

Issue/Introduction

Customers may observe intermittent OSPF NSM events indicating that OSPF is flapping. The ospfd log shows the neighbor state changing from Full to Deleted (InactivityTimer) after three consecutive “ospf_check_md5 bad sequence” messages. These messages are 10 seconds apart, which matches the Hello timer.

However, a packet capture shows that the SD-WAN Edge keeps receiving OSPF Hello packets from its peer.

Environment

All supported VMware VeloCloud SD-WAN edge versions

Cause

The cause is out-of-order delivery. The OSPF peer sends its Hello and LS Update packets in the correct order, but they can reach the SD-WAN Edge in a different order. Consider the following example:

2024/12/17 00:17:16 OSPF: [EC 134217739] interface SFP3:10.24.96.103: ospf_check_md5 bad sequence 1734394468 (expect 1734394469)

When sent by the OSPF peer, the order is correct: the LS Update carries sequence number 1734394468 and the OSPF Hello carries sequence number 1734394469.

However, on reception at the SD-WAN Edge, the order is reversed.

When this happens, FRR (the third-party open-source routing suite that implements the Edge's dynamic routing) does not accept this OSPF Hello as a valid message.

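The FRR code that checks the cryptographic sequence number in ospf_check_md5() is shown below as a patch. With the + lines applied, a packet is rejected only when its sequence number is lower than the last one accepted from the same neighbor, and each rejection is logged as “bad sequence <received> (expect <last accepted>)”: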

   /* check crypto seqnum. */
   nbr = ospf_nbr_lookup_by_routerid (oi->nbrs, &ospfh->router_id);
-  if (nbr && ntohl(nbr->crypt_seqnum) >= ntohl(ospfh->u.crypt.crypt_seqnum))
+  if (nbr && ntohl(nbr->crypt_seqnum) > ntohl(ospfh->u.crypt.crypt_seqnum)) {
+    zlog_warn ("interface %s: ospf_check_md5 bad sequence %d (expect %d)",
+	       IF_NAME (oi),
+	       ntohl(ospfh->u.crypt.crypt_seqnum),
+	       ntohl(nbr->crypt_seqnum));
     return 0;
+  }

 

If this happens four times in a row, the dead timer (40 seconds by default, that is, four Hello intervals) expires and FRR tears down the OSPF adjacency.

Resolution

Out-of-order delivery may be caused by the distance between the two OSPF neighbors, with intermediary devices along the path reordering packets. The following changes can mitigate the issue:

  • Disable OSPF MD5 authentication. No MD5 sequence numbers are then generated, so FRR no longer performs this check. This change requires modifying the OSPF configuration of all devices on the network segment.
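  • Set the dead timer to 120 seconds. With the 10-second Hello interval, OSPF then disconnects only after 12 consecutive out-of-order Hellos instead of 4. Assuming independent events, if the probability of 4 consecutive out-of-order Hellos is 1/256, the probability of 12 consecutive ones is (1/256)^3, roughly 6 × 10^-8. The dead interval must be changed on both neighbors, because OSPF ignores Hellos whose dead interval does not match its own.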