VMware SD-WAN Edge with excessive BGP match/set rules or filters may experience multiple Edge service restarts.

Article ID: 312375

Updated On:

Products

VMware SD-WAN by VeloCloud

Issue/Introduction

  • This issue generally affects VMware SD-WAN sites with more than 512 BGPv4 match/set rules in total (across all filters).
  • This issue can also occur on a VMware SD-WAN site with far fewer than 512 BGP match/set rules in total if that site has high memory/CPU utilization.
  • This issue affects every 4.x.x and 5.x.x software version except the following:
    • 4.2.2 (Build R422-20220419-GA or later)
    • 4.5.1 (Build R451-20220701-GA or later) 
    • 5.0.1 (All builds)
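
The version rules above can be sketched as a small Python check. This is only an illustrative reading of the list, not a VMware tool: the function name `is_listed_as_affected` is hypothetical, and it assumes that the 8-digit field in a build string such as `R422-20220419-GA` is a date that orders builds, which is how "or later" is interpreted here.

```python
import re

# Fixed releases as listed in this article: version -> earliest fixed build date.
# None means every build of that version contains the fix.
FIXED_BUILDS = {
    (4, 2, 2): 20220419,
    (4, 5, 1): 20220701,
    (5, 0, 1): None,
}


def is_listed_as_affected(version: str, build: str = "") -> bool:
    """Return True if the article's list marks this version/build as affected."""
    parts = tuple(int(p) for p in version.split("."))
    if parts[0] not in (4, 5):
        return False  # the article only makes a statement about 4.x.x and 5.x.x
    if parts not in FIXED_BUILDS:
        return True  # any other 4.x.x or 5.x.x version is listed as affected
    min_date = FIXED_BUILDS[parts]
    if min_date is None:
        return False  # all builds of this version are fixed
    # Assumption: the 8-digit field between hyphens is the build date (YYYYMMDD).
    match = re.search(r"-(\d{8})-", build)
    if match is None:
        raise ValueError(f"cannot read a build date from {build!r}")
    return int(match.group(1)) < min_date
```

For example, `is_listed_as_affected("4.2.2", "R422-20220419-GA")` evaluates to `False`, while a 4.2.2 build with an earlier date field evaluates to `True`.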

Symptoms:

  • Generally, this issue is seen when a VMware SD-WAN site with BGP is restarted, rebooted, upgraded, or downgraded.
  • A standalone VMware SD-WAN site may encounter repetitive Edge service restarts.
  • If a VMware SD-WAN site is deployed in High Availability, it may experience repetitive failovers as a result of the Edge service restarts.

 

Environment

VMware SD-WAN by VeloCloud

Cause

When a VMware SD-WAN site (standalone or High Availability) has either an excessive total number of BGPv4 match/set rules configured, or any number of BGPv4 match/set rules configured on an Edge with high CPU and memory utilization, the Edge's dataplane service can fail multiple times, resulting in repetitive failovers on High Availability sites. The Edge logs will contain mutex mon events pointing to the failure.

Resolution

To remediate this issue, the SD-WAN site's Edge or HA Edges must be upgraded to one of the following software versions, in which this issue is fixed:
  • 4.2.2 (Build R422-20220419-GA or later) 
  • 4.5.1 (Build R451-20220701-GA or later)
  • 5.0.1 (All builds)


Workaround:
One of the following workarounds can be used to recover from the issue. 
  • Reduce the total number of BGPv4 match/set rules across all filters.
  • Remove the BGP configuration and reapply it incrementally in smaller sections once the Edge is stable. Note: an SD-WAN site may run into this issue again if it is restarted or upgraded, as that triggers a fresh bulk update of the BGP filters.
  • Alternatively, the BGP neighborship can be temporarily brought down from a peer device until the VMware SD-WAN site is successfully upgraded. This lets the Edge avoid the contention caused by the bulk update of the BGP configuration.
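
For the incremental workaround, the planning step of splitting a large rule set into smaller sections might look like the Python sketch below. It does not push configuration to an Edge or Orchestrator. The helper name `batches`, the placeholder rule names, and the batch size are all hypothetical, and the article does not recommend a specific section size.

```python
from typing import Iterator, List, Sequence


def batches(rules: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `rules`, each at most `batch_size` long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(rules), batch_size):
        yield list(rules[start:start + batch_size])


# Hypothetical rule identifiers; real rules are defined in the BGP filters.
rules = [f"rule-{i}" for i in range(1, 11)]
for number, section in enumerate(batches(rules, batch_size=4), start=1):
    # Apply one section, confirm the Edge is stable, then continue with the next.
    print(f"section {number}: {len(section)} rules")
```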


Additional Information

  • QE testing for the maximum number of BGPv4 match/set rules covers 256 rules each for the inbound and outbound filters, so the total number of BGPv4 match/set rules tested is 512.
  • However, this testing is done on a VMware SD-WAN Edge with low CPU/memory utilization. Hence, this issue could occur on a site with far fewer than 512 BGP match/set rules if that site is also under high CPU/memory utilization.
  • As a result, VMware cannot quantify the exact number of BGP match/set rules that can trigger this issue. It depends on multiple system properties, such as the Edge's CPU and memory utilization, which in turn depend in part on the Edge model's hardware specifications relative to the scale and complexity of the site.
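
To compare a site against the tested scale, the rules can be totalled across all filters as in the Python sketch below. The data layout (a mapping from filter name to its list of rules) and the function names are assumptions made for illustration. Staying at or below 512 does not rule the issue out, since a heavily loaded Edge can hit it with far fewer rules.

```python
from typing import Mapping, Sequence

# Tested scale from this article: 256 inbound + 256 outbound match/set rules.
TESTED_TOTAL_RULES = 256 + 256


def total_rules(filters: Mapping[str, Sequence[str]]) -> int:
    """Count match/set rules across every filter on the site."""
    return sum(len(rules) for rules in filters.values())


def exceeds_tested_scale(filters: Mapping[str, Sequence[str]]) -> bool:
    """True if the site has more rules in total than the tested 512."""
    return total_rules(filters) > TESTED_TOTAL_RULES


# Hypothetical site with two filters of 300 rules each.
site_filters = {
    "inbound-filter": [f"in-{i}" for i in range(300)],
    "outbound-filter": [f"out-{i}" for i in range(300)],
}
print(total_rules(site_filters), exceeds_tested_scale(site_filters))
```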

Impact/Risks:

  • On a VMware SD-WAN Edge standalone site, customer traffic will be impacted due to the repeated Edge service restarts resulting from the dataplane service failures.  Each Edge service restart results in a ~15 second traffic disruption, so multiple such restarts can disrupt traffic for a minute or more.
  • On a VMware SD-WAN site using High Availability, customer traffic will be impacted briefly due to repetitive failovers caused by the Edge service restarts. This impact is greater if the site uses Enhanced High Availability, where the Standby Edge also passes traffic.
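
The standalone-site estimate above is simple arithmetic and can be sketched in Python. The 15-second figure is the approximate value quoted in this article; the function name is hypothetical, and actual disruption per restart may vary.

```python
# Approximate disruption per Edge service restart, as quoted in this article.
SECONDS_PER_RESTART = 15


def estimated_disruption_seconds(restarts: int) -> int:
    """Rough total traffic disruption for a number of Edge service restarts."""
    if restarts < 0:
        raise ValueError("restarts cannot be negative")
    return restarts * SECONDS_PER_RESTART


# Four restarts come to roughly one minute of disruption.
print(estimated_disruption_seconds(4))
```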