frr causing high Edge CPU utilization

Article ID: 400251

Products

VMware NSX

Issue/Introduction

  • Edge Transport Nodes report high CPU usage.

  • Alarms are raised in the Alarms section of the NSX-T UI:

    The CPU usage on Edge node <UUID> has reached ##% which is at or above the high threshold value of 60%.

  • In the NSX-T UI, navigate to System -> Fabric -> Nodes -> Edge Transport Nodes. Select the impacted Edge Transport Node and open Monitor. Services CPU is reported as high (alarms are triggered at 60%).

  • On the same page, confirm that Datapath CPU is normal (under 50%).

  • Confirm that the high CPU usage is caused by frr. Log in to the Edge as root, run the top command, and then press n.


  • BGP peers advertise a prefix that is in the same subnet as the next hops of the static routes.

    Snippet from NSXT_EdgeNode_####/var/log/frr/frr.log:


    Snippet from NSXT_EdgeNode_####/edge/tier0_routing:


  • The Edge syslog shows static routes being added and removed constantly:

    2025-05-28T13:57:12.128Z edge01.example.com NSX 3635 ROUTING [nsx@6876 comp="nsx-edge" subcomp="rcpm" s2comp="rcpm-nsxa" level="INFO"] Received prefix <ip-address/subnet mask> with n_nexthops = 1 in table_id = 254 action = ADD
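
To gauge how often a given prefix is flapping, the syslog entries can be tallied per prefix and action. The following is a minimal Python sketch, not an NSX tool. It assumes the log lines follow the format of the single sample above; the function name count_route_actions and the sample lines (which use documentation-range prefixes) are hypothetical.

```python
import re
from collections import Counter

# Matches the "Received prefix ... action = ..." part of the rcpm log line.
# The pattern is an assumption derived from the one sample in this article.
ROUTE_EVENT = re.compile(
    r"Received prefix (?P<prefix>\S+) with n_nexthops = \d+ "
    r"in table_id = \d+ action = (?P<action>\w+)"
)


def count_route_actions(lines):
    """Count (prefix, action) pairs found in an iterable of syslog lines."""
    counts = Counter()
    for line in lines:
        match = ROUTE_EVENT.search(line)
        if match:
            counts[(match["prefix"], match["action"])] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample lines; only the ADD action appears in the article.
    sample = [
        'ROUTING [nsx@6876 comp="nsx-edge" subcomp="rcpm"] Received prefix '
        "192.0.2.0/24 with n_nexthops = 1 in table_id = 254 action = ADD",
        'ROUTING [nsx@6876 comp="nsx-edge" subcomp="rcpm"] Received prefix '
        "192.0.2.0/24 with n_nexthops = 1 in table_id = 254 action = ADD",
    ]
    for (prefix, action), total in count_route_actions(sample).most_common():
        print(prefix, action, total)
```

In practice the lines would be read from the Edge syslog file. A prefix with a very high count over a short interval is a likely candidate for the route churn described in this article.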

 

Environment

VMware NSX-T 3.x 

Cause

When the Tier-0 SR on an Edge has multiple BGP neighbors that send ECMP prefixes to it, BGP routes are continuously added to and removed from the RIB. This constant churn overloads frr.

Resolution

This issue is resolved in VMware NSX 4.0 and later.

Workaround:

Apply an inbound route-map to the BGP neighbors that filters out the BGP prefix falling in the same subnet as the static route next hop.
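
To identify which received prefixes the route-map would need to match, each BGP prefix can be checked against the static route next hops. The sketch below uses Python's standard ipaddress module. The function name prefixes_covering_next_hops and all addresses are hypothetical examples, and the code does not generate or apply any NSX configuration.

```python
import ipaddress


def prefixes_covering_next_hops(bgp_prefixes, static_next_hops):
    """Return the BGP prefixes that contain at least one static next hop."""
    next_hops = [ipaddress.ip_address(nh) for nh in static_next_hops]
    matches = []
    for prefix in bgp_prefixes:
        # strict=False tolerates prefixes written with host bits set.
        network = ipaddress.ip_network(prefix, strict=False)
        # Compare only addresses of the same IP version as the prefix.
        if any(nh.version == network.version and nh in network
               for nh in next_hops):
            matches.append(prefix)
    return matches


if __name__ == "__main__":
    # Hypothetical values from the documentation address ranges.
    received = ["192.0.2.0/24", "198.51.100.0/24"]
    next_hops = ["192.0.2.1"]
    print(prefixes_covering_next_hops(received, next_hops))
```

A default route such as 0.0.0.0/0 contains every IPv4 next hop, so the result should be reviewed before it is used to build a filter.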

Additional Information