/24 route that represents an aggregation route disappears from T0 routing table, while the small subnets are still present on the routing table.

Article ID: 389528

Products

VMware NSX

Issue/Introduction

  • The environment has a topology similar to the following:
    • The NSX Tier-0 Gateway has BGP configured with multiple external BGP neighbors.
    • BGP prefixes received from external BGP neighbor A are re-advertised by the NSX Tier-0 Gateway to external BGP neighbor B.
    • External BGP neighbor A advertises several /32 prefixes from a single supernet, #.#.199.#/24, to the NSX Tier-0 Gateway.
    • The NSX Tier-0 Gateway has Route Aggregation configured for this supernet, #.#.199.#/24:

frr_show_running_config:
address-family ipv4 unicast
       aggregate-address #.#.199.#/24 summary-only

    • Towards external BGP neighbor B, the NSX Tier-0 Gateway advertises all the /32 prefixes from the #.#.199.# subnet as well as the aggregate route itself, #.#.199.#/24.


  • At some point, the /24 aggregate route disappears from the Tier-0 routing table, while the more-specific /32 prefixes are still present in the routing table.

The Edge logs show FIB updates that delete #.#.199.#/24:
2024-08-13T12:21:55.161Z ####### NSX 17 ROUTING [nsx@6876 comp="nsx-edge" subcomp="nsxa" s2comp="routing" level="INFO"] FIB update 1 delete #.#.199.#/24 delete

2024-08-13T12:21:55.161Z ####### NSX 4645 FABRIC [nsx@6876 comp="nsx-edge" subcomp="datapathd" s2comp="######" tname="######" level="INFO"] Delete lrouter 27407b28-####-####-####-############'s FIB entry for #.#.199.#/24

At the same time, one or more /32 prefixes from the same subnet are still present in the Tier-0 SR routing table:

b  > * #.#.#.65/32 [20/0] via #.#.#.#, uplink-###, 2d10h24m
b  > * #.#.#.68/32 [20/0] via #.#.#.#, uplink-###, 2d10h24m
b  > * #.#.#.72/32 [20/0] via #.#.#.#, uplink-###, 2d10h24m
<many lines omitted>

Many /32 prefixes from subnet #.#.199.# therefore remain in the routing table, yet the aggregate #.#.199.#/24 has been removed. This should not happen, because 'aggregate-address #.#.199.#/24 summary-only' is configured on the Tier-0 Gateway. Removing the route from the FIB also stops the NSX Tier-0 Gateway from advertising the /24 aggregate route to external BGP neighbor B.

  • This issue is more likely to occur if external BGP neighbor A is a virtual Calico router.
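
The symptom described above (the aggregate deleted from the FIB while contributing /32 prefixes are still in the routing table) can be checked offline against collected log and routing-table text. The following is a minimal Python sketch, not an NSX tool. It assumes the line formats match the samples quoted in this article; the function names are hypothetical, and any addresses passed to it are supplied by the reader.

```python
import ipaddress
import re

# Patterns modelled on the sample lines quoted in this article (an assumption;
# other releases may format these lines differently).
FIB_DELETE = re.compile(r"FIB update \d+ delete (\S+/\d+)")
ROUTE_LINE = re.compile(r"^\s*\S+\s*>\s*\*\s+(\S+/\d+)")


def deleted_prefixes(log_lines):
    """Return the set of prefixes named in 'FIB update ... delete' log lines."""
    found = set()
    for line in log_lines:
        match = FIB_DELETE.search(line)
        if match:
            found.add(ipaddress.ip_network(match.group(1)))
    return found


def contributing_routes(route_lines, aggregate):
    """Return routes from the table that are more specific than the aggregate."""
    result = []
    for line in route_lines:
        match = ROUTE_LINE.match(line)
        if not match:
            continue  # skip headers, omitted-line markers, and so on
        net = ipaddress.ip_network(match.group(1))
        if (
            net.version == aggregate.version
            and net != aggregate
            and net.subnet_of(aggregate)
        ):
            result.append(net)
    return result


def aggregate_wrongly_removed(log_lines, route_lines, aggregate_text):
    """True if the aggregate was deleted while contributing routes remain."""
    aggregate = ipaddress.ip_network(aggregate_text)
    was_deleted = aggregate in deleted_prefixes(log_lines)
    return was_deleted and bool(contributing_routes(route_lines, aggregate))
```

Pass the log lines, the routing-table lines, and the configured aggregate prefix as text. A result of True matches the condition described in this article; False only means the pattern was not found in the supplied text.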

Environment

VMware NSX-T Data Center 3.2.x
VMware NSX 4.0.x
VMware NSX 4.1.x

Cause

The aggregate route is tracked with a reference count, and the thread yield value for main queue processing is 50 ms.
On route withdrawal, the aggregate count is decremented and the BGP_PATH_REMOVED flag is set on the path. BGP path processing is then scheduled.
If the Edge receives an update for the same path before the withdrawal has been processed, bgp_update restores the path but decrements the aggregate count again. This double decrement of the aggregate counter can cause the aggregate route to be removed while contributing prefixes are still present.
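
The effect of the double decrement can be illustrated with a toy model. This Python sketch is not the Edge routing code: the function name `simulate` and the `guard` parameter are hypothetical. It assumes that each contributing /32 holds one reference on the aggregate, and that the update path adds a reference back after restoring the path. The `guard` option shows one conceivable way to avoid the second decrement; it does not describe the actual fix.

```python
def simulate(contributors, flaps, guard):
    """Return the aggregate reference count after a number of
    withdraw-then-update sequences on one contributing path.

    contributors: number of /32 paths announced (one reference each, assumed)
    flaps: how many times an update arrives before the withdrawal is processed
    guard: if True, skip the second decrement when the path is flagged removed
           (illustrative only, not the actual fix)
    """
    count = contributors
    for _ in range(flaps):
        # Withdrawal: decrement at once and flag the path; cleanup is deferred.
        count -= 1
        removed = True  # stands in for BGP_PATH_REMOVED

        # Update arrives before the deferred processing has run.
        if not (guard and removed):
            count -= 1  # second decrement for the same path
        removed = False  # the path is restored
        count += 1  # reference for the restored path (assumed)
    return count


def aggregate_installed(count):
    """In this model the aggregate exists only while references remain."""
    return count > 0
```

In this model, `simulate(3, 3, guard=False)` returns 0 even though all three paths are still present, so `aggregate_installed` is False. With `guard=True` the count stays at 3.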

Resolution

This issue is resolved in VMware NSX 4.2.1.3.0.24497425

This issue is resolved in VCF 9.0