frr_show_running_config:
address-family ipv4 unicast
aggregate-address #.#.199.#/24 summary-only
In the logs, FIB updates deleting #.#.199.#/24 can be seen:
2024-08-13T12:21:55.161Z ####### NSX 17 ROUTING [nsx@6876 comp="nsx-edge" subcomp="nsxa" s2comp="routing" level="INFO"] FIB update 1 delete #.#.199.#/24 delete
2024-08-13T12:21:55.161Z ####### NSX 4645 FABRIC [nsx@6876 comp="nsx-edge" subcomp="datapathd" s2comp="######" tname="######" level="INFO"] Delete lrouter 27407b28-####-####-####-############'s FIB entry for #.#.199.#/24
During the same period, however, one or more /32 prefixes from the same subnet are still present in the SR-T0 routing table:
b > * #.#.#.65/32 [20/0] via #.#.#.#, uplink-###, 2d10h24m
b > * #.#.#.68/32 [20/0] via #.#.#.#, uplink-###, 2d10h24m
b > * #.#.#.72/32 [20/0] via #.#.#.#, uplink-###, 2d10h24m
<many lines omitted>
So, the routing table still contains many /32 prefixes from subnet #.#.199.#, yet the aggregate #.#.199.#/24 was removed. This should not happen, because 'aggregate-address #.#.199.#/24 summary-only' is configured on the Tier-0 gateway. Removing this route from the FIB also stops the advertisement of the /24 aggregate route from the NSX Tier-0 Gateway to external BGP neighbor B.
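The expectation described above (the aggregate stays installed while at least one more-specific prefix inside it is present) can be illustrated with a minimal sketch. The addresses come from the 192.0.2.0/24 and 198.51.100.0/24 documentation ranges and stand in for the masked values in this article. The function name `aggregate_should_exist` is hypothetical and is not part of NSX or FRR.

```python
import ipaddress


def aggregate_should_exist(aggregate, rib_prefixes):
    """Return True if at least one more-specific prefix lies inside the aggregate.

    Hypothetical helper for illustration only; it does not query an Edge node.
    """
    agg = ipaddress.ip_network(aggregate)
    for prefix in rib_prefixes:
        net = ipaddress.ip_network(prefix)
        # subnet_of() raises TypeError for mixed IPv4/IPv6, so compare versions first.
        if net.version == agg.version and net != agg and net.subnet_of(agg):
            return True
    return False


# Example /32 routes (documentation addresses, assumed for illustration).
rib = ["192.0.2.65/32", "192.0.2.68/32", "192.0.2.72/32"]
print(aggregate_should_exist("192.0.2.0/24", rib))  # True
```

If a check of this kind returns True for the prefixes seen in the routing table while the logs show the aggregate being deleted from the FIB, the behavior matches the symptom described in this article.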
VMware NSX-T Data Center 3.2.x
VMware NSX 4.0.x
VMware NSX 4.1.x
For aggregate route reference counting, the thread yield value for main queue processing is 50 ms.
On route withdrawal, BGP decrements the aggregate count and sets the BGP_PATH_REMOVED flag on the path, then schedules BGP path processing.
Meanwhile, if the Edge receives an update for the path before the withdrawal has been processed, bgp_update decrements the aggregate count again, even though the path is restored. This double decrement of the aggregate counter can lead to this problem.
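The double decrement can be illustrated with a toy model. This is a simplified sketch, not the FRR or NSX source code: the class `AggregateModel`, its methods, and the `buggy` switch are hypothetical. The model also assumes that an update removes the old contribution and then adds the new one. The prefixes are documentation addresses.

```python
class AggregateModel:
    """Toy reference counter for routes contributing to one aggregate."""

    def __init__(self):
        self.count = 0    # aggregate reference count
        self.paths = {}   # prefix -> "active" or "removed"

    def announce(self, prefix):
        self.paths[prefix] = "active"
        self.count += 1

    def withdraw(self, prefix):
        # Withdrawal: decrement now, flag the path, defer the real removal.
        self.count -= 1
        self.paths[prefix] = "removed"

    def update(self, prefix, buggy):
        # Update arriving before the deferred withdrawal has been processed.
        state = self.paths.get(prefix)
        if state == "active" or (state == "removed" and buggy):
            # Drop the old contribution. For a path already flagged as
            # removed, this is the second decrement for the same path.
            self.count -= 1
        self.paths[prefix] = "active"   # the path is restored
        self.count += 1                 # add the new contribution


def run(buggy):
    """Flap three prefixes once each; return (reference count, active paths)."""
    model = AggregateModel()
    prefixes = ["192.0.2.65/32", "192.0.2.68/32", "192.0.2.72/32"]
    for p in prefixes:
        model.announce(p)
    for p in prefixes:
        model.withdraw(p)
        model.update(p, buggy)
    active = sum(1 for s in model.paths.values() if s == "active")
    return model.count, active


print(run(buggy=True))   # (0, 3)
print(run(buggy=False))  # (3, 3)
```

In the buggy run, the count reaches 0 while all three paths are still active. In this model, a count of 0 corresponds to the aggregate being withdrawn even though contributing /32 routes remain.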
This issue is resolved in VMware NSX 4.2.1.3.0.24497425
This issue is resolved in VCF 9.0