When more than 256 next hops are configured for a single route, static daemon within FRR crashes and static routes are lost

Products

VMware NSX

Issue/Introduction

After upgrading to NSX-T 3.2.x, user-configured static routes and directly connected routes (which are considered static) are missing from the Tier-1 / Tier-0 Service Router routing tables.

edge> get logical-routers
edge> vrf <vrf # of Service Router>
edge(tier0_sr[vrf#])> get route <--- static routes are missing from output

The FRR status command shows that the static daemon (staticd) has failed:

root@edge:/var/log/frr# /usr/lib/frr/frrinit.sh status
/usr/lib/frr/frrcommon.sh: line 304: declare: watchfrr_options: not found
* Status of watchfrr: running
* Status of zebra: running
* Status of bgpd: running
* Status of ospfd: running
* Status of pimd: running
* Status of staticd: FAILED
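
The status check above can also be automated. The following is a minimal Python sketch, not an NSX or FRR tool: it assumes the output of `/usr/lib/frr/frrinit.sh status` has already been captured as text in the format shown above, and the function name `failed_daemons` is hypothetical.

```python
import re
from typing import List

# Matches lines such as " * Status of staticd: FAILED".
STATUS_LINE = re.compile(r"Status of (\S+):\s*(\S+)")


def failed_daemons(status_output: str) -> List[str]:
    """Return the daemons whose reported status is anything other than 'running'."""
    failed = []
    for line in status_output.splitlines():
        match = STATUS_LINE.search(line)
        if match and match.group(2).lower() != "running":
            failed.append(match.group(1))
    return failed


# Sample text copied from the status output shown in this article.
SAMPLE_STATUS = """\
/usr/lib/frr/frrcommon.sh: line 304: declare: watchfrr_options: not found
 * Status of watchfrr: running
 * Status of zebra: running
 * Status of bgpd: running
 * Status of ospfd: running
 * Status of pimd: running
 * Status of staticd: FAILED
"""

print(failed_daemons(SAMPLE_STATUS))  # prints ['staticd']
```

Lines that do not report a daemon status, such as the `watchfrr_options` warning, are ignored.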

Example logging in /var/log/frr/frr.log on the Edge when the static daemon shuts down:

<timestamp> ZEBRA: zebra_rnh_remove_from_routing_table: 0:100.64.#.#/32 removed from tracking on 0.0.0.0/0
<timestamp> ZEBRA: zebra_rnh_store_in_routing_table: 0:100.64.#.#/32 added for tracking on 100.64.#.#/31
<timestamp> ZEBRA: [EC 100663299] stream_read_try: read failed on fd 49: Connection reset by peer
<timestamp> ZEBRA: connection closed socket [49]
<timestamp> ZEBRA: [EC 4043309117] Client 'static' encountered an error and is shutting down.
<timestamp> ZEBRA: Closing client 'static'
<timestamp> ZEBRA: release_daemon_table_chunks: Released 0 table chunks
<timestamp> ZEBRA: release_daemon_label_chunks: Released 0 label chunks
<timestamp> ZEBRA: zebra_rnh_remove_from_routing_table: 0:10.#.#.#/32 removed from tracking on 10.#.#.#/27
<timestamp> ZEBRA: zebra_rnh_remove_from_routing_table: 0:10.#.#.#/32 removed from tracking on 10.#.#.#/29
…
…
<timestamp> ZEBRA: zebra_rnh_remove_from_routing_table: 0:100.64.#.#/32 removed from tracking on 100.64.#.#/31
<timestamp> ZEBRA: zebra_rnh_remove_from_routing_table: 0:100.64.#.#/32 removed from tracking on 100.64.#.#/31
<timestamp> ZEBRA: client 49 disconnected 226 static routes removed from the rib
<timestamp> ZEBRA: zserv_client_free: Deleting client static

More than 256 next hops have been configured for a single static (or connected) route.

Example logging in /var/log/frr/frr.log on Edge:

<timestamp> STATIC: [EC 100663301] zapi_route_encode: prefix 192.168.#.#/##: can't encode 257 nexthops (maximum is 256)
<timestamp> STATIC: [EC 100663301] zapi_route_encode: prefix 192.168.#.#/##: can't encode 257 nexthops (maximum is 256)

Example logging in /var/log/syslog on Edge:

<timestamp> <Edge FQDN> staticd 26054 - - [EC 100663301] zapi_route_encode: prefix 192.168.#.#/##: can't encode 258 nexthops (maximum is 256)
<timestamp> <Edge FQDN> staticd 22511 - - [EC 100663301] zapi_route_encode: prefix 192.168.#.#/##: can't encode 258 nexthops (maximum is 256)
<timestamp> <Edge FQDN> staticd 15765 - - [EC 100663301] zapi_route_encode: prefix 192.168.#.#/##: can't encode 257 nexthops (maximum is 256)

In some cases, the subnet named in the errors is not used by any segment and is not configured as a static route on a Tier-1. Instead, it matches the IP address of the DHCP server set in the DHCP profile, for example 192.168.x.1/24. This DHCP profile is attached to the Tier-1 to provide a Gateway (centralized) DHCP server for the segments connected to that Tier-1. Those segments do not use the DHCP server's subnet (192.168.x.0/24) but their own, such as 100.10.x.x/26.

If multiple Tier-1s are created using the same DHCP server, the DHCP subnet (e.g., 192.168.x.0/24) is advertised to Tier-0 and to all the segments connected to those Tier-1s. The number of next hops for that subnet can then quickly exceed the limit of 256.

Environment

VMware NSX-T Data Center

Cause

FRR, which manages routing on NSX Edges, can handle at most 256 equal-cost next hops for a single route.

The DHCP server subnet is also advertised to Tier-0. When there are many Tier-1s using the same DHCP server, the route advertisement for this subnet can quickly exceed the limit.

Resolution

This issue is resolved in VMware NSX-T 3.2.4 and VMware NSX 4.2.0. In the fixed versions, the staticd service does not create a new route that would exceed 256 next hops, and NSX raises an alarm.

Workaround:
To remediate this issue, check whether any Tier-1s or segments are advertising routes to the subnets that appear in the logs, then remove or reduce the redundant route advertisements.
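
To find the affected subnets, the `zapi_route_encode` errors can be collected per prefix. The following Python sketch assumes the log lines follow the format shown in the frr.log and syslog examples in this article. The function name `overflowing_prefixes` and the sample prefixes (taken from documentation address ranges) are hypothetical.

```python
import re
from typing import Dict, Iterable

# Matches the staticd error in both the frr.log and the syslog format.
ENCODE_ERROR = re.compile(
    r"zapi_route_encode: prefix (\S+): can't encode (\d+) nexthops"
)


def overflowing_prefixes(log_lines: Iterable[str]) -> Dict[str, int]:
    """Map each prefix named in an encode error to the highest next-hop count logged."""
    worst: Dict[str, int] = {}
    for line in log_lines:
        match = ENCODE_ERROR.search(line)
        if match:
            prefix, count = match.group(1), int(match.group(2))
            worst[prefix] = max(count, worst.get(prefix, 0))
    return worst


# Hypothetical sample lines; the prefixes are placeholders, not real log data.
SAMPLE_LINES = [
    "<timestamp> STATIC: [EC 100663301] zapi_route_encode: prefix 192.0.2.0/24: can't encode 257 nexthops (maximum is 256)",
    "<timestamp> <Edge FQDN> staticd 26054 - - [EC 100663301] zapi_route_encode: prefix 192.0.2.0/24: can't encode 258 nexthops (maximum is 256)",
    "<timestamp> ZEBRA: Closing client 'static'",
    "<timestamp> STATIC: [EC 100663301] zapi_route_encode: prefix 198.51.100.0/24: can't encode 257 nexthops (maximum is 256)",
]

print(overflowing_prefixes(SAMPLE_LINES))
# prints {'192.0.2.0/24': 258, '198.51.100.0/24': 257}
```

In practice the lines would be read from /var/log/frr/frr.log or /var/log/syslog on the Edge. Each prefix returned is a candidate for reduced or excluded route advertisement.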

If the issue is related to the DHCP server subnet, create a route advertisement rule on each Tier-1 that uses the DHCP server to exclude the DHCP subnet from route advertisement.

Alternatively, an API call can be used to create such a rule and attach it to a Tier-1 gateway. Example:

PATCH https://<NSX-Mgr>/policy/api/v1/infra/tier-1s/{tier-1-name}
{
    "route_advertisement_rules": [
        {
            "name": "No_Advertisement_DHCP_Subnets",
            "subnets": ["100.96.0.0/30"],
            "route_advertisement_types": ["TIER1_CONNECTED"],
            "prefix_operator": "EQ",
            "action": "DENY"
        }
    ]
}
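
When the same rule has to be applied to many Tier-1 gateways, the request can be generated in a script. The Python sketch below only builds the PATCH request from the example above and does not send it. The function names, the manager host name, and the Tier-1 identifier are hypothetical. Authentication and certificate handling are omitted and depend on the environment.

```python
import json
import urllib.request
from typing import Dict, Iterable


def build_deny_rule_payload(
    subnets: Iterable[str],
    rule_name: str = "No_Advertisement_DHCP_Subnets",
) -> Dict:
    """Build the request body from the example above for the given subnets."""
    return {
        "route_advertisement_rules": [
            {
                "name": rule_name,
                "subnets": list(subnets),
                "route_advertisement_types": ["TIER1_CONNECTED"],
                "prefix_operator": "EQ",
                "action": "DENY",
            }
        ]
    }


def build_patch_request(manager: str, tier1_id: str, payload: Dict) -> urllib.request.Request:
    """Prepare (but do not send) the PATCH request for one Tier-1 gateway."""
    url = f"https://{manager}/policy/api/v1/infra/tier-1s/{tier1_id}"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PATCH",
    )


# Hypothetical manager host name and Tier-1 identifier, for illustration only.
payload = build_deny_rule_payload(["100.96.0.0/30"])
request = build_patch_request("nsx-mgr.example.com", "t1-example", payload)
print(request.get_method(), request.full_url)
```

Building the request separately from sending it allows the URL and body to be reviewed for each Tier-1 before any change is made.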

Additional Information

Impact/Risks:
The static daemon within FRR handles the installation and deletion of static routes. When staticd is not running, static routes are not installed in routing tables, potentially causing dataplane impact and network outages depending on the routing configuration in the NSX environment.