edge> get logical-routers
edge> vrf <vrf # of Service Router>
edge(tier0_sr[vrf#])> get route
<--- static routes are missing from output
root@edge:/var/log/frr# /usr/lib/frr/frrinit.sh status
/usr/lib/frr/frrcommon.sh: line 304: declare: watchfrr_options: not found
* Status of watchfrr: running
* Status of zebra: running
* Status of bgpd: running
* Status of ospfd: running
* Status of pimd: running
* Status of staticd: FAILED
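The status check above can be automated. The following is a minimal sketch, assuming the output of `frrinit.sh status` has been captured as text in the format shown; the function name `failed_daemons` is hypothetical and not part of FRR or NSX.

```python
import re

# Matches lines such as "* Status of staticd: FAILED" from the output above.
STATUS_RE = re.compile(r"\*\s*Status of (?P<daemon>\w+):\s*(?P<state>\S+)")


def failed_daemons(status_output):
    """Return the names of daemons whose reported state is not 'running'."""
    return [
        match.group("daemon")
        for match in STATUS_RE.finditer(status_output)
        if match.group("state").lower() != "running"
    ]


if __name__ == "__main__":
    sample = """\
* Status of watchfrr: running
* Status of zebra: running
* Status of staticd: FAILED
"""
    print(failed_daemons(sample))
```

Lines that do not follow the `* Status of <daemon>: <state>` pattern, such as the `frrcommon.sh` warning, are ignored.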
/var/log/frr/frr.log on Edge when static daemon is shut down:
<timestamp> ZEBRA: zebra_rnh_remove_from_routing_table: 0:100.64.#.#/32 removed from tracking on 0.0.0.0/0
<timestamp> ZEBRA: zebra_rnh_store_in_routing_table: 0:100.64.#.#/32 added for tracking on 100.64.#.#/31
<timestamp> ZEBRA: [EC 100663299] stream_read_try: read failed on fd 49: Connection reset by peer
<timestamp> ZEBRA: connection closed socket [49]
<timestamp> ZEBRA: [EC 4043309117] Client 'static' encountered an error and is shutting down.
<timestamp> ZEBRA: Closing client 'static'
<timestamp> ZEBRA: release_daemon_table_chunks: Released 0 table chunks
<timestamp> ZEBRA: release_daemon_label_chunks: Released 0 label chunks
<timestamp> ZEBRA: zebra_rnh_remove_from_routing_table: 0:10.#.#.#/32 removed from tracking on 10.#.#.#/27
<timestamp> ZEBRA: zebra_rnh_remove_from_routing_table: 0:10.#.#.#/32 removed from tracking on 10.#.#.#/29
…
…
<timestamp> ZEBRA: zebra_rnh_remove_from_routing_table: 0:100.64.#.#/32 removed from tracking on 100.64.#.#/31
<timestamp> ZEBRA: zebra_rnh_remove_from_routing_table: 0:100.64.#.#/32 removed from tracking on 100.64.#.#/31
<timestamp> ZEBRA: client 49 disconnected 226 static routes removed from the rib
<timestamp> ZEBRA: zserv_client_free: Deleting client static
/var/log/frr/frr.log on Edge:
<timestamp> STATIC: [EC 100663301] zapi_route_encode: prefix 192.168.#.#/##: can't encode 257 nexthops (maximum is 256)
<timestamp> STATIC: [EC 100663301] zapi_route_encode: prefix 192.168.#.#/##: can't encode 257 nexthops (maximum is 256)
/var/log/syslog on Edge:
<timestamp> <Edge FQDN> staticd 26054 - - [EC 100663301] zapi_route_encode: prefix 192.168.#.#/##: can't encode 258 nexthops (maximum is 256)
<timestamp> <Edge FQDN> staticd 22511 - - [EC 100663301] zapi_route_encode: prefix 192.168.#.#/##: can't encode 258 nexthops (maximum is 256)
<timestamp> <Edge FQDN> staticd 15765 - - [EC 100663301] zapi_route_encode: prefix 192.168.#.#/##: can't encode 257 nexthops (maximum is 256)
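To find which prefixes are affected, the log files can be scanned for the `zapi_route_encode` error. The sketch below assumes the message format shown in the excerpts above; the function name `find_oversized_prefixes` is hypothetical.

```python
import re

# Matches the staticd error text shown in frr.log and syslog above.
ERROR_RE = re.compile(
    r"zapi_route_encode: prefix (?P<prefix>\S+): "
    r"can't encode (?P<count>\d+) nexthops \(maximum is (?P<limit>\d+)\)"
)


def find_oversized_prefixes(lines):
    """Return {prefix: highest next hop count reported} for matching lines."""
    worst = {}
    for line in lines:
        match = ERROR_RE.search(line)
        if match:
            prefix = match.group("prefix")
            count = int(match.group("count"))
            worst[prefix] = max(worst.get(prefix, 0), count)
    return worst


if __name__ == "__main__":
    # Path taken from the article; run on the Edge or against a copied log.
    with open("/var/log/frr/frr.log", encoding="utf-8", errors="replace") as log:
        for prefix, count in find_oversized_prefixes(log).items():
            print(f"{prefix}: {count} next hops reported")
```

Each prefix listed is a candidate for the route advertisement review described in the workaround.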
In some cases, the subnet named in the errors is not used by any segment and is not configured as a static route on a Tier-1. Instead, it matches the IP address of the DHCP server set in the DHCP profile, for example 192.168.x.1/24. This DHCP profile is attached to the Tier-1 to provide a Gateway (centralized) DHCP server for the segments connected to that Tier-1. The segments themselves do not use the DHCP server subnet (192.168.x.0/24); they use their own subnets, such as 100.10.x.x/26.
If multiple Tier-1s are created using the same DHCP server, each of them advertises the DHCP subnet (e.g., 192.168.x.0/24) to Tier-0, in addition to its connected segments. The next hops for that single prefix can then quickly exceed the limit of 256.
VMware NSX-T Datacenter
FRR, which manages routing on NSX Edges, can handle at most 256 equal-cost next hops for a single route.
The DHCP server subnet is also advertised to Tier-0. When many Tier-1s use the same DHCP server, the route advertisements for this subnet can quickly exceed that limit.
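As a rough illustration of the arithmetic, the sketch below counts how many Tier-1s share each DHCP server subnet and flags any subnet above the limit. It assumes each Tier-1 contributes exactly one next hop for the shared subnet, which is a simplification not stated in this article; the input mapping and the function name `subnets_over_limit` are hypothetical.

```python
from collections import Counter

MAX_NEXTHOPS = 256  # limit quoted in the staticd error message


def subnets_over_limit(tier1_to_dhcp_subnet, limit=MAX_NEXTHOPS):
    """Return {subnet: Tier-1 count} for subnets shared by more than `limit` Tier-1s.

    Assumption: one next hop per Tier-1 advertising the subnet.
    """
    counts = Counter(tier1_to_dhcp_subnet.values())
    return {subnet: n for subnet, n in counts.items() if n > limit}


if __name__ == "__main__":
    # Hypothetical inventory: 257 Tier-1s all using one DHCP server subnet.
    inventory = {f"tier1-{i}": "192.168.1.0/24" for i in range(257)}
    print(subnets_over_limit(inventory))
```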
This issue is resolved in VMware NSX-T 3.2.4 and VMware NSX 4.2.0. In the fixed versions, the staticd service does not create a new route that would exceed the next hop count of 256, and NSX raises an alarm.
Workaround:
To remediate this issue, check whether any Tier-1s or segments are advertising routes to the subnets that appear in the logs, then remove or reduce the redundant route advertisements.
If the issue is related to the DHCP server subnet, create a route advertisement rule on each Tier-1 that uses the DHCP server to exclude the DHCP subnet from route advertisement.
Alternatively, an API call can be used to create such a rule and attach it to a Tier-1 gateway. Example:
PATCH https://<NSX-Mgr>/policy/api/v1/infra/tier-1s/{tier-1-name}
{
  "route_advertisement_rules": [
    {
      "name": "No_Advertisement_DHCP_Subnets",
      "subnets": ["100.96.0.0/30"],
      "route_advertisement_types": ["TIER1_CONNECTED"],
      "prefix_operator": "EQ",
      "action": "DENY"
    }
  ]
}
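When the same rule has to be applied to many Tier-1 gateways, the request can be scripted. The following is a minimal sketch using only the Python standard library. It reuses the endpoint and body from the example above; the use of HTTP basic authentication, the helper names, and the placeholder values are assumptions for illustration.

```python
import base64
import json
import urllib.request


def build_rule_payload(dhcp_subnet, rule_name="No_Advertisement_DHCP_Subnets"):
    """Build the request body from the example above for one DHCP subnet."""
    return {
        "route_advertisement_rules": [
            {
                "name": rule_name,
                "subnets": [dhcp_subnet],
                "route_advertisement_types": ["TIER1_CONNECTED"],
                "prefix_operator": "EQ",
                "action": "DENY",
            }
        ]
    }


def build_patch_request(nsx_manager, tier1_id, payload, username, password):
    """Prepare (but do not send) the PATCH request for one Tier-1 gateway."""
    url = f"https://{nsx_manager}/policy/api/v1/infra/tier-1s/{tier1_id}"
    # Assumption: basic authentication is enabled for the API user.
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="PATCH",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


if __name__ == "__main__":
    # Placeholder values; replace with real ones before sending.
    request = build_patch_request(
        "nsx-mgr.example.com", "tier-1-name",
        build_rule_payload("100.96.0.0/30"), "admin", "password",
    )
    print(request.get_method(), request.full_url)
    # To send: urllib.request.urlopen(request)
```

The sketch does not merge with rules already present on the gateway, so review the existing `route_advertisement_rules` on each Tier-1 before sending the request.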
Impact/Risks:
The static daemon within FRR handles the installation and deletion of static routes. When staticd is not running, static routes are not installed in routing tables, potentially causing dataplane impact and network outages depending on the routing configuration in the NSX environment.