This article addresses an issue where T1 NAT routes (t1n routes) are missing on one or more edge nodes participating in an Active-Active T0 Gateway setup. While T1 NAT routes are intended to be propagated to the T0 gateway by enabling route advertisement on the T1 router, not all edge nodes within the Active-Active T0 deployment may receive these routes.
When an edge node lacks the necessary t1n routes, traffic attempting to flow through that specific edge for NAT services will be impacted, leading to potential service disruptions and connectivity issues for the affected workloads.
Note: T1 NAT routes (t1n routes)
VMware NSX
VMware NSX-T Datacenter
The t1n routes are missing on the affected edge nodes because of a synchronization issue between the edge node(s) and the NSX-T controllers, or because nestdb is unable to update the information. As a result, the edge node does not receive the latest routing information, including the advertised T1 NAT routes, even though those routes are available to other edge nodes in the same T0 deployment.
To resolve this issue, follow the steps below to identify the affected edge node, perform health checks, and attempt to resynchronize or restart relevant services.
Identifying the Affected Edge Node
Validate T0 Router Details:
Log in to the CLI (SSH) of each active NSX-T edge node and execute the following command to identify the VRF ID of your Active-Active T0 gateway:
get logical-router | find Active_Active_T0_Name
Replace Active_Active_T0_Name with the actual name of your T0 gateway.
Enter the T0 VRF Context:
Using the VRF ID obtained from the previous step, switch to the T0 VRF context:
vrf <VRF_ID>
Replace <VRF_ID> with the actual VRF ID.
Validate T1N Route Population:
Within the T0 VRF context, check if all the required t1n routes are populated:
get route | find t1n
Review the output to ensure expected NAT routes from T1 are present.
Repeat on All Edge Nodes:
Repeat the three steps above on every edge node participating in the Active-Active T0 gateway to identify which specific node(s) are missing the t1n routes.
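When several edge nodes are involved, comparing the outputs by eye is error-prone. The following is a minimal sketch, not an official tool, that takes the text of get route | find t1n saved from each edge node and reports which t1n prefixes appear on other nodes but not on a given node. The edge names, the sample route lines, and the assumed line shape (a line beginning with t1n that contains an IPv4 prefix) are illustrative assumptions; adjust the pattern if the output on your NSX version looks different.

```python
import re

# Assumption: a t1n route line starts with the "t1n" flag and contains an
# IPv4 prefix in CIDR form. Adjust if your output differs.
T1N_LINE = re.compile(r"^\s*t1n\b.*?(\d{1,3}(?:\.\d{1,3}){3}/\d{1,2})")


def t1n_prefixes(route_output: str) -> set[str]:
    """Return the prefixes found on lines flagged t1n."""
    prefixes = set()
    for line in route_output.splitlines():
        match = T1N_LINE.match(line)
        if match:
            prefixes.add(match.group(1))
    return prefixes


def missing_t1n_routes(outputs_by_edge: dict[str, str]) -> dict[str, set[str]]:
    """Map each edge to the t1n prefixes seen on other edges but absent locally."""
    per_edge = {edge: t1n_prefixes(text) for edge, text in outputs_by_edge.items()}
    all_prefixes = set().union(*per_edge.values())
    return {
        edge: all_prefixes - found
        for edge, found in per_edge.items()
        if all_prefixes - found
    }


if __name__ == "__main__":
    # Hypothetical captured outputs, keyed by hypothetical edge names.
    sample = {
        "edge-01": "t1n> * 172.16.10.5/32 [3/0] via 100.64.16.1, downlink-276, 1d02h\n"
                   "t1n> * 172.16.10.6/32 [3/0] via 100.64.16.1, downlink-276, 1d02h\n",
        "edge-02": "t1n> * 172.16.10.5/32 [3/0] via 100.64.16.1, downlink-276, 1d02h\n",
    }
    for edge, missing in missing_t1n_routes(sample).items():
        print(edge, "is missing", sorted(missing))
```

This comparison only detects routes that at least one edge node still holds. A route missing from every node would go unnoticed, so compare against the list of NAT rules configured on the T1 when in doubt.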
Health Check and Synchronization
Once the affected edge node(s) have been identified, perform the following steps on those specific nodes:
Check Controller Connectivity:
Verify that the edge node has healthy connectivity to the NSX-T controllers:
get controller
Ensure the output indicates a connected and healthy state for all controllers.
Sync Edge Configuration via NSX Manager UI:
This action forces a resynchronization of the edge node's configuration with the controllers.
After the sync completes, re-validate the routes using get route | find t1n on the edge node CLI.
Restart Local-Controller and Nestdb Services (If Issue Persists):
If syncing the configuration does not resolve the issue, restart the local-controller and nestdb services on the affected edge node. These services are crucial for route propagation and database synchronization.
a. Validate Service Status:
Before restarting, check the current status of the services:
cli> get service local-controller
cli> get service nestdb
Ensure they are running before proceeding.
b. Restart Services:
Execute the following commands to restart the services:
cli> restart service local-controller
cli> restart service nestdb
Allow a few minutes for the services to come back up and for the routes to re-populate. Then re-validate the routes using get route | find t1n.
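Because routes can take a few minutes to return after a sync or service restart, the re-validation can be repeated at intervals instead of checked once. The sketch below illustrates that polling loop under stated assumptions: fetch_route_output is a hypothetical callable that you supply (for example, a wrapper around your own SSH tooling) returning the text of get route | find t1n, and the attempt count and delay are arbitrary defaults, not values recommended by the vendor.

```python
import time
from typing import Callable, Iterable


def wait_for_t1n_routes(
    fetch_route_output: Callable[[], str],
    expected_prefixes: Iterable[str],
    attempts: int = 10,
    delay_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> set[str]:
    """Poll until every expected prefix shows up on a t1n line.

    Returns the set of prefixes still missing after the last attempt
    (an empty set means all expected routes were found).
    """
    expected = set(expected_prefixes)
    missing = set(expected)
    for attempt in range(attempts):
        # Keep only lines flagged t1n (assumed to start with that flag).
        t1n_lines = [
            line for line in fetch_route_output().splitlines()
            if line.lstrip().startswith("t1n")
        ]
        # A prefix counts as present when it is an exact token on a t1n line.
        missing = {
            prefix for prefix in expected
            if not any(prefix in line.split() for line in t1n_lines)
        }
        if not missing:
            break
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return missing
```

If the returned set is still non-empty after the final attempt, the routes have not re-populated on that edge node and further investigation is needed.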