T1 NAT routes missing from one of the edge nodes in an Active-Active T0 router

Products

VMware NSX

Issue/Introduction

This article addresses an issue where T1 NAT routes (t1n routes) are missing on one or more edge nodes participating in an Active-Active T0 gateway. T1 NAT routes are propagated to the T0 gateway when route advertisement is enabled on the T1 router, but in this issue some edge nodes in the Active-Active T0 deployment do not receive them.

When an edge node lacks the necessary t1n routes, traffic attempting to flow through that specific edge for NAT services will be impacted, leading to potential service disruptions and connectivity issues for the affected workloads.

Note: In the edge node CLI route table, T1 NAT routes are shown with the route type code t1n (Tier1-NAT).

Environment

VMware NSX
VMware NSX-T Data Center

Cause

The t1n routes go missing on the affected edge nodes because of a synchronization issue between the edge node(s) and the NSX-T controllers, or because nestdb is unable to update the information. As a result, the edge node does not receive the latest routing information, including the advertised T1 NAT routes, even though those routes are available to the other edge nodes in the same T0 deployment.

Resolution

To resolve this issue, follow the steps below to identify the affected edge node, perform health checks, and attempt to resynchronize or restart relevant services.

Identifying the Affected Edge Node

Validate T0 Router Details:
Log in to the CLI (SSH) of each active NSX-T Edge Node and run the following command to identify the VRF ID of your Active-Active T0 gateway:
```
get logical-router | find Active_Active_T0_Name
```
Replace Active_Active_T0_Name with the actual name of your T0 gateway.
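If you collect this output from several edge nodes, a small parser can pull out the VRF ID instead of reading it by eye. The Python sketch below is a minimal example under stated assumptions: it assumes the output is a whitespace-separated table with the columns UUID, VRF, LR-ID, Name and Type in that order, and that the row you want is the T0 service router (type SERVICE_ROUTER_TIER0). The function name find_t0_vrf and the sample rows are hypothetical, so adjust the column indexes and router type to match the output on your NSX version.

```python
from typing import Optional

# Hypothetical sample of `get logical-router` output; the column layout is an
# assumption and may differ between NSX versions.
SAMPLE_OUTPUT = """\
Logical Router
UUID                                   VRF    LR-ID  Name               Type                      Ports
00000000-0000-0000-0000-000000000001   0      0                         TUNNEL                    3
00000000-0000-0000-0000-000000000002   1      1025   SR-T0-Gateway-1    SERVICE_ROUTER_TIER0      6
00000000-0000-0000-0000-000000000003   3      1026   DR-T0-Gateway-1    DISTRIBUTED_ROUTER_TIER0  4
"""


def find_t0_vrf(output: str, t0_name: str,
                router_type: str = "SERVICE_ROUTER_TIER0") -> Optional[int]:
    """Return the VRF ID of the first row whose Name contains t0_name and
    whose Type equals router_type, or None when no row matches."""
    for line in output.splitlines():
        cols = line.split()
        # Data rows have a numeric VRF in the second column; this skips the
        # title and header lines. Rows with an empty Name (e.g. TUNNEL) shift
        # left, so their "type" column holds the port count and never matches.
        if len(cols) < 5 or not cols[1].isdigit():
            continue
        name, rtype = cols[3], cols[4]
        if t0_name in name and rtype == router_type:
            return int(cols[1])
    return None


if __name__ == "__main__":
    print(find_t0_vrf(SAMPLE_OUTPUT, "T0-Gateway-1"))  # 1 for the sample above
```

Like the CLI `find` filter, the name check is a substring match, so a short T0 name may match more than one router; pass the full gateway name where possible.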
Enter the T0 VRF Context:
Using the VRF ID obtained from the previous step, switch to the T0 VRF context:
```
vrf <VRF_ID>
```
Replace <VRF_ID> with the actual VRF ID.
Validate T1N Route Population:
Within the T0 VRF context, check if all the required t1n routes are populated:
```
get route | find t1n
```
Review the output to confirm that the expected NAT routes advertised by the T1 gateway are present.
Repeat on All Edge Nodes:
Repeat the three steps above on every edge node participating in the Active-Active T0 gateway to identify which node(s) are missing the t1n routes.
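Comparing the t1n output from each edge node by hand is error-prone when there are many NAT prefixes. The Python sketch below collects the prefixes from each node's `get route | find t1n` output and reports which ones each node lacks. It assumes that each route line begins with the t1n type code and that the first whitespace-delimited token of the form address/length is the prefix. It also treats any prefix seen on at least one edge as expected on all of them. The function names and the sample route lines in the comments are hypothetical.

```python
import re
from typing import Dict, Set

# A prefix token such as 192.0.2.10/32 or 2001:db8::/64, bounded by whitespace.
PREFIX_RE = re.compile(r"(?<!\S)([0-9A-Fa-f:.]+/\d{1,3})(?!\S)")


def t1n_prefixes(route_output: str) -> Set[str]:
    """Collect prefixes from lines that start with the t1n route type.

    Assumed line shape (hypothetical):
      t1n> * 192.0.2.10/32 [3/0] via 100.64.0.1, downlink-101, 01:02:03
    Legend lines such as "t1n: Tier1-NAT, ..." contain no prefix and are skipped.
    """
    prefixes: Set[str] = set()
    for line in route_output.splitlines():
        if not line.lstrip().startswith("t1n"):
            continue
        match = PREFIX_RE.search(line)
        if match:
            prefixes.add(match.group(1))
    return prefixes


def missing_per_edge(outputs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each edge name to the t1n prefixes it lacks, using the union of
    prefixes seen on all edges as the expected set. Edges with nothing
    missing are omitted from the result."""
    seen = {edge: t1n_prefixes(text) for edge, text in outputs.items()}
    expected: Set[str] = set().union(*seen.values())
    result: Dict[str, Set[str]] = {}
    for edge, prefixes in seen.items():
        gaps = expected - prefixes
        if gaps:
            result[edge] = gaps
    return result
```

If you already know which NAT prefixes should be advertised, compare each node against that list instead of the union, because the union cannot reveal a prefix that is missing from every node.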

Health Check and Synchronization

Once the affected edge node(s) have been identified, perform the following steps on those specific nodes:

Check Controller Connectivity:
Verify that the edge node has healthy connectivity to the NSX-T controllers:
```
get controller
```
Ensure the output indicates a connected and healthy state for all controllers.
Sync Edge Configuration via NSX Manager UI:
This action forces a resynchronization of the edge node's configuration with the controllers.
- Navigate to System > Fabric > Nodes in the NSX Manager UI.
- Select the Edge Nodes tab.
- Check the box next to the affected Edge Node(s).
- Click on ACTIONS (top right) and then select Sync Edge Configuration.
- Wait for the synchronization process to complete, then run get route | find t1n from the T0 VRF context on the edge node CLI to check whether the routes are now present.
Restart Local-Controller and Nestdb Services (If Issue Persists):
If syncing the configuration does not resolve the issue, restart the local-controller and nestdb services on the affected edge node. These services are crucial for route propagation and database synchronization.

a. Validate Service Status:
Before restarting, check the current status of the services:
```
get service local-controller
get service nestdb
```
Ensure they are running before proceeding.
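When checking several edge nodes, this status check can be scripted. The Python sketch below assumes that the `get service` output consists of `Key: value` lines, including a `Service state` field whose value is `running` for a healthy service. That layout and the helper names are assumptions to verify against the CLI output on your own edges.

```python
from typing import Dict, List


def parse_service_fields(output: str) -> Dict[str, str]:
    """Turn 'Key: value' lines into a dict; lines without a colon are ignored.
    Only the first colon splits, so values containing colons stay intact."""
    fields: Dict[str, str] = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def services_not_running(outputs: Dict[str, str]) -> List[str]:
    """Given {service_name: get-service output}, return the sorted names of
    services whose assumed 'Service state' field is not 'running'."""
    return sorted(
        name for name, text in outputs.items()
        if parse_service_fields(text).get("Service state", "").lower() != "running"
    )
```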

b. Restart Services:
Execute the following commands to restart the services:
```
restart service local-controller
restart service nestdb
```
Allow a few minutes for the services to come back up and for the routes to repopulate, then re-validate the routes using get route | find t1n from the T0 VRF context.
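Because the routes can take a few minutes to reappear, the re-validation can be expressed as a bounded polling loop. In the Python sketch below, `fetch` is a hypothetical callable you would supply (for example, one that runs get route | find t1n in the T0 VRF context over SSH and returns the prefixes found). The 300-second timeout and 15-second interval are arbitrary defaults chosen for illustration, not values from NSX guidance.

```python
import time
from typing import Callable, Set


def wait_for_prefixes(fetch: Callable[[], Set[str]], expected: Set[str],
                      timeout_s: float = 300.0,
                      interval_s: float = 15.0) -> Set[str]:
    """Call fetch() until every expected prefix is present or timeout_s has
    elapsed. Returns the expected prefixes still missing (empty on success)."""
    deadline = time.monotonic() + timeout_s
    while True:
        missing = expected - fetch()
        if not missing or time.monotonic() >= deadline:
            return missing
        time.sleep(interval_s)
```

A non-empty result means the listed prefixes were still absent from the node when the timeout expired.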