Stale user static routes observed on T0/T1 gateway

Products

VMware NSX

Issue/Introduction

Stale user static routes that were previously deleted from a T0/T1 Logical Router are still observed on the T0/T1 gateway, causing intermittent routing issues. The issue may appear as a static route on a T0/T1 Logical Router that has multiple nexthop IPs (for example, when the route was deleted and later re-added with a different gateway IP).
This issue is more likely to be seen after an NSX Manager node is added or redeployed, or after loss of network connectivity between Manager nodes.
Log entries similar to the following are seen in /var/log/nsxapi.log:

Note: The log lines below are from an example in which a static route shows multiple nexthop IPs because a new route entry with a stale nexthop IP was added (100.##.##.50 is the valid nexthop and 100.##.##.54 is the stale entry):

INFO workerTaskExecutor-1-50 RouteGraphManager 4851 ROUTING [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Affected networks from route graph are : [{ Network : 100.##.##.0/24, Direction : SOUTH_BOUND, nhToRealExitNextHopMap : { { nhIp: 100.##.##.54,static route id : StaticRoute/########-####-####-####-####5f95cc14, adminDistance: 1, isDeleted: false } : [100.##.##.54], { nhIp: 100.##.##.50,static route id : StaticRoute/########-####-####-####-####5065ce43, adminDistance: 1, isDeleted: false } : [100.##.##.50], } }] for lr LogicalRouter/########-####-####-####-####eb1518d4

INFO workerTaskExecutor-1-50 UserStaticRoutesCCPPublisher 4851 ROUTING [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Created new ccp route RouteNextHopConfigUFOProxy{networkId=########-####-####-####-####daeabf5c, networkConfig=100.##.##.0/24, routerId=LogicalRouterConfig/########-####-####-####-####eb1518d4, routeType=STATIC, description='null', logicalRouterPortId=null, peerLifId=null, nextHopIp=100.##.##.54, administrativeDistance=1, blackhole=false, blackholeAction=ROUTE_BLACKHOLE_ACTION_INVALID, vrfName='null', disabled=false, originEntityId=StaticRoute/########-####-####-####-####5f95cc14, lrResourceId=LogicalRouter/########-####-####-####-####eb1518d4, spanToSrs=false, identifier=RouteNextHopConfig/########-####-####-####-####adb46ec9}
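To find such entries quickly, the route-graph log lines can be filtered for networks that list more than one nhIp. The following is a minimal sketch, not a supported tool: the function name multi_nexthop_entries is hypothetical, and it assumes the log line layout matches the RouteGraphManager example above (one network per entry). A count above 1 only indicates a candidate; intentionally configured multi-nexthop routes will also match.

```shell
#!/bin/sh
# Hypothetical helper: report route-graph log entries that carry more than
# one next hop. Reads log text on stdin and prints "<nhIp count> <network>".
# Assumes the line layout shown in the example above; if an entry lists
# several networks, only the last network is printed and the count covers
# the whole line.
multi_nexthop_entries() {
    grep 'Affected networks from route graph' | while IFS= read -r line; do
        # Count every "nhIp: <address>" occurrence on this line.
        count=$(printf '%s\n' "$line" | grep -o 'nhIp: [^,]*' | wc -l | tr -d ' ')
        # Extract the prefix that follows "Network : ".
        network=$(printf '%s\n' "$line" | sed -n 's/.*Network : \([^,]*\),.*/\1/p')
        if [ "$count" -gt 1 ]; then
            printf '%s %s\n' "$count" "$network"
        fi
    done
}

# Usage (log path taken from this article):
# multi_nexthop_entries < /var/log/nsxapi.log
```

Each reported network should then be compared against the static routes currently configured on the gateway to decide which nexthop is stale.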

Environment

VMware NSX-T Data Center 3.2.x

VMware NSX 4.0.x, 4.1.0, 4.1.1

Cause

This issue is caused by a software defect in the affected NSX versions.

Resolution

This issue is resolved in NSX 4.1.2 and later versions.

Workaround

The issue can be resolved by executing the LogicalRouter reprocess API for the affected T0/T1 gateway (root mode access required):

# curl -k -u admin -X POST 'https://localhost/policy/api/v1/infra/tier-1s/<tier-1-id>?action=reprocess'
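When several Tier-1 gateways are affected, the same call can be repeated per gateway ID. The following is a minimal sketch under stated assumptions: the function names reprocess_url and reprocess_tier1s and the DRY_RUN variable are hypothetical, and only the Tier-1 URL path shown above is used. curl prompts for the admin password on each call.

```shell
#!/bin/sh
# Build the reprocess URL for one Tier-1 gateway ID (path taken from the
# command above).
reprocess_url() {
    printf 'https://localhost/policy/api/v1/infra/tier-1s/%s?action=reprocess\n' "$1"
}

# Hypothetical loop: call the reprocess API once per Tier-1 gateway ID.
# With DRY_RUN=1 the curl command is printed instead of executed.
reprocess_tier1s() {
    for id in "$@"; do
        url=$(reprocess_url "$id")
        if [ "${DRY_RUN:-0}" = "1" ]; then
            printf 'curl -k -u admin -X POST %s\n' "$url"
        else
            curl -k -u admin -X POST "$url"
        fi
    done
}

# Usage (replace the placeholders with real gateway IDs):
# DRY_RUN=1 reprocess_tier1s <tier-1-id-a> <tier-1-id-b>
```

Running with DRY_RUN=1 first makes it possible to review the exact requests before they are sent to the Manager.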

Another option is to restart the proton service on all NSX Manager nodes (root mode access required; this temporarily affects NSX Manager UI access):

# service proton restart