Unable to migrate an NSX Segment from an old Tier-0/Tier-1 to a new T0/T1 as the old T0 is advertising the segment route instead of the new T0

Article ID: 401529

Updated On:

Products

VMware NSX

Issue/Introduction

  • A migration to a new NSX Edge with a new NSX Tier-1 and Tier-0 is in progress.
  • After an NSX segment is migrated, the old Tier-0 Gateway continues to advertise the segment route instead of the new Tier-0, so VMs cannot communicate with that segment.
  • The connectivity of segment Segment-1 (IP subnet ##.##.##.##/24) was changed from Orig-Tier-1 to Target-Tier-1.
  • Relevant log excerpts from NSX Manager are shown below:

Log (/var/log/syslog):

2025-05-06T23:51:57.529Z <nsx-manager> NSX 4596 - [nsx@6876 audit="true" comp="nsx-manager" entId="<Segment-1>" level="INFO" reqId="<UUID>" subcomp="manager" update="true" username="<user>"] UserName="<user>", ModuleName="PolicyConnectivity", 
Operation="CreateOrReplaceInfraSegment", 
Operation status="success", 
Old value=[{"type":"ROUTED",
"subnets":[{"gateway_address":"##.##.##.##/24","network":"##.##.##.##/24"}], 
"connectivity_path":"/infra/tier-1s/Orig-Tier-1",
...
"transport_zone_path":"/infra/sites/default/enforcement-points/default/transport-zones/<UUID>","advanced_config":{"hybrid":false,"multicast":true,"inter_router":false,"local_egress":false,"urpf_mode":"STRICT","connectivity":"ON"},"admin_state":"UP","replication_mode":"MTEP","resource_type":"Segment","id":"Segment-1","display_name":"Segment-1","path":"/infra/segments/Segment-1","relative_path":"Segment-1","parent_path":"/infra","unique_id":"<UUID>","realization_id":"<UUID>","marked_for_delete":false,"overridden":false,"_create_time":1601481628024,"_create_user":"<user>","_last_modified_time":1746571500299,"_last_modified_user":"<user>","_system_owned":false,"_protection":"NOT_PROTECTED","_revision":10}], 
...
New value=["Segment-1" {"type":"ROUTED","subnets":[{"gateway_address":"##.##.##.##/24"}], 
"connectivity_path":"/infra/tier-1s/Target-Tier-1",
...
"transport_zone_path":"/infra/sites/default/enforcement-points/default/transport-zones/<UUID>","advanced_config":{"hybrid":false,"multicast":true,"inter_router":false,"local_egress":false,"urpf_mode":"STRICT","connectivity":"ON"},"admin_state":"UP","replication_mode":"MTEP","resource_type":"Segment","id":"Segment-1","display_name":"Segment-1","path":"/infra/segments/Segment-1","relative_path":"Segment-1","parent_path":"/infra","unique_id":"<UUID>","realization_id":"<UUID>","marked_for_delete":false,"overridden":false,"_create_time":1601481628024,"_create_user":"<user>",
"_last_modified_time":1746571500299, (May 6, 2025 10:45:00.299 PM)
"_last_modified_user":"<user>","_system_owned":false,"_revision":10}]

 

Log (/var/log/proton/nsxapi.log):

2025-05-06T23:45:17.245Z  INFO workerTaskExecutor-34 LogicalRouterWorker 4651 ROUTING [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Processing L3Trigger L3Trigger/<UUID> of logical router LogicalRouter/<UUID>, entity id LrPort/<UUID> and type L3_TRIGGER_TYPE_ENTITY_DELETE

Log (/var/log/proton/nsxapi.log):

2025-05-06T23:45:17.572Z  INFO workerTaskExecutor-34 RAPublisherHelper 4651 ROUTING [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Removed entity LogicalRouterPortConfig/<UUID> and networks [LrAdvertiseNetworkInfoProxy{network=##.##.##.##/24, routeType=CONNECTED, advertiseRouteType=T1_DOWNLINK, advertiseAllow=true, advertiseAdminDistance=null}] for advertisement on TIER0 from TIER1 LogicalRouter/<UUID>

Log (/var/log/proton/nsxapi.log):

2025-05-06T23:45:17.750Z  INFO workerTaskExecutor-34 LRServiceIPLRPortArpProxyUpdateListener 4651 ROUTING [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Processing proxy ARP on lif LogicalRouterPortConfig/<UUID> update for LogicalRouter/<UUID>

Log (/var/log/proton/nsxapi.log):

2025-05-06T23:45:17.853Z ERROR workerTaskExecutor-34 EdgeWorkItemExceptionAspect 4651 - [nsx@6876 comp="nsx-manager" errorCode="MP11268" level="ERROR" subcomp="manager"] Failed to process work-items [WorkItem{identifier=L3Trigger/<UUID>, Timestamp{epoch=-1, address=5417684973}}] of lr LogicalRouter/<UUID> by method processL3TriggerWorkItems
java.lang.NullPointerException: null
        at com.vmware.nsx.management.edge.publish.worker.LRServiceIPLRPortArpProxyUpdateListener.clearRoutesForOldNetworks(LRServiceIPLRPortArpProxyUpdateListener.java:316) ~[?:?]
        at com.vmware.nsx.management.edge.publish.worker.LRServiceIPLRPortArpProxyUpdateListener.recalculateArpProxyEntriesForLif(LRServiceIPLRPortArpProxyUpdateListener.java:199) ~[?:?]
        at com.vmware.nsx.management.edge.publish.worker.LRServiceIPLRPortArpProxyUpdateListener.updateArpTableOnLifUpdate(LRServiceIPLRPortArpProxyUpdateListener.java:184) ~[?:?]
        at com.vmware.nsx.management.edge.publish.worker.LRServiceIPLRPortArpProxyUpdateListener.processLifUpdate(LRServiceIPLRPortArpProxyUpdateListener.java:172) ~[?:?]
        at com.vmware.nsx.management.edge.publish.worker.LRServiceIPLRPortArpProxyUpdateListener.processLrLrpConfigUpdate(LRServiceIPLRPortArpProxyUpdateListener.java:103) ~[?:?]
        at com.vmware.nsx.management.edge.publish.worker.LrLrpCCPPublisher.publishL3Config(LrLrpCCPPublisher.java:153) ~[?:?]
        at com.vmware.nsx.management.edge.publish.worker.LrLrpCCPPublisher.publishLrAndLrp(LrLrpCCPPublisher.java:144) ~[?:?]
        at com.vmware.nsx.management.edge.publish.worker.LogicalRouterWorker.handleLrAndLrp_aroundBody18(LogicalRouterWorker.java:498) ~[?:?]
        at com.vmware.nsx.management.edge.publish.worker.LogicalRouterWorker$AjcClosure19.run(LogicalRouterWorker.java:1) ~[?:?]
        at org.aspectj.runtime.reflect.JoinPointImpl.proceed(JoinPointImpl.java:149) ~[?:?]
        at com.vmware.nsx.management.corfudb.TransactionRetryAspect.handleConcurrentTransaction(TransactionRetryAspect.java:60) ~[?:?]
        at com.vmware.nsx.management.edge.publish.worker.LogicalRouterWorker.handleLrAndLrp(LogicalRouterWorker.java:481) ~[?:?]
        at com.vmware.nsx.management.edge.publish.worker.LogicalRouterWorker.handleL3Trigger_aroundBody28(LogicalRouterWorker.java:693) ~[?:?]
        at com.vmware.nsx.management.edge.publish.worker.LogicalRouterWorker$AjcClosure29.run(LogicalRouterWorker.java:1) ~[?:?]
        at org.aspectj.runtime.reflect.JoinPointImpl.proceed(JoinPointImpl.java:149) ~[?:?]
        at com.vmware.nsx.management.corfudb.TransactionRetryAspect.handleConcurrentTransaction(TransactionRetryAspect.java:71) ~[?:?]
        at com.vmware.nsx.management.edge.publish.worker.LogicalRouterWorker.handleL3Trigger(LogicalRouterWorker.java:652) ~[?:?]
        at com.vmware.nsx.management.edge.publish.worker.LogicalRouterWorker.processL3TriggerWorkItems_aroundBody10(LogicalRouterWorker.java:377) ~[?:?]
        at com.vmware.nsx.management.edge.publish.worker.LogicalRouterWorker.processL3TriggerWorkItems_aroundBody11$advice(LogicalRouterWorker.java:40) ~[?:?]
        at com.vmware.nsx.management.edge.publish.worker.LogicalRouterWorker.processL3TriggerWorkItems(LogicalRouterWorker.java:1) ~[?:?]
        at com.vmware.nsx.management.edge.publish.worker.LogicalRouterWorker.lambda$processWorkItemsByLr$2(LogicalRouterWorker.java:325) ~[?:?]
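
To check whether a given nsxapi.log contains the same failure signature, the excerpt above can be searched for an MP11268 error whose stack trace passes through clearRoutesForOldNetworks. The following is a minimal Python sketch under that assumption; the function name is hypothetical, the sample lines are shortened copies of the excerpt above, and the trace detection (lines starting with "at " or "java.lang.") is a simplification rather than a full log parser.

```python
def find_stale_route_failures(lines):
    """Return timestamps of MP11268 errors whose trace hits clearRoutesForOldNetworks."""
    hits = []
    pending = None  # header line of the MP11268 entry currently being inspected
    for line in lines:
        stripped = line.strip()
        is_trace = stripped.startswith("at ") or stripped.startswith("java.lang.")
        if not is_trace:
            # A new log entry starts here; remember it only if it is the MP11268 error.
            pending = line if 'errorCode="MP11268"' in line else None
        elif pending and "clearRoutesForOldNetworks" in stripped:
            hits.append(pending.split()[0])  # first token of the entry is its timestamp
            pending = None
    return hits


# Shortened sample taken from the excerpt above.
sample = [
    '2025-05-06T23:45:17.750Z  INFO workerTaskExecutor-34 LRServiceIPLRPortArpProxyUpdateListener 4651 ROUTING [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Processing proxy ARP on lif',
    '2025-05-06T23:45:17.853Z ERROR workerTaskExecutor-34 EdgeWorkItemExceptionAspect 4651 - [nsx@6876 comp="nsx-manager" errorCode="MP11268" level="ERROR" subcomp="manager"] Failed to process work-items',
    'java.lang.NullPointerException: null',
    '        at com.vmware.nsx.management.edge.publish.worker.LRServiceIPLRPortArpProxyUpdateListener.clearRoutesForOldNetworks(LRServiceIPLRPortArpProxyUpdateListener.java:316) ~[?:?]',
]
print(find_stale_route_failures(sample))
```

Against a real file, the same function could be fed the open file object, for example `find_stale_route_failures(open("/var/log/proton/nsxapi.log"))`.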

 

Environment

VMware NSX 3.2

Cause

  • The connectivity of segment Segment-1 (IP subnet ##.##.##.##/24) was changed from Orig-Tier-1 to Target-Tier-1.
  • In the worker thread that processes this change, the LogicalRouterPortConfig (infra-Segment-1-dlrp) (UUID) for the downlink LR port and its corresponding route should have been cleaned up, but the cleanup failed with an exception (the NullPointerException in the nsxapi.log excerpt).
  • The stale dlrp and the stale route still exist with admin state down. As a result, two routes for ##.##.##.##/24 (one with admin state up and the other with admin state down) are advertised to the T0.
  • The provider succeeded; only the worker failed.
  • The LogicalRouterPortConfig still exists, but the corresponding LrPort does not (that is, the CCP and Edge have stale entries, but the Policy and Manager tables do not). Deleting the object through the API therefore does not help.
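
The connectivity change named in the first bullet can be confirmed from the CreateOrReplaceInfraSegment audit entry in /var/log/syslog. The following is a minimal Python sketch, assuming the whole audit entry has been joined into one string and that "Old value=" precedes "New value=" as in the excerpt; the function name and the shortened sample entry are illustrative only.

```python
import re

# Matches a "connectivity_path":"..." pair inside an audit entry.
CONNECTIVITY_PATH = re.compile(r'"connectivity_path"\s*:\s*"([^"]+)"')


def connectivity_change(audit_entry):
    """Return (old_path, new_path), or None if either half has no connectivity_path."""
    old_part, separator, new_part = audit_entry.partition("New value=")
    if not separator:
        return None
    old = CONNECTIVITY_PATH.search(old_part)
    new = CONNECTIVITY_PATH.search(new_part)
    if not old or not new:
        return None
    return old.group(1), new.group(1)


# Shortened form of the audit entry shown in the Issue/Introduction section.
entry = (
    'Operation="CreateOrReplaceInfraSegment", Operation status="success", '
    'Old value=[{"type":"ROUTED","connectivity_path":"/infra/tier-1s/Orig-Tier-1"}], '
    'New value=["Segment-1" {"type":"ROUTED","connectivity_path":"/infra/tier-1s/Target-Tier-1"}]'
)
print(connectivity_change(entry))
```

If the two paths differ and the audit entry reports success while nsxapi.log shows the worker exception, that matches the situation described here: the provider succeeded and only the worker failed.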

Resolution

This issue is fixed in VMware NSX 4.0.0.1 and above. We recommend upgrading NSX to the latest version.

If a workaround is required, do the following:

  • In the NSX web UI, go to "Networking" > "Tier-1 Logical Routers" and select the appropriate Tier-1 Logical Router (example: "Orig-Tier-1").
  • Click the three-dot menu (ellipsis) and click "Edit".
  • Click "Route Advertisements" to expand the route options.
  • Find "Set Route Advertisement Rules" and click "Set".
    • Click "Add Route Advertisement Rule" to add a new rule.
    • Enter a name for the rule.
    • Enter the subnet that should no longer be advertised to the old Tier-0 Gateway.
    • Switch "Apply Filter" to "Yes" to enable the filter option.
    • Set "Advertise Action" to "Deny".
    • Click "Add" to add the route filter rule to the Tier-1.
    • Click "Apply".
  • Click "Save" on the Tier-1 settings.
  • Click "Close Editing".
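
For environments where the rule is prepared by script rather than in the UI, the rule from the steps above could be expressed as a JSON body. The following Python sketch only builds and prints such a body; it does not contact NSX. The field names (route_advertisement_rules, name, action, subnets) are an assumption based on the Policy API Tier-1 schema and should be verified against the API reference for your NSX version. The rule name and the 192.0.2.0/24 subnet are placeholders.

```python
import ipaddress
import json


def build_deny_rule(name, subnet):
    """Build one route advertisement rule that denies a subnet.

    Field names are ASSUMED from the Policy API Tier-1 schema; verify them
    against the API reference for your NSX version before use.
    """
    # strict=True raises ValueError if host bits are set (e.g. 192.0.2.1/24).
    network = ipaddress.ip_network(subnet, strict=True)
    return {
        "name": name,
        "action": "DENY",            # UI: "Advertise Action" set to "Deny"
        "subnets": [str(network)],   # UI: the subnet to stop advertising
    }


# Placeholder rule name and documentation subnet; substitute the real segment subnet.
rule = build_deny_rule("deny-segment-1", "192.0.2.0/24")
body = {"route_advertisement_rules": [rule]}
print(json.dumps(body, indent=2))
```

Validating the subnet with the standard ipaddress module before building the body catches a mistyped prefix early, which matters here because a wrong subnet would leave the stale route advertised.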

Validating the workaround:

  • On the Tier-1 Routing tab, under Advertised Networks, ##.##.##.##/24 is still listed, but the "Advertised" column now shows a red dot with "No", indicating that the route is no longer advertised.
  • SSH to the original NSX Edge as the "admin" user.
    • Type "get logical-routers" and find the VRF ID for the appropriate Logical Router UUID.
    • Type "vrf #" (where # is the VRF ID from the previous step) to change the CLI context to the correct tier0-sr[#].
    • Type "get forwarding" to dump the route forwarding table and confirm that the subnet ##.##.##.##/24 for Segment-1 is no longer being advertised.
  • Run ping or another connectivity test from other segments or external network points to confirm connectivity between VMs and services.
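
Where ICMP is filtered, a TCP connection attempt can serve as the "other connectivity test" mentioned above. The following is a minimal Python sketch using only the standard library; the function name is hypothetical, and the demonstration connects to a temporary local listener so that it runs anywhere. In practice the host and port would be a VM and service on the migrated segment.

```python
import socket


def tcp_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, or name resolution failure
        return False


# Self-contained demonstration against a temporary local listener.
# Replace the host and port with a VM and service on Segment-1 for a real test.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
listener.listen(1)
demo_port = listener.getsockname()[1]
print(tcp_reachable("127.0.0.1", demo_port, timeout=1.0))
listener.close()
```

Running the check from a VM on another segment and from an external host covers both paths named in the last bullet.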