Federated NSX-T upgrade to any version before 3.2.3 or 4.0.2 fails on data migration or dry run tool migration check

Article ID: 317807


Products

VMware NSX

Issue/Introduction

Symptoms:
In a federated deployment, the upgrade of one of the local NSX clusters fails during the data migration task BridgeEndpointProfileRelationsMigrationTask.

Running 'get upgrade progress-status' on the NSX Manager CLI displays the failure as shown below:

Upgrade steps:
download_os [2022-11-28 13:29:36 - 2022-11-28 13:29:56] SUCCESS
shutdown_manager [2022-11-28 13:30:00 - 2022-11-28 13:32:11] SUCCESS
install_os [2022-11-28 13:32:11 - 2022-11-28 13:33:02] SUCCESS
migrate_manager_config [2022-11-28 13:33:02 - 2022-11-28 13:33:07] SUCCESS
switch_os [2022-11-28 13:33:07 - 2022-11-28 13:33:12] SUCCESS
reboot [2022-11-28 13:33:12 - 2022-11-28 13:33:46] SUCCESS
run_migration_tool [2022-11-28 13:35:02 - 2022-11-28 13:36:43] FAILED
------ Output of last step start ------
    Status:
2022-11-28 13:35:03.454191 Deleting datastore files
2022-11-28 13:35:03.514168 Copying old datastore files
2022-11-28 13:35:04.823565 Done copying old datastore files
2022-11-28 13:35:06.493375 Start Corfu server
2022-11-28 13:35:10.574856 Process corfu-server started
2022-11-28 13:36:40.295315 Error running logical data migration tool. return value 1, log file /var/log/proton/logical-migration.log

Overall Progress: (3/6)
---- (1) CCP: Completed [1 object(s)] (2022-11-28 13:35:10 - 2022-11-28 13:35:16) ----
--------------------------------------------------------------------------------------------
---- (2) Proton: Completed [52040 object(s)] (2022-11-28 01:35:31 - 2022-11-28 01:35:45) ----
--------------------------------------------------------------------------------------------
---- (3) Policy: Completed [38034 object(s)] (2022-11-28 01:35:57 - 2022-11-28 01:36:21) ----
--------------------------------------------------------------------------------------------
---- (4) Logical: 41% [10773 of 25803 object(s)] (2022-11-28 01:36:35 - ) ----
Currently Migrating: BridgeEndpointProfileRelationsMigrationTask 0% [0 of 0 objects] (2022-11-28 01:36:39 - )
--------------------------------------------------------------------------------------------
---- (5) CBM: Pending
--------------------------------------------------------------------------------------------
---- (6) UFO Checkpointing: Pending
--------------------------------------------------------------------------------------------
    Stdout: Starting Manager run_migration_tool script
Ending Manager run_migration_tool script

   Troubleshooting: Upgrade has failed and retry may not work. Appliance OS is of a new version; however, UI will not be available. Please contact GSS to rollback the system to the previous version.
------ Output of last step end ------
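When the status output is long, the failed step and the task that was running can be pulled out with a small script. The following is a minimal sketch, not a VMware tool: it assumes the output has been saved as text in the format shown above, and the function names failed_steps and current_task are hypothetical.

```python
import re

# Matches step lines such as:
#   run_migration_tool [2022-11-28 13:35:02 - 2022-11-28 13:36:43] FAILED
STEP_RE = re.compile(
    r"^(?P<step>\w+) \[(?P<start>[\d\- :]+?) - (?P<end>[\d\- :]+?)\] (?P<status>[A-Z_]+)$"
)
# Matches the "Currently Migrating:" line and captures the task name.
TASK_RE = re.compile(r"Currently Migrating: (\S+)")


def failed_steps(status_output):
    """Return the names of upgrade steps whose status is not SUCCESS."""
    failed = []
    for line in status_output.splitlines():
        match = STEP_RE.match(line.strip())
        if match and match.group("status") != "SUCCESS":
            failed.append(match.group("step"))
    return failed


def current_task(status_output):
    """Return the migration task named on the 'Currently Migrating' line, or None."""
    match = TASK_RE.search(status_output)
    return match.group(1) if match else None


# Excerpt of the output shown above, used here as sample input.
SAMPLE = """\
reboot [2022-11-28 13:33:12 - 2022-11-28 13:33:46] SUCCESS
run_migration_tool [2022-11-28 13:35:02 - 2022-11-28 13:36:43] FAILED
---- (4) Logical: 41% [10773 of 25803 object(s)] (2022-11-28 01:36:35 - ) ----
Currently Migrating: BridgeEndpointProfileRelationsMigrationTask 0% [0 of 0 objects] (2022-11-28 01:36:39 - )
"""

if __name__ == "__main__":
    print(failed_steps(SAMPLE))
    print(current_task(SAMPLE))
```

If the script reports run_migration_tool as failed and BridgeEndpointProfileRelationsMigrationTask as the current task, the output matches the symptom described in this article.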


The dry-run migration tool also reports a failure in data migration.

The logical-migration log of the NSX Manager or of the dry-run tool contains the entries below, stating that no Edge Node with index 1 was found:

2022-11-29T07:29:45.790Z INFO main MigrationTask 3014 - [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Edge Cluster ID in use: 25ff79ee-####-####-####-########d65
2022-11-29T07:29:45.790Z INFO main MigrationTask 3014 - [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Edge Cluster path: /global-infra/sites/PA-INFRA/enforcement-points/default/edge-clusters/25ff79ee-####-####-####-########d65
2022-11-29T07:29:45.790Z INFO main MigrationTask 3014 - [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Edge Node index in use: 1
2022-11-29T07:29:45.790Z WARN main UfoCorfuTableMigrator 3014 - [nsx@6876 comp="nsx-manager" level="WARNING" subcomp="manager"] ERROR while running logical migration MappingDetails{modelName='null', migrationType=null, reason='This task will fix/add BridgeEndpointProfile relationships with Edge Cluster and Edge Node', customMigratorClassName='com.vmware.nsx.management.migration.impl.BridgeEndpointProfileRelationsMigrationTask', fieldMappings=null, targetProtoName='null', requiresCustomCode='false', owner='null', apiToTest='null'}
java.lang.RuntimeException: No Edge Node found with index 1
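To check whether a saved copy of the log carries this signature, a short script can search for the task name and the exception text. This is a hedged sketch that only matches the strings quoted above; the function name find_signature is hypothetical, and the script assumes the log has been copied off the appliance as plain text.

```python
import re

TASK_NAME = "BridgeEndpointProfileRelationsMigrationTask"
# Captures the index from the exception line quoted in this article.
NO_NODE_RE = re.compile(
    r"java\.lang\.RuntimeException: No Edge Node found with index (\d+)"
)


def find_signature(log_text):
    """Return (task_seen, missing_indices) for the given log text."""
    task_seen = TASK_NAME in log_text
    missing_indices = [int(index) for index in NO_NODE_RE.findall(log_text)]
    return task_seen, missing_indices


# Shortened excerpt of the log lines shown above, used as sample input.
SAMPLE_LOG = """\
2022-11-29T07:29:45.790Z INFO main MigrationTask 3014 - Edge Node index in use: 1
2022-11-29T07:29:45.790Z WARN main UfoCorfuTableMigrator 3014 - ERROR while running logical migration customMigratorClassName='com.vmware.nsx.management.migration.impl.BridgeEndpointProfileRelationsMigrationTask'
java.lang.RuntimeException: No Edge Node found with index 1
"""

if __name__ == "__main__":
    seen, indices = find_signature(SAMPLE_LOG)
    print("task seen:", seen)
    print("missing edge node indices:", indices)
```

To run it against a real log, read the file contents (for example the /var/log/proton/logical-migration.log file named in the upgrade output) into a string and pass that string to find_signature.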



Environment

VMware NSX-T Data Center

Cause

During the Proton migration, NSX unconditionally creates PolicyEdgeNode objects whose path contains the Edge Node index.

During the Policy migration, NSX checks whether any PolicyEdgeNode objects exist whose path contains a UUID. For each one, it copies the data from the corresponding index-based PolicyEdgeNode and then deletes that index-based copy. The corresponding object is found by matching the edgeTnId stored with the PolicyEdgeNode. With this logic, NSX may end up deleting the Global Manager (GM) copy instead of the Local Manager (LM) copy.
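The ambiguity can be illustrated with a toy model. This is not NSX code: apart from the names PolicyEdgeNode and edgeTnId, which come from the description above, every path, field, and function here is hypothetical, and the real migration logic may differ. The sketch only shows why matching on edgeTnId alone cannot tell two copies apart when they share the same value.

```python
from dataclasses import dataclass


@dataclass
class PolicyEdgeNode:
    path: str        # hypothetical path; last segment is an index or a UUID
    edge_tn_id: str  # stands in for edgeTnId
    owner: str       # "GM" or "LM", a label used only in this toy model


def is_index_path(path):
    """True when the last path segment is a numeric index."""
    return path.rsplit("/", 1)[-1].isdigit()


def merge_by_edge_tn_id(nodes):
    """For each UUID-based node, delete the first index-based node
    that has the same edge_tn_id, regardless of which side owns it."""
    remaining = list(nodes)
    for node in nodes:
        if is_index_path(node.path):
            continue
        for candidate in remaining:
            if is_index_path(candidate.path) and candidate.edge_tn_id == node.edge_tn_id:
                remaining.remove(candidate)
                break  # stop after the first match
    return remaining


# Two index-based copies share one edge_tn_id; the GM copy happens to come first.
NODES = [
    PolicyEdgeNode("/global-infra/example/edge-nodes/1", "tn-1", "GM"),
    PolicyEdgeNode("/infra/example/edge-nodes/1", "tn-1", "LM"),
    PolicyEdgeNode("/infra/example/edge-nodes/uuid-aaaa", "tn-1", "LM"),
]

if __name__ == "__main__":
    for node in merge_by_edge_tn_id(NODES):
        print(node.owner, node.path)
```

In this toy run the GM index-based copy is the one removed, so a later lookup of index 1 under the /global-infra path finds nothing, which is consistent with the "No Edge Node found with index 1" error in the log.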

Resolution

This issue is resolved in NSX-T Data Center 3.2.3 and 4.0.2 onwards.

Workaround:
Reach out to VMware NSX support for verification.