NSX-T manager upgrade fails at run_migration_tool stage due to stale edge cluster referenced by URL/FQDN Analysis

Article ID: 319079

Updated On:

Products

VMware NSX

Issue/Introduction

Symptoms:
  • You are upgrading to NSX-T 3.2.x
  • The upgrade fails after the manager node reboot stage, at the start of the run_migration_tool stage. To confirm this, log in to the manager CLI as admin and run:
> get upgrade progress-status
...
Upgrade steps:
download_os [2022-05-25 10:41:20 - 2022-05-25 10:42:09] SUCCESS
shutdown_manager [2022-05-25 10:42:37 - 2022-05-25 10:44:24] SUCCESS
install_os [2022-05-25 10:44:24 - 2022-05-25 10:45:37] SUCCESS
migrate_manager_config [2022-05-25 10:45:37 - 2022-05-25 10:45:43] SUCCESS
switch_os [2022-05-25 10:45:43 - 2022-05-25 10:45:55] SUCCESS
reboot [2022-05-25 10:45:56 - 2022-05-25 10:46:32] SUCCESS
run_migration_tool [2022-05-25 10:49:49 - 2022-05-25 10:50:24] FAILED
...
  • The following WARNING appears in the manager log /var/log/proton/logical-migration.log:
2022-05-24T09:27:04.151Z  INFO main ProtobufCopier 3780 - [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Registering types src="vmware.nsx.ufostore.policyframework.model.PolicyUrlCategorizationConfigMsg" dst="vmware.nsx.ufostore.security.model.PolicyUrlCategorizationConfigInternalMsg" for automatic copying
2022-05-24T09:27:04.154Z  WARN main UfoCorfuTableMigrator 3780 - [nsx@6876 comp="nsx-manager" level="WARNING" subcomp="manager"] ERROR while running logical migration MappingDetails{modelName='null', migrationType=null, reason='This migration task will run for migrations from 3.2.0 to 3.2.1 for Policy URL categorization config. It will not run for upgrades from 3.2.1 to Next', customMigratorClassName='com.vmware.nsx.management.migration.impl.PolicyUrlCategorizationConfigPathToUuidMigrationTask', fieldMappings=null, targetProtoName='null', requiresCustomCode='false', owner='null', apiToTest='null'}
java.lang.NullPointerException: null
        at com.vmware.nsx.management.migration.impl.PolicyUrlCategorizationConfigPathToUuidMigrationTask.migratePolicyUrlCategorizationConfig(PolicyUrlCategorizationConfigPathToUuidMigrationTask.java:147) ~[logical-migration.jar:?]
        at com.vmware.nsx.management.migration.impl.PolicyUrlCategorizationConfigPathToUuidMigrationTask.migrate(PolicyUrlCategorizationConfigPathToUuidMigrationTask.java:63) ~[logical-migration.jar:?]
        at com.vmware.nsx.management.migration.ufo.UfoCorfuTableMigrator.migrate(UfoCorfuTableMigrator.java:138) [logical-migration.jar:?]
        at com.vmware.nsx.management.migration.ufo.UFOMigration.migrate(UFOMigration.java:235) [logical-migration.jar:?]
        at com.vmware.nsx.management.migration.impl.LogicalMigration.executeMigrations(LogicalMigration.java:43) [logical-migration.jar:?]
        at com.vmware.nsx.management.migration.impl.Migration.migrate(Migration.java:46) [logical-migration.jar:?]
        at com.vmware.nsx.management.migration.impl.LogicalMigration.main(LogicalMigration.java:29) [logical-migration.jar:?]
  • You had one or more edge clusters that were removed before the upgrade and no longer exist.
  • URL/FQDN Analysis was still enabled on these edge clusters when they were removed.
  • You did not run the NSX Upgrade Evaluation Tool: https://kb.vmware.com/s/article/87379.
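
The first two symptoms can be checked mechanically. The following Python sketch is illustrative only and is not part of any NSX tooling: the function names are hypothetical, and it assumes you have saved the output of get upgrade progress-status and the contents of /var/log/proton/logical-migration.log as text. It only looks for the step status and log strings shown above.

```python
import re

# Matches lines such as:
#   run_migration_tool [2022-05-25 10:49:49 - 2022-05-25 10:50:24] FAILED
STEP_RE = re.compile(r"^(\w+) \[([^\]]+)\] ([A-Z_]+)$")

# Class name of the migration task that appears in the WARN entry of this KB.
MIGRATOR = "PolicyUrlCategorizationConfigPathToUuidMigrationTask"


def failed_steps(progress_output):
    """Return the names of upgrade steps reported as FAILED (hypothetical helper)."""
    failed = []
    for line in progress_output.splitlines():
        match = STEP_RE.match(line.strip())
        if match and match.group(3) == "FAILED":
            failed.append(match.group(1))
    return failed


def has_url_categorization_npe(log_text):
    """Return True if the log contains the WARN entry for the URL categorization
    migration task followed within two lines by a NullPointerException."""
    lines = log_text.splitlines()
    for i, line in enumerate(lines):
        if "ERROR while running logical migration" in line and MIGRATOR in line:
            following = lines[i + 1 : i + 3]
            if any("java.lang.NullPointerException" in entry for entry in following):
                return True
    return False
```

If failed_steps returns run_migration_tool and has_url_categorization_npe returns True for the saved log, the failure is consistent with the symptoms listed here. A match does not prove the cause on its own, so the remaining symptoms still need to be verified.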


Environment

VMware NSX-T Data Center

Cause

The edge cluster still had URL/FQDN Analysis enabled when it was removed. As a result, internal tables continue to reference the removed edge cluster(s) for URL/FQDN Analysis, and the upgrade fails because it cannot find them.
Running the 'NSX Upgrade Evaluation Tool' mentioned above would have detected this issue and allowed it to be resolved before the upgrade.
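
The condition described above amounts to a dangling reference: an ID is still recorded for URL/FQDN Analysis, but no edge cluster with that ID exists. The sketch below illustrates the idea in Python using made-up in-memory data. The function name and the IDs are hypothetical, and the code does not read the NSX internal tables or call any NSX API.

```python
def stale_edge_cluster_refs(existing_ids, url_analysis_refs):
    """Return edge cluster IDs that are still referenced for URL/FQDN Analysis
    but no longer exist (hypothetical helper, illustration only)."""
    return sorted(set(url_analysis_refs) - set(existing_ids))


# Made-up example: "ec-2" was deleted while URL/FQDN Analysis was still enabled on it.
existing = {"ec-1", "ec-3"}
referenced = ["ec-1", "ec-2"]
stale = stale_edge_cluster_refs(existing, referenced)
```

In this example, stale contains only "ec-2", which corresponds to the kind of leftover reference that the migration step cannot resolve.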

Resolution

This is a known issue affecting NSX-T Data Center.
To avoid this issue, disable URL/FQDN Analysis on an edge cluster before removing it, and run the 'NSX Upgrade Evaluation Tool' before starting the upgrade.

Workaround:
Disable URL/FQDN Analysis on the edge cluster before you delete it.
If you have already started the upgrade and encountered this issue, open a VMware support request and reference this KB article.