NSX-T upgrade to 3.x on the MP nodes fails at step 7 (run_migration_tool)
Article ID: 303350
Products
VMware NSX
Issue/Introduction
Step 7 (run_migration_tool) of the MP upgrade workflow fails on the orchestrator node.
The upgrade cannot be completed on the NSX Management Plane nodes.
nsxt-mgr> get upgrade progress-status
****************************************************************************
Node Upgrade has been started. Please do not make any changes, until
the upgrade operation is complete. Run "get upgrade progress-status"
to show the progress of last upgrade step.
****************************************************************************
Mon Aug 16 2021 UTC 22:11:57.109
Upgrade info:
From-version: 3.0.2.0.0.16887203
To-version: 3.1.0.0.0.17107171
Upgrade steps:
download_os [2021-08-16 13:55:41 - 2021-08-16 13:56:06] SUCCESS
shutdown_manager [2021-08-16 13:56:12 - 2021-08-16 14:02:04] SUCCESS
install_os [2021-08-16 14:02:04 - 2021-08-16 14:02:53] SUCCESS
migrate_manager_config [2021-08-16 14:02:53 - 2021-08-16 14:02:59] SUCCESS
switch_os [2021-08-16 14:02:59 - 2021-08-16 14:03:03] SUCCESS
reboot [2021-08-16 14:03:03 - 2021-08-16 14:04:44] SUCCESS
run_migration_tool [2021-08-16 14:05:38 - ] FAILED
Status: Corfu Infrastructure Server is not running.
Deleting datastore files
Copying old datastore files
Done copying old datastore files
Running migrate layout
Output : b'Current cluster id ec2cc28d-7322-####-####-d5cbf8ab9d76\nGenerated new layout in LAYOUT_CURRENT.ds {\n "layoutServers": [\n "10.15.x.176:9000"\n ],\n "sequencers": [\n "10.15.##.176:9000"\n ],\n "segments": [\n {\n "replicationMode": "CHAIN_REPLICATION",\n "start": 0,\n "end": -1,\n "stripes": [\n {\n "logServers": [\n "10.15.##.176:9000"\n ]\n }\n ]\n }\n ],\n "unresponsiveServers": [],\n "epoch": 1688,\n "clusterId": "ec2cc28d-7322-####-####-d5cbf8ab9d76"\n}\nUpdated /config/corfu/LAYOUT_CURRENT.ds and /config/corfu/MANAGEMENT_LAYOUT.ds\n'
Start Corfu server
Process corfu-server started
Start Cluster Boot Manager
Run migration tool with migration directory /image/VMware-NSX-unified-appliance-3.1.0.0.0.17107171/files
Completed running data migration tool. log file /var/log/data-migration.log
Completed running UFO data migration tool. log file /var/log/data-migration.log
Start cluster-boot-manager data migration.
Completed running cluster-boot-manager data migration. output
Run Policy migration tool with migration directory /image/VMware-NSX-unified-appliance-3.1.0.0.0.17107171/files
Completed running policy data migration tool. log file /var/log/policy-data-migration.log
Running corfu_compactor_upgrade_runner
Troubleshooting: Upgrade has failed, and retry may not work. Appliance OS is of a new version, however, UI will not be available. Please contact GSS to rollback the system to the previous version.
nsxt-mgr>
======================================================
root@nsxt-mgr-np-1:/var/log/corfu# ls -l
total 1511468
-rw-r--r-- 1 root root 156 Aug 16 14:10 compaction_trim_mark.log <<<<<< reason for failed upgrade
-rw------- 1 corfu corfu 97276 Aug 16 14:10 compactor-gc.log.0.current
-rw-r----- 1 root adm 69698920 Aug 8 10:54 corfu.9000.10.log.gz
-rw-r----- 1 root adm 70041990 Aug 7 13:49 corfu.9000.11.log.gz
-rw-r----- 1 root adm 70137582 Aug 6 16:44 corfu.9000.12.log.gz
-rw-r----- 1 root adm 69939501 Aug 5 19:40 corfu.9000.13.log.gz
-rw-r----- 1 root adm 69997705 Aug 4 22:34 corfu.9000.14.log.gz
-rw-r----- 1 root adm 70069132 Aug 4 01:24 corfu.9000.15.log.gz
Environment
VMware NSX-T Data Center 3.x
Cause
As part of the data migration step, the Corfu compactor is run by corfu_compactor_upgrade_runner, and the script then verifies whether the compactor run succeeded. In this case, an earlier unsuccessful upgrade attempt left behind a stale copy of the compaction_trim_mark file (compaction_trim_mark.log in /var/log/corfu), and that stale file blocks the upgrade.
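To confirm whether this cause applies, it can help to check for the trim mark file from a root shell on the affected manager node. The following is a minimal, read-only sketch: the directory and file name are taken from the listing above, while the function name check_trim_mark is hypothetical and used only for illustration. It reports the file and changes nothing on the appliance.

```shell
#!/bin/sh
# Read-only check for a leftover compaction_trim_mark.log file.
# Assumption: the file lives in /var/log/corfu, as in the listing above.
# check_trim_mark is a hypothetical helper name, not an NSX command.
check_trim_mark() {
    dir="${1:-/var/log/corfu}"
    mark="$dir/compaction_trim_mark.log"
    if [ -e "$mark" ]; then
        # Print the path, then size and timestamp for comparison with
        # the start time of the failed run_migration_tool step.
        echo "present: $mark"
        ls -l "$mark"
        return 0
    fi
    echo "absent: $mark"
    return 1
}
```

Running check_trim_mark with no argument inspects /var/log/corfu. A different directory can be passed as the first argument, for example when examining an extracted support bundle.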