NSX-T upgrade to 3.x on the MP nodes fails at step 7 (run_migration_tool)

Article ID: 303350

Updated On:

Products

VMware NSX

Issue/Introduction

  • Step 7 of the MP upgrade workflow (run_migration_tool) fails on the orchestrator node.
  • The upgrade of the NSX Management Plane nodes cannot be completed.

    nsxt-mgr> get upgrade progress-status
    ****************************************************************************
    Node Upgrade has been started. Please do not make any changes, until
    the upgrade operation is complete. Run "get upgrade progress-status"
    to show the progress of last upgrade step.
    ****************************************************************************
    
    Mon Aug 16 2021 UTC 22:11:57.109
    Upgrade info:
    From-version: 3.0.2.0.0.16887203
    To-version: 3.1.0.0.0.17107171
    
    Upgrade steps:
    download_os [2021-08-16 13:55:41 - 2021-08-16 13:56:06] SUCCESS
    shutdown_manager [2021-08-16 13:56:12 - 2021-08-16 14:02:04] SUCCESS
    install_os [2021-08-16 14:02:04 - 2021-08-16 14:02:53] SUCCESS
    migrate_manager_config [2021-08-16 14:02:53 - 2021-08-16 14:02:59] SUCCESS
    switch_os [2021-08-16 14:02:59 - 2021-08-16 14:03:03] SUCCESS
    reboot [2021-08-16 14:03:03 - 2021-08-16 14:04:44] SUCCESS
    run_migration_tool [2021-08-16 14:05:38 - ] FAILED
        Status: Corfu Infrastructure Server is not running.
    
    Deleting datastore files
    Copying old datastore files
    Done copying old datastore files
    Running migrate layout
    Output : b'Current cluster id ec2cc28d-7322-####-####-d5cbf8ab9d76\nGenerated new layout in LAYOUT_CURRENT.ds {\n  "layoutServers": [\n    "10.15.x.176:9000"\n  ],\n  "sequencers": [\n    "10.15.x.176:9000"\n  ],\n  "segments": [\n    {\n      "replicationMode": "CHAIN_REPLICATION",\n      "start": 0,\n      "end": -1,\n      "stripes": [\n        {\n          "logServers": [\n            "10.15.x.176:9000"\n          ]\n        }\n      ]\n    }\n  ],\n  "unresponsiveServers": [],\n  "epoch": 1688,\n  "clusterId": "ec2cc28d-7322-####-####-d5cbf8ab9d76"\n}\nUpdated /config/corfu/LAYOUT_CURRENT.ds and /config/corfu/MANAGEMENT_LAYOUT.ds\n'
    Start Corfu server
    Process corfu-server started
    Start Cluster Boot Manager
    Run migration tool with migration directory /image/VMware-NSX-unified-appliance-3.1.0.0.0.17107171/files
    Completed running data migration tool. log file /var/log/data-migration.log
    Completed running UFO data migration tool. log file /var/log/data-migration.log
    Start cluster-boot-manager data migration.
    Completed running cluster-boot-manager data migration. output
    Run Policy migration tool with migration directory /image/VMware-NSX-unified-appliance-3.1.0.0.0.17107171/files
    Completed running policy data migration tool. log file /var/log/policy-data-migration.log
    Running corfu_compactor_upgrade_runner
    
        Troubleshooting: Upgrade has failed, and retry may not work. Appliance OS is of a new version, however, UI will not be available. Please contact GSS to rollback the system to the previous version.
    
    nsxt-mgr>
    
    ======================================================
    
    
    
    root@nsxt-mgr-np-1:/var/log/corfu# ls -l
    total 1511468
    -rw-r--r-- 1 root  root       156 Aug 16 14:10 compaction_trim_mark.log <<<<<< reason for failed upgrade
    -rw------- 1 corfu corfu    97276 Aug 16 14:10 compactor-gc.log.0.current
    -rw-r----- 1 root  adm   69698920 Aug  8 10:54 corfu.9000.10.log.gz
    -rw-r----- 1 root  adm   70041990 Aug  7 13:49 corfu.9000.11.log.gz
    -rw-r----- 1 root  adm   70137582 Aug  6 16:44 corfu.9000.12.log.gz
    -rw-r----- 1 root  adm   69939501 Aug  5 19:40 corfu.9000.13.log.gz
    -rw-r----- 1 root  adm   69997705 Aug  4 22:34 corfu.9000.14.log.gz
    -rw-r----- 1 root  adm   70069132 Aug  4 01:24 corfu.9000.15.log.gz

Environment

VMware NSX-T Data Center 3.x

Cause

As part of the data migration step, the Corfu compactor is run by corfu_compactor_upgrade_runner, and the script then verifies whether the compactor run succeeded. In this case, a stale copy of the compaction_trim_mark.log file, left behind in /var/log/corfu/ by an earlier unsuccessful upgrade attempt, blocks the upgrade.
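
To check whether a node matches this cause, one option is to look for the file in the Corfu log directory. The following is a minimal sketch, not an official tool: the function name check_trim_mark is hypothetical, and it assumes root shell access on the manager node and the /var/log/corfu path shown in the listing above.

```shell
#!/bin/sh
# Sketch only: report whether compaction_trim_mark.log is present.
# check_trim_mark is a hypothetical helper name, not part of NSX-T.
# The default directory is the path shown in this article; pass a
# different directory as the first argument if needed.
check_trim_mark() {
    dir="${1:-/var/log/corfu}"
    file="$dir/compaction_trim_mark.log"
    if [ -f "$file" ]; then
        # Show owner and timestamp so the file can be compared with
        # the time of the failed upgrade attempt.
        ls -l -- "$file"
        return 0
    fi
    echo "No compaction_trim_mark.log in $dir"
    return 1
}
```

Calling check_trim_mark with no argument inspects /var/log/corfu. A zero exit status means the file exists; whether it is actually stale still has to be judged from its timestamp.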

Resolution

This issue is resolved in VMware NSX-T Data Center 3.1.2, available at Broadcom Downloads.
If you are having difficulty finding and downloading software, please review the Download Broadcom products and software KB.
 
 
Workaround
 
Delete the stale compaction_trim_mark.log file from /var/log/corfu/ on the affected manager node.
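
The removal can be sketched as a small shell function run as root on the affected node. This is an illustration under stated assumptions rather than a documented procedure: the function name remove_trim_mark is hypothetical, and keeping a copy of the file in /tmp before deleting it is an extra precaution assumed here, not part of the workaround above.

```shell
#!/bin/sh
# Sketch only: remove a stale compaction_trim_mark.log.
# remove_trim_mark is a hypothetical helper name, not part of NSX-T.
# $1: Corfu log directory (default: the path named in this article)
# $2: directory for a backup copy (assumption: /tmp is acceptable)
remove_trim_mark() {
    dir="${1:-/var/log/corfu}"
    backup_dir="${2:-/tmp}"
    file="$dir/compaction_trim_mark.log"
    if [ ! -f "$file" ]; then
        echo "No stale file at $file; nothing to remove."
        return 0
    fi
    # Keep a copy outside the Corfu log directory before deleting.
    cp -p -- "$file" "$backup_dir/compaction_trim_mark.log.bak" || return 1
    rm -f -- "$file" || return 1
    echo "Removed $file (copy kept in $backup_dir)."
}
```

Running remove_trim_mark with no arguments acts on /var/log/corfu and is safe to repeat, since it does nothing when the file is already gone.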

Additional Information

Impact/Risks:
Unable to complete NSX-T Manager node upgrade.