NSX upgrade rollback from 4.1.0 to 3.2.1 fails at step2_restore_data with error "failed at task run_db_restore"

Article ID: 322635

Updated On:

Products

VMware NSX

Issue/Introduction

Symptoms:

  • You have upgraded NSX-T from 3.2.1.x to NSX 4.1.0 and need to roll back the Management Plane upgrade.
  • Executing step 2 of the rollback fails with the following error:
nsx-manager> node-rollback run-step step2_restore_data
This will restore data-store to a snapshot taken at the beginning of upgrade.
- Run this command only on one of the Manager nodes.
- Please make sure that step1_start_rollback step is successful on all nodes of cluster before starting this step.
- After this step, run step3_exit_rollback on all the nodes to continue the rollback.

Do you want to continue? (yes/no): yes
<date/time> - Running node_version_check
<date/time> - Node version checks passed.

<date/time> - Running restore_ready_check
<date/time> - Node restore ready checks passed.

<date/time> - Start running step step2_restore_data
<date/time> - Running "run_db_restore" (task 1 of 1)
<date/time> - Step step2_restore_data failed at task run_db_restore.
{
  "state": 2,
  "state_text": "CMD_ERROR",
  "info": "[MRS] RollbackError: Step step2_restore_data failed at task run_db_restore.",
  "body": null
}
  • Running "get node-rollback progress-status" on the CLI of the NSX Manager where the restore data step was run shows the run_db_restore task as FAILED, together with an error from the corfu compactor:
nsx-manager> get node-rollback progress-status
<date/time>
Rollback info:
        From-version: 4.1.0.0.0.21332677
        To-version: 3.2.1.1.0.20115694

Rollback tasks:
        backup_version_check [<date/time-1> - <date/time-2>] SUCCESS
        cleanup_bundle [<date/time-1> - <date/time-2>] SUCCESS
        reboot_into_rollback_version [<date/time-1> - <date/time-2>] SUCCESS
        restore_config_files [<date/time-1> - <date/time-2>] SUCCESS
        start_corfu [<date/time-1> - <date/time-2>] SUCCESS
        node_version_check [<date/time-1> - <date/time-2>] SUCCESS
        restore_ready_check [<date/time-1> - <date/time-2>] SUCCESS
        run_db_restore [<date/time-1> - <date/time-2>] FAILED
                status_output - Starting db restore
Db restore complete.
Starting running corfu compactor
Error running corfu compactor. rc: 1, err: Welcome to use NSX corfu tool.
This tool can help you to examine or compact data in a CorfuDB database.
  • The same failure can also be reported with return code 255:
Error running corfu compactor. rc: 255,
  • If an NSX log support bundle is available, the error can also be found in /system/rollback_helper.py_support:
Product Log:
      {
        "args": {
          "node_type": "nsx-manager nsx-policy-manager nsx-controller",
          "rollback_from_version": "4.1.1.0.0.21791062",
          "rollback_to_version": "3.2.1.1.0.20115694",
          "bundle_files_path": "/image/VMware-NSX-unified-appliance-3.2.1.1.0.20115694-rollback/files",
          "old_os_path": "/os_bak",
          "alt_os_path": "/",
          "status_file": "/tmp/rollback2mo5oozo",
          "troubleshooting_file": "/tmp/rollback_troubleshootingtwm6es6d"
        },
        "end_time": "<date/time>",
        "failure_reason": "Script exited with non-zero return code 255.",
        "task_id": "<uuid>",
        "name": "run_db_restore",
        "path": "/image/VMware-NSX-unified-appliance-3.2.1.1.0.20115694-rollback/scripts/run_db_restore.py",
        "pid": 5467,
        "return_code": 255,
        "start_time": "<date/time>",
        "state": "TASK_FAILURE",
        "status_file": "Starting db restore\nDb restore complete.\nStarting running corfu compactor\nError running corfu compactor. rc: 255,

NOTE: The preceding log excerpts are only examples. Dates, times, and environment-specific values will vary depending on your environment.
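The error text quoted above can be matched programmatically when triaging saved CLI output or a support bundle file. The following is a minimal Python sketch, not a VMware-provided tool: the function name find_compactor_failure and the sample string are assumptions made for illustration, and the pattern is derived only from the error lines shown in this article.

```python
import re
from typing import Optional

# Pattern based solely on the error text quoted in this article.
COMPACTOR_ERROR = re.compile(r"Error running corfu compactor\. rc: (\d+)")


def find_compactor_failure(text: str) -> Optional[int]:
    """Return the compactor return code if the error signature is present, else None.

    Hypothetical helper for illustration; it only searches plain text.
    """
    match = COMPACTOR_ERROR.search(text)
    return int(match.group(1)) if match else None


# Sample assembled from the excerpts in this article (placeholders kept as-is).
sample = (
    "run_db_restore [<date/time-1> - <date/time-2>] FAILED\n"
    "        status_output - Starting db restore\n"
    "Db restore complete.\n"
    "Starting running corfu compactor\n"
    "Error running corfu compactor. rc: 1, err: Welcome to use NSX corfu tool.\n"
)

print(find_compactor_failure(sample))
```

The same function can be applied to the contents of /system/rollback_helper.py_support, since the signature appears there as plain text inside the "status_file" value.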

Environment

VMware NSX 4.1.0
VMware NSX-T Data Center 3.x
VMware NSX-T Data Center

Cause

This issue is caused by a Corfu table present in NSX 4.1 but not in NSX-T 3.2.1.x.

When the backup is taken after the upgrade coordinator has been upgraded, it references the new table. The rollback restore then fails because that table does not exist in the version being restored (3.2.1.x).
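The mismatch described above can be pictured with a small conceptual model. This Python sketch is only an illustration of the idea and does not reflect how Corfu or the rollback scripts are implemented: the table names and the unknown_tables helper are hypothetical.

```python
# Hypothetical table names, for illustration only; these are not real Corfu tables.
tables_known_to_3_2_1 = {"table_a", "table_b"}
tables_in_backup = {"table_a", "table_b", "table_new_in_4_1"}


def unknown_tables(backup: set, known: set) -> set:
    """Return tables referenced by the backup that the restored version does not know."""
    return backup - known


# A non-empty result models the condition under which the restore cannot complete.
missing = unknown_tables(tables_in_backup, tables_known_to_3_2_1)
print(sorted(missing))
```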

Resolution

This issue is resolved in NSX 4.1.1.

Workaround:

If you believe you have encountered this issue, please open a support request with Broadcom Support and refer to this KB article.