Troubleshooting NSX Manager Failures

Article ID: 379034

Products

VMware NSX

Issue/Introduction

Upgrades from versions earlier than NSX-T 3.2.1

  • One Manager acts as the orchestrator node, which initiates and controls the upgrade.
  • The Upgrade UI is accessible only from the orchestrator node.
  • During the upgrade, the Corfu database is shrunk so that it is active only on the orchestrator node.
  • This upgrade model results in an expected Management Plane (MP) outage, during which the API and UI are unavailable.
  • If the upgrade of any one Manager fails, the MP cluster is unavailable.

Upgrades from NSX-T 3.2.1 and higher versions

  • The Upgrade UI is accessible from any Manager node.
  • A rolling upgrade model is used.
  • This model provides reduced downtime of the NSX Management Plane (MP).
  • The maintenance window for the MP upgrade is shorter, and the NSX MP API and UI remain accessible throughout the upgrade, with no impact on Data Plane workloads.
  • If an NSX Manager upgrade fails, the other two Managers remain accessible and the MP stays up.

Environment

VMware NSX
VMware NSX-T

Resolution

To check upgrade status, run the following from the Manager CLI as the admin user:

  • get upgrade progress-status



Rollback:

  • For upgrades from version 3.2.1 and later, a Manager upgrade failure can be rolled back using the documented steps: Upgrade Guide Rollback steps
  • For upgrades from a version earlier than NSX-T 3.2.1, Broadcom Support must be engaged to perform the rollback.
  • Alternatively, restoring from backup is an option to recover from a Manager upgrade failure:
  • NSX-T 3.x: Restore from backup during upgrade
  • NSX 4.x: Restore from backup during upgrade. When upgrading from 3.2.1 or later, an automatic internal backup is taken before the Manager upgrade.
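
The version-based choice of recovery path described above can be sketched as a small helper. This is only an illustration of the decision logic, not a VMware tool: the function names, the option strings, and the assumption that only the first three version components matter are all hypothetical.

```python
import re

# Versions at or above this baseline use the rolling upgrade model
# and support the documented rollback steps (per the list above).
ROLLBACK_BASELINE = (3, 2, 1)


def parse_version(version: str) -> tuple:
    """Parse a string such as '3.2.1' or '4.1' into a 3-part tuple of ints.

    Assumption: only major.minor.patch is relevant; extra build
    components are ignored and missing ones are treated as 0.
    """
    parts = re.findall(r"\d+", version)
    if len(parts) < 2:
        raise ValueError(f"cannot parse version: {version!r}")
    numbers = [int(p) for p in parts[:3]]
    while len(numbers) < 3:
        numbers.append(0)
    return tuple(numbers)


def recovery_options(source_version: str) -> list:
    """Return the recovery paths for a failed Manager upgrade.

    source_version is the version the environment was upgraded FROM.
    """
    if parse_version(source_version) >= ROLLBACK_BASELINE:
        return ["rollback using the Upgrade Guide steps", "restore from backup"]
    return ["engage Broadcom Support for rollback", "restore from backup"]
```

For example, `recovery_options("3.1.3")` lists the Broadcom Support path first, while `recovery_options("3.2.1")` lists the documented rollback first. Restore from backup appears in both cases.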

 

Logs:

  • On the Manager, /var/log/upgrade-coordinator/upgrade-coordinator.log is the main upgrade log.
  • To determine the root cause of an NSX Manager upgrade failure, an NSX Manager log bundle is required from all three nodes.
  • If performing a rollback, logs can be collected afterwards.
  • If performing a restore from backup, logs must be collected first. If the UI is not available, logs can be collected from the command line as the admin user: get support-bundle file <filename.tgz>
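
As a first pass before a full root-cause analysis, a copy of the main upgrade log can be scanned for high-severity entries. The sketch below is a minimal example meant to run on a workstation against a log extracted from the support bundle. It assumes that severity appears in each line as a standalone word such as ERROR or FATAL, which may not match the actual log format. The helper name `find_error_lines` is hypothetical.

```python
import re
from pathlib import Path

# Main upgrade log path as given above; adjust to wherever the copy lives.
UPGRADE_LOG = Path("/var/log/upgrade-coordinator/upgrade-coordinator.log")

# Assumption: severity is written as a standalone token in each log line.
SEVERITY = re.compile(r"\b(ERROR|FATAL)\b")


def find_error_lines(lines):
    """Return (line_number, text) pairs for lines containing ERROR or FATAL."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if SEVERITY.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    if UPGRADE_LOG.exists():
        # errors="replace" avoids a crash on unexpected bytes in the log.
        with UPGRADE_LOG.open(errors="replace") as handle:
            for number, text in find_error_lines(handle):
                print(f"{number}: {text}")
    else:
        print(f"log not found: {UPGRADE_LOG}")
```

This only narrows down where to look. The complete log bundle from all three nodes is still needed for analysis.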

 

Known issues:

Additional Information