One Manager is the orchestrator node and initiates and controls the upgrade.
The Upgrade UI is accessible only from the orchestrator node.
During the upgrade, the Corfu database is shrunk so that it is active only on the orchestrator node.
This upgrade model results in an expected MP outage, during which the API and UI are unavailable.
If the upgrade of one Manager fails, the MP cluster is unavailable.
Upgrades from NSX-T 3.2.1 and higher versions
The Upgrade UI is accessible from any Manager node.
A rolling upgrade model is used.
This model provides reduced downtime of the NSX Management Plane (MP).
The maintenance window for the MP upgrade is shorter, and NSX MP API/UI access remains available throughout the upgrade process, with no impact on Data Plane workloads.
If an NSX Manager upgrade fails, the other two Managers remain accessible and the MP stays up.
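The version threshold that separates the two upgrade models can be expressed as a small check. The following is a minimal sketch, not part of any NSX tooling: the function names and the "orchestrator"/"rolling" labels are hypothetical, and it assumes the source version is a dotted numeric string such as 3.2.1.

```python
# Illustrative helper only; names and labels are assumptions, not NSX APIs.
ROLLING_UPGRADE_MIN = (3, 2, 1)  # rolling model applies from NSX-T 3.2.1 onward


def parse_version(version: str) -> tuple:
    """Turn a dotted numeric version such as '3.2.1' into a tuple of ints."""
    return tuple(int(part) for part in version.strip().split("."))


def manager_upgrade_model(source_version: str) -> str:
    """Return which Manager upgrade model applies for a given source version."""
    if parse_version(source_version) >= ROLLING_UPGRADE_MIN:
        # Upgrade UI on any Manager, MP API/UI stay up during the upgrade.
        return "rolling"
    # Single orchestrator node, expected MP outage during the upgrade.
    return "orchestrator"


if __name__ == "__main__":
    for version in ("3.1.3", "3.2.1", "4.1.0"):
        print(version, manager_upgrade_model(version))
```

Comparing tuples of integers avoids the pitfalls of comparing version strings lexically (for example, "3.10" sorting before "3.2").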
Environment
VMware NSX
VMware NSX-T
Resolution
Manager admin CLI command to check the upgrade status:
get upgrade progress-status
Rollback:
For upgrades from version 3.2.1 and later, a Manager upgrade failure can be rolled back using the documented rollback steps in the Upgrade Guide.
For upgrades from a version earlier than NSX-T 3.2.1, Broadcom Support must be engaged to perform the rollback steps.
Alternatively, restoring from backup is an option to recover from a Manager upgrade failure.
See "NSX 4.x Restore from backup during upgrade". If upgrading from 3.2.1 or later, an automatic internal backup is taken prior to the Manager upgrade.
Logs:
On the Manager, /var/log/upgrade-coordinator/upgrade-coordinator.log is the main upgrade log.
To determine the root cause (RCA) of an NSX Manager upgrade failure, an NSX Manager log bundle is required from all 3 nodes.
If performing a rollback, logs can be collected afterwards.
If performing a restore from backup, logs must be collected first. If the UI is not available, logs can be collected from the command line as the admin user:
get support-bundle file <filename.tgz>
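For a first pass over the main upgrade log before a full RCA, a short script can pull out the lines that look like failures. This is a rough sketch under stated assumptions: the "ERROR" and "FATAL" markers are a guess at how failures are flagged (the log format is not described here), the function names are hypothetical, and the script is meant to run against a copy of upgrade-coordinator.log extracted from a collected support bundle.

```python
from typing import Iterable, List

# Assumption: failure lines contain one of these level markers.
ERROR_MARKERS = ("ERROR", "FATAL")

# Path of the main upgrade log on the Manager, as given above.
DEFAULT_LOG = "/var/log/upgrade-coordinator/upgrade-coordinator.log"


def find_error_lines(lines: Iterable[str], markers=ERROR_MARKERS) -> List[str]:
    """Return the lines that contain any of the given markers, in order."""
    return [
        line.rstrip("\n")
        for line in lines
        if any(marker in line for marker in markers)
    ]


def scan_log(path: str = DEFAULT_LOG) -> List[str]:
    """Read a log file and return its suspected failure lines."""
    # errors="replace" keeps the scan going if the file has odd bytes.
    with open(path, encoding="utf-8", errors="replace") as handle:
        return find_error_lines(handle)


if __name__ == "__main__":
    import sys

    log_path = sys.argv[1] if len(sys.argv) > 1 else DEFAULT_LOG
    for hit in scan_log(log_path):
        print(hit)
```

A scan like this only narrows down where to look; the full log bundle from all 3 nodes is still needed for RCA.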