When running upgrade prechecks on the Active Global Manager (GM) cluster after successfully upgrading the Standby GM, the following error is displayed:
"Active Global Manager site cannot be upgraded first. Please upgrade Standby Global Manager '' before performing upgrade of this site."
In the Upgrade section, the Standby GM cluster appears as already upgraded to a newer NSX version. However, in the Active GM UI under Location Management, the Standby GM cluster is incorrectly shown as running the same version as the Active GM and as not yet upgraded.
In contrast, the Standby GM UI shows the upgrade status as successful, and the Standby cluster is indeed running a newer NSX version than the Active cluster.
Attempts to clear this mismatch by running the following commands do not resolve the precheck error:
start search resync all
and
curl -X POST -H "Content-Type: application/json" -H 'X-NSX-Username:admin' 'http://127.0.0.1:7441/api/v1/sites?action=refresh'
Rebooting the Managers also does not fix the issue.
Further investigation using the API command below on both the Active and Standby GMs shows that the config_version reported for the Standby site is identical on both, even though the Standby GM’s value should be higher after its upgrade:
curl -l -k -u '<username>' -H 'Content-Type:application/json' -X GET http://localhost:7441/api/v1/sites
Active GM output:
"config_version": 16,
"id": "#####",
"is_local": false,
"name": "STANDBY_GM",
"node_type": "GM",
"site_version": "4.1.2.3.0.23382420"
Standby GM output:
"config_version": 16,
"id": "#####",
"is_local": true,
"name": "STANDBY_GM",
"node_type": "GM",
"site_version": "4.2.3.1.0.24954571"
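The comparison above can be scripted so the stale entry is easier to spot. The following is a minimal POSIX-shell sketch, not an NSX tool: the function names `get_field` and `check_standby_view` are hypothetical, and it assumes you pass in only the Standby site's entry from each GM's output (as in the fragments above), because the full /api/v1/sites response may list several sites. It flags the pattern shown here: site_version differs between the two views while config_version is the same.

```bash
#!/bin/sh
# Hypothetical helpers (not part of NSX) for comparing the Standby site entry
# as seen by the Active GM with the entry reported by the Standby GM itself.

# Print the value of "key" from a flat JSON fragment such as
#   "config_version": 16,
# Surrounding quotes and trailing whitespace are removed; first match only.
get_field() {
  local key="$1"
  printf '%s\n' "$2" \
    | grep -o "\"$key\": *[^,]*" \
    | head -n 1 \
    | sed -e "s/^\"$key\": *//" -e 's/[[:space:]]*$//' -e 's/^"//' -e 's/"$//'
}

# Print STALE and return 1 when site_version differs between the two views
# but config_version is identical (the Active GM never picked up the
# Standby's upgrade). Otherwise print OK and return 0.
check_standby_view() {
  local a_cfg s_cfg a_ver s_ver
  a_cfg=$(get_field config_version "$1")
  s_cfg=$(get_field config_version "$2")
  a_ver=$(get_field site_version "$1")
  s_ver=$(get_field site_version "$2")
  if [ "$a_ver" != "$s_ver" ] && [ "$a_cfg" = "$s_cfg" ]; then
    echo "STALE: site_version $a_ver vs $s_ver, config_version $a_cfg on both"
    return 1
  fi
  echo "OK: config_version $a_cfg vs $s_cfg, site_version $a_ver vs $s_ver"
  return 0
}

# Standby site entry from each GM, trimmed from the outputs above.
active='"config_version": 16,
"name": "STANDBY_GM",
"site_version": "4.1.2.3.0.23382420"'
standby='"config_version": 16,
"name": "STANDBY_GM",
"site_version": "4.2.3.1.0.24954571"'

check_standby_view "$active" "$standby" || echo "Active GM holds a stale view of the Standby site"
```

The text matching with grep and sed is deliberately simple and only handles flat `"key": value` lines like those shown; for full API responses a proper JSON parser would be more reliable.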
VMware NSX 4.1.2.3
The issue occurs due to the way site configuration updates are handled between Active and Standby Global Managers during the federation upgrade process.
The site config update happens in two stages:
A handshake check verifies whether the Standby GM’s own config_version is greater than the version currently known by the Active GM.
If the check passes (i.e., the Standby GM’s version is higher), an update is triggered on the Active GM to synchronize the configuration data.
In this case, the handshake check fails. This happens either because one of the site version updates did not update the site config version, or because a restore rolled back the Standby’s site config version.
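The two-stage update described above can be modeled in a few lines to show why identical values block synchronization. The following shell sketch is illustrative only and is not NSX's implementation; the function name `site_config_handshake` is hypothetical, and only the strictly-greater comparison is taken from the description above.

```bash
#!/bin/sh
# Simplified, hypothetical model of the two-stage site config update.
#   $1: config_version the Standby GM reports for itself
#   $2: config_version the Active GM currently holds for the Standby site
site_config_handshake() {
  local standby_self="$1" active_known="$2"
  # Stage 1: handshake check; the Standby's own value must be strictly greater.
  if [ "$standby_self" -gt "$active_known" ]; then
    # Stage 2: the Active GM would now synchronize the Standby's site config.
    echo "update triggered on Active GM ($active_known -> $standby_self)"
    return 0
  fi
  echo "no update: Standby config_version $standby_self is not greater than $active_known"
  return 1
}

# Values from the outputs above: 16 on both GMs, so no update is triggered.
site_config_handshake 16 16 || true
# If the Standby's value were higher (17 is an arbitrary example), it would pass.
site_config_handshake 17 16
```

With 16 on both sides, the first call reports "no update", which matches the symptom where the Active GM keeps showing the old site_version for the Standby.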
Although the refresh API on port 7441 (/api/v1/sites?action=refresh) normally synchronizes the configuration between sites, in this case the Active GM’s config_version is significantly behind the Standby’s. Because of this mismatch, multiple refresh calls are required to bring the site configurations back into sync and allow the upgrade to proceed.
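If repeated refresh calls are needed, a bounded retry loop saves issuing them by hand. The following is a minimal shell sketch under stated assumptions: `refresh_until_synced` is a hypothetical helper, the check command is something you supply yourself (for example, a comparison of the Standby's site_version as seen by the Active GM), and the attempt count and delay are arbitrary choices, not documented values.

```bash
#!/bin/sh
# Hypothetical helper: run a refresh command up to MAX times, stopping as soon
# as a user-supplied check command succeeds.
#   $1 MAX attempts   $2 refresh command   $3 check command   $4 delay in seconds
# Both commands run via eval in the current shell, so quote them with care.
refresh_until_synced() {
  local max="$1" refresh_cmd="$2" check_cmd="$3" delay="${4:-10}" attempt=1
  while [ "$attempt" -le "$max" ]; do
    echo "refresh attempt $attempt of $max"
    eval "$refresh_cmd" || echo "refresh command returned an error" >&2
    if eval "$check_cmd"; then
      echo "in sync after $attempt attempt(s)"
      return 0
    fi
    # Give the sites time to exchange data before the next attempt.
    if [ "$attempt" -lt "$max" ]; then
      sleep "$delay"
    fi
    attempt=$((attempt + 1))
  done
  echo "still out of sync after $max attempts" >&2
  return 1
}

# Example (assumed usage): reuse the port 7441 refresh call from this article
# with a check you define yourself; "standby_version_ok" is hypothetical.
# refresh_until_synced 5 \
#   "curl -s -X POST -H 'Content-Type: application/json' -H 'X-NSX-Username:admin' 'http://127.0.0.1:7441/api/v1/sites?action=refresh'" \
#   "standby_version_ok" 30
```

Stopping on a successful check rather than running a fixed number of calls avoids unnecessary refreshes once the sites agree, while the attempt limit keeps the loop from running indefinitely.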
Workaround
To resolve the version synchronization issue and allow the Active GM's upgrade precheck to complete successfully, run the following command on any node in the Standby Global Manager cluster:
curl -X POST -ik 'http://localhost:7999/api/v1/sites?action=refresh' -H 'X-NSX-Username:admin;<admin_password_here>'
This command forces a site configuration refresh through the internal API on port 7999, ensuring that the Active Global Manager recognizes the updated configuration and version of the Standby GM.
After running this command, rerun the upgrade precheck on the Active Global Manager. It should now complete successfully without displaying the “upgrade Standby first” error.