NSX Active Global Manager upgrade failing prechecks after Standby GM cluster upgraded first (Error code: 530054)

Article ID: 427681

Products

VMware NSX

Issue/Introduction

In environments where both the Active and Standby Global Manager (GM) clusters are deployed as single-node clusters, you may encounter upgrade workflow failures after upgrading the Standby GM.

After successfully upgrading the Standby GM and initiating prechecks on the Active GM to proceed with its upgrade, the following error appears:

“Active Global Manager site cannot be upgraded first. Please upgrade Standby Global Manager before performing upgrade of this site.”

From the Active GM UI, the Standby GM displays a Sync Status of “Not Available.” Attempts to edit the Standby GM configuration under Location Manager—including re-adding credentials and thumbprint—fail with the error:

“Active GM version 4.1.2.3.0.23382420 is not in Standby GM compatible versions 4.2. (Error code: 530054).”

Although the Standby GM upgrade completes successfully, the Active GM continues to recognize the Standby GM as running the older version. API queries (via GET /api/v1/sites) confirm:

  • Standby GM reports the correct upgraded version (e.g., 4.2.2.2), with a higher config_version.

  • Active GM continues to display the Standby GM as running 4.1.2.3, indicating synchronization failure between the sites.
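
The comparison above can be sketched from a root shell on each GM. This is a minimal sketch, not a supported tool: fetch_sites and version_fields are hypothetical helper names, the filter simply prints every JSON key ending in "version" (which covers config_version), and the request may need the same authentication header used in the Resolution command.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of NSX.

# Fetch this node's view of the federation sites (endpoint from this article).
fetch_sites() {
    curl -sk "http://localhost:7441/api/v1/sites"
}

# Print every JSON key ending in "version" together with its value,
# one match per line (for example config_version).
version_fields() {
    grep -Eo '"[A-Za-z_]*version"[[:space:]]*:[[:space:]]*("[^"]*"|[0-9]+)'
}

# Usage, run on the Active GM and again on the Standby GM, then compare:
#   fetch_sites | version_fields
```

If the two nodes print different values for the Standby site, their views have diverged.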

Manually refreshing the site configuration on ports 7441 and 7999 (localhost:<port>/api/v1/sites?action=refresh) does not resolve the issue.
Log bundle analysis further shows persistent DISCONNECTED states when the Active GM attempts to communicate with the Standby GM, including repeated “get site leader failed” events.
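
To gauge how often these events occur in an extracted log file, a simple count can help. The sketch below assumes only the two strings quoted above; count_sync_failures is a hypothetical helper name, and the log file path is supplied by the caller because this article does not name one.

```shell
#!/bin/sh
# Sketch only: the log file path is supplied by the caller; no NSX path is assumed.

# Count lines that mention a DISCONNECTED state or a failed site-leader lookup.
count_sync_failures() {
    grep -cE 'DISCONNECTED|get site leader failed' "$1"
}

# Usage:
#   count_sync_failures /path/to/extracted/log/file
```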

Running the CARR script may identify and repair trust-store or certificate inconsistencies, but these repairs do not resolve the synchronization problem. The Active GM remains unable to recognize the updated Standby GM site version, preventing the upgrade from progressing.

Environment

VMware NSX 4.1.2.3 Federated

Cause

The issue is caused by a broken synchronization state between the Active and Standby Global Manager sites. When reviewing the output of the API call to http://localhost:7441/api/v1/sites, each GM node reports its own view of the site configuration.
The Site Manager (SM) service is responsible for maintaining and propagating this configuration across sites; however, this process depends on a functioning APH (Appliance Proxy Hub) communication channel.

In single-node GM clusters—which are not a recommended deployment model—a circular dependency occurs:

  • SM relies on the APH channel to exchange APH certificates and configuration updates.

  • When certificates change or expire, APH connectivity drops.

  • With APH down, no configuration exchange occurs between the two sites, so the updated certificates needed to restore APH connectivity are never delivered.

As a result, the Active GM continues to reference the Standby GM at an outdated configuration version (e.g., config_version 18), while the Standby GM correctly sees itself at a newer configuration state (e.g., config_version 50).

Because APH communication is broken, the Active GM never updates its view of the Standby GM’s version, causing persistent synchronization failures and blocking the upgrade workflow.
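
The stale-view condition can be expressed as a simple comparison. This is a minimal sketch that assumes config_version is an integer that increases with each configuration change; standby_view_is_stale is a hypothetical helper name, and the two values must be read manually from each GM's /api/v1/sites output.

```shell
#!/bin/sh
# Sketch only: assumes config_version is an integer that grows with each change.

# Succeeds (exit 0) when the Active GM's recorded config_version for the
# Standby site is lower than the value the Standby GM reports for itself.
standby_view_is_stale() {
    active_view="$1"
    standby_self="$2"
    [ "$active_view" -lt "$standby_self" ]
}

# Example with the values from this article:
if standby_view_is_stale 18 50; then
    echo "Active GM holds a stale view of the Standby GM"
fi
```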




Resolution

To restore synchronization between the Active and Standby Global Manager sites, the Standby GM must be re-onboarded through the Site Manager service on the Active GM. This operation re-establishes the APH communication channel and allows the Active GM to update its view of the Standby GM’s configuration and version state.

Workaround: Re-Onboard Standby GM via Site Manager

  1. SSH into one of the Active GM nodes as the root user.

  2. Run the following command to re-onboard the Standby GM:

curl -ik -X POST "http://localhost:7999/api/v1/sites?action=onboard_site" -H "Content-Type: application/json" -H 'X-NSX-Username:admin;<password>' -d '{"address": "<remote_ip>", "username": "<username>", "password": "<password>", "thumbprint": "<thumbprint>", "site_name": "", "standby_gm": true}'

Note

All required values (<remote_ip>, <username>, <password>, and <thumbprint>) must be taken from the Standby GM.
Site Manager (SM) is responsible for maintaining consistent site configuration across the federation; however, because the APH connection is broken, normal propagation cannot occur. Re-onboarding the Standby GM through REST forces a refresh of APH certificates and restores communication between the two sites.
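
To reduce quoting mistakes when filling in the placeholders, the command in step 2 can be wrapped in a small script. This is a hedged sketch rather than a supported procedure: onboard_payload, onboard_standby, and NSX_AUTH_HEADER are hypothetical names, and the endpoint, headers, and JSON fields are copied from the command in step 2.

```shell
#!/bin/sh
# Sketch only: function and variable names are hypothetical.

# Build the onboard_site request body from values collected on the Standby GM.
# Values are inserted as-is; characters such as " or \ would need JSON escaping.
onboard_payload() {
    printf '{"address": "%s", "username": "%s", "password": "%s", "thumbprint": "%s", "site_name": "", "standby_gm": true}\n' \
        "$1" "$2" "$3" "$4"
}

# Send the request on the Active GM. NSX_AUTH_HEADER must hold the
# X-NSX-Username header exactly as given in the command in step 2.
onboard_standby() {
    curl -ik -X POST "http://localhost:7999/api/v1/sites?action=onboard_site" \
        -H "Content-Type: application/json" \
        -H "$NSX_AUTH_HEADER" \
        -d "$(onboard_payload "$1" "$2" "$3" "$4")"
}

# Usage:
#   onboard_standby "$STANDBY_IP" "$STANDBY_USER" "$STANDBY_PASSWORD" "$STANDBY_THUMBPRINT"
```

Quoting the URL keeps the shell from treating the ? character as a filename pattern. As with the original command, the password is visible in the process list while curl runs.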

Additional Information

This is a known limitation of single-node GM clusters, as described above. Version 9.0 includes additional background recovery for this condition. There is no plan to backport it to 4.1.2.