Split brain condition after a Global Manager failover


Article ID: 324244


Updated On:


VMware NSX Networking


  •  NSX-T 3.2.2
  •  Both Global Manager clusters are reported as Active
  •  UI may report sync status as Not Started
  •  NSX UI raises an alarm "GM To GM Split Brain"
  •  The following log entries are observed:
2022-12-15T22:41:55.979Z  WARN nsx-rpc:APH_provider:user-executor-2 sitemanager 3763 - [nsx@6876 comp="global-manager" level="WARNING" subcomp="async-replicator"] Split brain detected, raising alarm.


2022-12-15T22:43:37.236Z hostname NSX 3142 MONITORING [nsx@6876 alarmId="11cb4447-7e32-41a6-980d-eb1dac7039fc" alarmState="OPEN" comp="global-manager" entId="535f2ecd-a292-44b9-8ade-f3f4666336d4" errorCode="MP701099" eventFeatureName="federation" eventSev="CRITICAL" eventState="On" eventType="gm_to_gm_split_brain" level="FATAL" nodeId="19a10f42-20e0-836d-e360-71f6fa6b1838" subcomp="monitoring"] Multiple Global Manager nodes are active: 425f2ecd-a292-44b9-8ade-f3f4666336d4,9e0d4226-8612-4d80-894f-7b80a3e3935d. Only one Global Manager node must be active at any time.
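When several alarms have been logged, it can help to pull out just the node IDs that the alarm reports as active. The following is a minimal sketch, not a supported tool: the function name `extract_active_gms` is hypothetical, and it assumes the alarm message keeps the exact wording shown in the log entry above.

```shell
#!/bin/sh
# Hypothetical helper (not part of NSX): reads syslog text on stdin and prints
# each Global Manager node ID named in a gm_to_gm_split_brain alarm, one per line.
# Assumes the message format "Multiple Global Manager nodes are active: <id>,<id>."
extract_active_gms() {
    grep 'eventType="gm_to_gm_split_brain"' |
        sed -n 's/.*Multiple Global Manager nodes are active: \([0-9a-f,-]*\)\..*/\1/p' |
        tr ',' '\n' |
        sort -u
}
```

For example, `extract_active_gms < syslog.txt`, where `syslog.txt` is a placeholder for wherever the GM logs were collected.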


VMware NSX-T Data Center 3.x
VMware NSX
VMware NSX-T Data Center


A split brain condition occurs when two Global Managers both believe they are active and have the same epoch. In this case, it is caused by a race condition in the handling of site configuration updates.


This issue is resolved in NSX 3.2.3, available from the VMware Customer Connect portal.

GM Site 1
GM Site 2

First determine the current state on both GMs.
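One way to check the state from the command line is sketched below. It is an illustration under stated assumptions, not a documented procedure from this article: it assumes a GET on the same `global-managers` collection that step 1 deletes from returns the Global Manager objects known to that GM, and that each entry reports whether it is ACTIVE or STANDBY. The helper names and the `NSX_USER` variable are hypothetical; verify the call against the API reference for your NSX version.

```shell
#!/bin/sh
# Hypothetical helper: builds the collection URL for one GM.
# Assumption: the collection used by the DELETE in step 1 also supports GET.
gm_list_url() {
    printf 'https://%s/global-manager/api/v1/global-infra/global-managers\n' "$1"
}

# Hypothetical helper: prints the raw JSON returned by one GM.
# $1 = GM hostname or IP. NSX_USER is an assumed environment variable;
# curl prompts for the password. -k skips certificate verification.
show_gm_state() {
    curl -sk -u "${NSX_USER:-admin}" "$(gm_list_url "$1")"
}
```

Running `show_gm_state site1` and `show_gm_state site2` and comparing the two responses should show which GM each site considers active.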

In this example, site1 has been verified as the GM that should be ACTIVE, and the following procedure is used to reset the state of site2.

1) On the site2 GM, remove the extra site1 resource (it serves no function on site2):
DELETE https://site2/global-manager/api/v1/global-infra/global-managers/site1

2) Change site2 from ACTIVE to STANDBY using an internal API. Do NOT change any field name; the request is intentionally sent exactly in this form.
SSH to the site2 GM as the root user (this API is internal and must be run directly on the GM):
curl -X POST -ik "http://localhost:7441/api/v1/sites?action=set_global_manager" -H "Content-Type: application/json" -d '{"status":"STANDBY","force":false,"federation_id":"","gm_name":""}'

If this does not work, retry the request with the force option set to true:
curl -X POST -ik "http://localhost:7441/api/v1/sites?action=set_global_manager" -H "Content-Type: application/json" -d '{"status":"STANDBY","force":true,"federation_id":"","gm_name":""}'

3) On site1 (the Active site), use the UI to onboard the site2 GM as STANDBY.
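The two requests in step 2 can be sketched as a single script that tries the non-forced call first and falls back to the forced one. This is a rough illustration only: the function names are hypothetical, and treating any HTTP 2xx status as success is an assumption, since the article does not state how a failed request is reported. The payload fields are kept exactly as given in step 2.

```shell
#!/bin/sh
# Internal endpoint from step 2; must be run as root directly on the site2 GM.
GM_API='http://localhost:7441/api/v1/sites?action=set_global_manager'

# Builds the request body. $1 is the force flag (true or false).
# Field names and the empty values are intentionally unchanged from the article.
standby_payload() {
    printf '{"status":"STANDBY","force":%s,"federation_id":"","gm_name":""}' "$1"
}

# Tries force=false first, then force=true.
# Assumption: an HTTP 2xx status code means the request was accepted.
set_standby() {
    for force in false true; do
        code=$(curl -s -o /dev/null -w '%{http_code}' -X POST "$GM_API" \
            -H 'Content-Type: application/json' \
            -d "$(standby_payload "$force")")
        case "$code" in
            2??)
                echo "STANDBY request accepted (force=$force, HTTP $code)"
                return 0
                ;;
        esac
        echo "request with force=$force returned HTTP $code" >&2
    done
    return 1
}
```

Calling `set_standby` on the site2 GM runs the sequence. An operator may prefer to inspect the first failure before forcing the change, in which case the two curl commands from step 2 can simply be run by hand.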