Alarm 'Data synchronization between sites <sites> failed for the FlowIdentifier' triggered post Federation Global Manager failover



Article ID: 415391



Products

VMware NSX

Issue/Introduction

  • A failover was triggered from the Active Global Manager (GM) to the Standby GM.
  • Once the failover completed, the following alarm was triggered in the NSX UI:

Data synchronization between sites <Local Manager Site>(########-26b2-426d-93b3-############) and <Now Standby GM>(########-52ab-4ef3-ad8f-############) failed for the FlowIdentifier{role='Policy', nameSpace='LM_2_GM_NOTIFICATION'}. Reason: Remote site disconnected

  • The alarm can also be seen in the Local Manager (LM) log /var/log/async-replicator/ar.log:

WARN EventReportProcessor-1-2 EventReportSyslogSender 78425 MONITORING [nsx@6876 comp="nsx-manager" entId="########-586d-37cb-a509-############" eventFeatureName="federation" eventSev="warning" eventState="On" eventType="gm_to_lm_synchronization_warning" level="WARNING" subcomp="async-replicator"] Data synchronization between sites LM-SITE(########-26b2-426d-93b3-############) and GM-SITE(########-52ab-4ef3-ad8f-############) failed for the FlowIdentifier{role='Policy', nameSpace='LM_2_GM_NOTIFICATION'}. Reason: Remote site disconnected
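To check whether this alarm has been written to ar.log on the LM, the log line can be matched programmatically. Below is a minimal sketch in Python; the regular expression is derived from the log excerpt above, and the file path is the one quoted in this article:

```python
import re

# Pattern matching the federation sync-failure alarm as it appears in
# /var/log/async-replicator/ar.log on the Local Manager.
ALARM_PATTERN = re.compile(
    r"Data synchronization between sites .+ failed for the "
    r"FlowIdentifier\{role='Policy', nameSpace='LM_2_GM_NOTIFICATION'\}\. "
    r"Reason: Remote site disconnected"
)

def find_alarm_lines(log_text: str) -> list[str]:
    """Return every log line that reports the GM/LM sync-failure alarm."""
    return [line for line in log_text.splitlines()
            if ALARM_PATTERN.search(line)]
```

For example, running `find_alarm_lines(open('/var/log/async-replicator/ar.log').read())` on the LM returns the matching alarm entries, or an empty list if the alarm is not present.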

  • The steps used to trigger the failover were:
    1. Shut down the current Active GM.
    2. Log in to the current Standby GM and make it the Active GM.
    3. Power on the original Active GM and make it the Standby GM.
    4. The alarm triggers.

Environment

VMware NSX

Cause

When the Active GM is powered down, the alarm is expected to trigger; however, once the switchover completes, the alarm should be cleared automatically. In this case it remains active.

This is a known issue impacting VMware NSX.

Resolution

To work around the issue, carry out a rolling reboot of the LM cluster first, followed by the now-Standby GM cluster (the one referenced in the alarm).

Rolling reboot steps for a cluster of 3 NSX managers:

  1. Identify the manager that holds the VIP; this can be seen under System > Appliances.
  2. Log in as admin to the VIP manager, run 'get cluster status', and ensure all services are in an UP state.
  3. Reboot one of the other two managers.
  4. Once the reboot is complete, run 'get cluster status' again and wait until all services return to an UP state.
  5. Reboot the other non-VIP manager.
  6. Once the reboot is complete, run 'get cluster status' again and wait until all services return to an UP state.
  7. Reboot the VIP manager.
  8. Once the reboot is complete, run 'get cluster status' again and wait until all services return to an UP state.
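The wait between reboots can be scripted by parsing the output of 'get cluster status'. The sketch below is a minimal Python helper; it assumes each service's state is reported as a token UP, DOWN, or DEGRADED somewhere on its line, which is an assumption about the CLI output and may differ between NSX versions:

```python
import re

def all_services_up(status_output: str) -> bool:
    """Return True only if every state token parsed from
    'get cluster status' output reads UP.

    Assumes states appear as the bare tokens UP/DOWN/DEGRADED;
    the real NSX CLI output format may vary by version."""
    states = re.findall(r"\b(UP|DOWN|DEGRADED)\b", status_output)
    return bool(states) and all(state == "UP" for state in states)
```

A wrapper could call this helper in a polling loop (e.g. over SSH) and proceed to the next reboot only once it returns True, mirroring steps 4, 6, and 8 above.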

Note: Rebooting the VIP manager last prevents unnecessary VIP failovers from occurring.