New Federation sites cannot sync with existing local site that was upgraded through a rolling upgrade.

Article ID: 322652

Updated On:

Products

VMware NSX

Issue/Introduction

Symptoms:

VMware NSX-T 3.2.x / 4.x
The local site was recently upgraded using the rolling upgrade process.
New Federation sites cannot sync with the upgraded local site.
Possible scenarios:
- Scenario 1:
  Site A is not yet onboarded to Federation.
  Site A is upgraded (rolling upgrade).
  Site A is onboarded after the rolling upgrade.
  Site A cannot sync with any of the sites already onboarded.
- Scenario 2:
  Site A is onboarded to Federation; site B is not.
  Site A is upgraded (rolling upgrade).
  Site B is onboarded after site A's rolling upgrade.
  Site B cannot connect to site A.
On the local site that was upgraded via the rolling upgrade, /var/log/cloudnet/nsx-ccp.log shows that the new site is added but its state is CLOSED:

2023-01-24T23:45:21.848Z INFO nsx-rpc:CCP-AphProvider-a2ffa5b0-####-####-####-########12f:user-executor-3 SiteSyncManager 3532 - [nsx@6876 comp="nsx-controller" level="INFO" subcomp="sitesync"] Remote site added a32ec9ab-####-####-####-########16f with APH [4420e2e4-####-####-####-########8b5, 976b13e2-####-####-####-########c2e, d2fcc4bb-####-####-####-########22e]
...
2023-01-24T23:45:21.848Z INFO nsx-rpc:CCP-AphProvider-a2ffa5b0-####-####-####-########12f:user-executor-3 SiteSyncManager 3532 - [nsx@6876 comp="nsx-controller" level="INFO" subcomp="sitesync"] State for site a32ec9ab-####-####-####-########16f is CLOSED
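To check for this symptom quickly, you can extract the most recent state logged for each remote site from nsx-ccp.log. The following is a minimal POSIX shell sketch. It assumes the "State for site <id> is <STATE>" message format shown in the excerpt above, which other builds may not use. The function names last_site_states and closed_sites are hypothetical helpers, not part of NSX.

```shell
#!/bin/sh
# Hypothetical helpers (not an NSX tool). Assumes nsx-ccp.log lines contain
# "State for site <site-id> is <STATE>", as in the KB excerpt.

# Print "<site-id> <STATE>" using the LAST state logged for each site,
# so a site that later left CLOSED is not reported as closed.
last_site_states() {
    log="${1:-/var/log/cloudnet/nsx-ccp.log}"
    awk '
        {
            # Scan tokens for: State for site <id> is <STATE>
            for (i = 1; i <= NF - 5; i++) {
                if ($i == "State" && $(i+1) == "for" && $(i+2) == "site" && $(i+4) == "is") {
                    state[$(i+3)] = $(i+5)   # later lines overwrite earlier ones
                }
            }
        }
        END { for (s in state) print s, state[s] }
    ' "$log"
}

# Print the IDs of sites whose most recent logged state is CLOSED, sorted.
closed_sites() {
    last_site_states "$@" | awk '$2 == "CLOSED" { print $1 }' | sort
}
```

Run closed_sites as root on a manager node of the rolling-upgraded site. If it lists the ID of a newly onboarded site, that is consistent with the symptom described above.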

Environment

VMware NSX-T Data Center 3.x
VMware NSX-T Data Center
VMware NSX-T Data Center 4.x

Cause

This is caused by an incorrect flag being set during the rolling upgrade, which causes the manager node of that site to drop handshake requests coming from other sites.

Resolution

This issue is resolved in NSX-T 3.2.3, available from Support Documents and Downloads (broadcom.com).
This is a known issue affecting NSX-T 4.x.

Workaround:
Restart the controller service (nsx-ccp) on all manager nodes of the site that went through the rolling upgrade.

On each NSX Manager node as root user:

root@nsx-mngr-01:~# service nsx-ccp restart

Note: Perform the step above on one manager node at a time so that the controller cluster stays up. Before proceeding to the next NSX-T Manager, check the cluster status with the get cluster status command.
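The one-node-at-a-time restart in the note above can also be scripted. This is a minimal sketch, not a supported VMware/Broadcom tool. It assumes root SSH access to each manager node is enabled and takes node hostnames as arguments. The cluster health check stays with the operator: run get cluster status in the NSX CLI and confirm before the script continues. The function name rolling_ccp_restart and the RUN_REMOTE override, which lets you substitute a stub for a dry run, are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not an NSX tool). Assumptions: root SSH to each
# manager node is enabled; RUN_REMOTE may be overridden (e.g. with a stub).
RUN_REMOTE="${RUN_REMOTE:-ssh}"

# Restart nsx-ccp on each node in order, pausing after each node until the
# operator confirms that 'get cluster status' shows the cluster is healthy.
rolling_ccp_restart() {
    for node in "$@"; do
        echo "Restarting nsx-ccp on $node"
        # stdin from /dev/null so ssh cannot swallow the operator's answers.
        if ! $RUN_REMOTE "root@$node" 'service nsx-ccp restart' </dev/null; then
            echo "Restart failed on $node; stopping here." >&2
            return 1
        fi
        # Manual gate: check 'get cluster status' before the next node.
        printf 'Cluster status OK after %s? [y/N] ' "$node"
        read -r answer
        case "$answer" in
            y|Y) ;;
            *) echo "Stopping before the next node." >&2; return 1 ;;
        esac
    done
}
```

For example: rolling_ccp_restart nsx-mngr-01 nsx-mngr-02 nsx-mngr-03. The hostnames other than nsx-mngr-01 are placeholders. Stopping on the first failure or on any answer other than "y" keeps at most one controller down at a time.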
