New Federation sites cannot sync with an existing local site that was upgraded through a rolling upgrade
Article ID: 322652
Products
VMware NSX
Issue/Introduction
Symptoms:
VMware NSX-T 3.2.x / 4.x
The local site was recently upgraded using the rolling upgrade process.
New Federation sites cannot sync with the upgraded local site.
Possible scenarios:
Scenario 1: Site A is not yet onboarded to Federation. Site A is upgraded with a rolling upgrade and is onboarded afterwards. Site A cannot sync with any of the sites that are already onboarded.
Scenario 2: Site A is onboarded to Federation and Site B is not. Site A is upgraded with a rolling upgrade, and Site B is onboarded afterwards. Site B cannot connect to Site A.
On the local site that was upgraded through the rolling upgrade, /var/log/cloudnet/nsx-ccp.log shows that the new remote site is added but that its state is CLOSED:
2023-01-24T23:45:21.848Z INFO nsx-rpc:CCP-AphProvider-a2ffa5b0-####-####-####-########12f:user-executor-3 SiteSyncManager 3532 - [nsx@6876 comp="nsx-controller" level="INFO" subcomp="sitesync"] Remote site added a32ec9ab-####-####-####-########16f with APH [4420e2e4-####-####-####-########8b5, 976b13e2-####-####-####-########c2e, d2fcc4bb-####-####-####-########22e]
...
2023-01-24T23:45:21.848Z INFO nsx-rpc:CCP-AphProvider-a2ffa5b0-####-####-####-########12f:user-executor-3 SiteSyncManager 3532 - [nsx@6876 comp="nsx-controller" level="INFO" subcomp="sitesync"] State for site a32ec9ab-####-####-####-########16f is CLOSED
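To check for this symptom quickly, the log can be filtered for the sitesync state messages shown above. The following is a minimal sketch, not an official tool: the function name closed_sites is hypothetical, and the pattern assumes that the message format matches the sample lines in this article. The log may also contain older CLOSED entries, so compare the timestamps in the log before drawing conclusions.

```shell
# Print the ID of every remote site that the controller log reports as CLOSED.
# closed_sites is a hypothetical helper name, not part of NSX.
# $1: path to the log file (defaults to the path named in this article).
closed_sites() {
    log_file="${1:-/var/log/cloudnet/nsx-ccp.log}"
    # Keep only sitesync messages, then extract the site ID from
    # "State for site <id> is CLOSED" lines and de-duplicate the result.
    grep 'subcomp="sitesync"' "$log_file" \
        | sed -n 's/.*State for site \([^ ]*\) is CLOSED.*/\1/p' \
        | sort -u
}
```

Run as root on a manager node of the upgraded site, calling closed_sites with no argument reads the default log path and prints one site ID per line. No output means that no matching CLOSED entry was found.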
Environment
VMware NSX-T Data Center 3.x
VMware NSX-T Data Center
VMware NSX-T Data Center 4.x
Cause
An incorrect flag is set during the rolling upgrade, which causes the manager node of the upgraded site to drop handshake requests coming from other sites.
Workaround: Restart the controller service on all manager nodes of the site that went through the rolling upgrade.
On each NSX Manager node, as the root user:
root@nsx-mngr-01:~# service nsx-ccp restart
Note: Perform the above step on one manager node at a time so that the controller cluster stays up. Check the cluster status with get cluster status before proceeding to the next NSX Manager node.
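The one-node-at-a-time procedure can be sketched as a small wrapper that restarts the service on a node and then waits for the operator to confirm the cluster status before moving on. This is only an illustration under stated assumptions: the function name rolling_ccp_restart and the NODE_RUNNER variable are hypothetical, the node names in the usage line are placeholders, and the default runner assumes that root SSH access to the manager nodes is enabled, which may not be the case in your environment. The cluster check itself is left as a manual step.

```shell
# Restart nsx-ccp on each given manager node in turn, pausing after each
# restart so the operator can verify the cluster before continuing.
# rolling_ccp_restart and NODE_RUNNER are hypothetical names for this sketch.
# NODE_RUNNER: command used to reach a node (assumed default: ssh as root).
rolling_ccp_restart() {
    for node in "$@"; do
        echo "Restarting nsx-ccp on ${node}"
        # stdin is redirected so the runner cannot consume the operator's input
        ${NODE_RUNNER:-ssh} "root@${node}" "service nsx-ccp restart" </dev/null || return 1
        printf 'Run "get cluster status" on %s; press Enter once the cluster is stable: ' "$node"
        read -r reply || return 1
    done
}
```

For example, rolling_ccp_restart nsx-mngr-01 nsx-mngr-02 nsx-mngr-03 would process three nodes in that order. The function stops at the first node on which the restart command fails, so the remaining nodes are left untouched.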