/var/log/proton/nsxapi.log
2024-02-24T12:00:26.882Z INFO RepoSyncThread-1707748646882 RepoSyncServiceImpl 4841 SYSTEM [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Starting Repo sync thread RepoSyncThread-12345678964321
2024-02-24T12:00:32.208Z INFO RepoSyncThread-1707748646882 RepoSyncFileHelper 4841 SYSTEM [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Command to get server info for https://#.#.#.#:443/repository/4.1.1.0.0.22224312/HostComponents/rhel77_x86_64_baremetal_server/upgrade.sh returned result CommandResultImpl [commandName=null, pid=2227086, status=SUCCESS, errorCode=0, errorMessage=null, commandOutput=HTTP/1.1 404 Not Found
2024-02-24T12:00:11.583Z INFO RepoSyncThread-1707748646882 RepoSyncFileHelper 4841 SYSTEM [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Command to check if remote file exists for https://#.#.#.#:443/repository/4.1.1.0.0.22224312/Manager/vmware-mount/libvixMntapi.so.1 returned result CommandResultImpl [commandName=null, pid=2228965, status=SUCCESS, errorCode=0, errorMessage=null, commandOutput=HTTP/1.1 404 Not Found
2024-02-24T12:00:11.583Z ERROR RepoSyncThread-1707748646882 RepoSyncServiceImpl 4841 SYSTEM [nsx@6876 comp="nsx-manager" errorCode="MP21057" level="ERROR" subcomp="manager"] Unable to start repository sync operation. See logs for more details.
"Upgrade-coordinator upgrade failed. Error - Repository Sync status is not success on node <node IP>."
2024-02-24T12:00:52.800Z NSX_Manager NSX 98866 SYSTEM [nsx@6876 comp="nsx-manager" errorCode="MP30487" level="ERROR" subcomp="upgrade-coordinator"] Repository sync is not successful on <Managers IPs>. Please ensure Repository Sync Status is successful on all MP cluster nodes.
2024-02-24T12:00:52.800Z NSX_Manager NSX 98866 SYSTEM [nsx@6876 comp="nsx-manager" errorCode="MP30040" level="ERROR" subcomp="upgrade-coordinator"] Error while updating upgrade-coordinator due to error Repository Sync status is not success on node <Managers IPs>. Please ensure Repository Sync status is success on all MP nodes before proceeding.
2025-01-07T21:34:50.640Z INFO RepoSyncResultTsdbListener-2-1 RepoSyncResultTsdbListener 5032 SYSTEM [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] perform FullSync in RepoSyncResultTsdbListener, repoSyncResultMsg managed_resource {
}
status: REPO_SYNC_STATUS_FAILED
status_message {
}
failure_message {
value: "Unable to connect to File /repository/4.2.1.0.0.24304122/Manager/dry-run/dry_run.py on source <Manager IP>. Please verify that file exists on source and install-upgrade service is up."
}
error_code: 21057
VMware NSX 4.1.0
VMware NSX 4.2
VMware NSX-T Data Center 3.2.x
This is a known issue impacting VMware NSX. It is caused by missing files within the /repository
directory on the NSX Managers.
This issue is resolved in VMware NSX 4.2.0.
Workaround:
Warning: this procedure involves the use of the "rm" command, which irreversibly removes files from the system.
Ensure backups are taken and the restore passphrase is known before proceeding.
Identifying the issue:
On each VMware NSX Manager Appliance, check which directories are present in the /repository directory.
As the root user, run: ls -l /repository
You may see one or both of the directories below:
drwxrwx--- 7 uuc grepodir 4096 <date> 4.1.0.0.0.21332672
drwxrwx--- 7 uuc grepodir 4096 <date> 4.1.1.0.0.22224312
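A quick supplementary check (not part of the original steps) to compare repository contents across the managers, assuming all nodes are expected to hold identical repository content: run the below as root on each manager and compare the results.
# du -sh /repository/*
# find /repository -type f | wc -l
Differences in the version directories present, their sizes, or the total file count indicate the node with the incomplete repository.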
Based on the above results, you will then need to complete one or more of the below options:
Option: Correcting user and group permissions recursively for the /repository directory after copying (scp) it from a known good source manager.
The owner of the entire /repository directory should be user uuc and group grepodir, for the directory and all subdirectories and files.
The permissions should be rwxrwx--- (770).
This was not the case when the directory was copied with scp to the newly replaced manager(s).
To ensure the correct user, group, and permissions, run the following commands at the CLI of each replacement manager.
Copy the /repository directory to the new manager.
Open an SSH session to the known good host.
#scp -r /repository <remote User>@<IP of Remote Server>:/
Example command:
#scp -r /repository [email protected]:/
This command copies the /repository directory recursively to the root directory (/) of host A.B.C.D.
Now the user, group, and permissions will need to be checked and corrected.
This will recursively set the user and group:
#chown -R uuc:grepodir /repository
This will recursively set the required permissions:
#chmod -R 770 /repository
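As an optional verification (a supplementary check using standard find filters), anything printed by the commands below still has the wrong owner/group or does not have mode 770:
# find /repository ! -user uuc -o ! -group grepodir
# find /repository ! -perm 770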
Example:
The 'cannot connect to dry_run.py' error shown in the logs above was corrected by setting these attributes.
Check that the REPO_SYNC FAIL state has been cleared.
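One way to check the repo sync state from the API, assuming the per-node repo sync status endpoint is available in your NSX version (<node-id> is the manager node UUID, for example the nodeId value from the RepoSyncResult message above; you will be prompted for the admin password):
curl -k -u admin https://<nsx-manager-ip>/api/v1/cluster/<node-id>/repo_sync/status
The status can also be reviewed in the UI under System -> Appliances.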
Option: Deploy MUB file in /repository:
Identify the orchestrator node by checking the install-upgrade service from the NSX Manager CLI:
nsx-mngr> get service install-upgrade
Service name: install-upgrade
Service state: stopped
Enabled on: #.#.#.# <<< orchestrator node
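If the service is shown as stopped on the node listed under "Enabled on" (the orchestrator node), it can be started from the same CLI. This is a supplementary step, not part of the original procedure:
nsx-mngr> start service install-upgrade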
Copy the target version MUB file to the /image directory of the orchestrator node, then change to that directory:
# cd /image
# tar -xf VMware-NSX-upgrade-bundle-<version>.mub
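Extracting the MUB produces a tar.gz file with the same name in /image (see step 6 of the final workaround below). As an optional quick check that it was created:
# ls -l /image/VMware-NSX-upgrade-bundle-<version>.tar.gz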
Remove the existing target version directory from /repository:
# rm -rf /repository/4.1.1.0.0.22224312
Extract the tar.gz to /repository:
# tar -xzf /image/VMware-NSX-upgrade-bundle-<version>.tar.gz -C /repository
Set proper permissions and ownership of the /repository files by executing:
/opt/vmware/proton-tomcat/bin/reposync_helper.sh
Clean up the downloaded MUB file and extracted tar.gz file from /image:
rm -f /image/VMware-NSX-upgrade-bundle-<version>.mub
rm -f /image/VMware-NSX-upgrade-bundle-<version>.tar.gz
rm -f /image/VMware-NSX-upgrade-bundle-<version>.tar.gz.sig
Option: Deploy OVA file in /repository:
Download the nsx-unified-appliance-<version>.ova file following these instructions: Download Broadcom products and software. The downloaded version should match the version missing in the repository, as identified above in the 'Identifying the issue' section.
Deploy a new NSX Manager appliance from the downloaded OVA, then copy its /repository/<version> directory to all 3 existing managers missing the directory (an example scp command is shown below).
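A minimal sketch of the copy step, assuming root SSH access between the appliances; <version> and <existing manager IP> are placeholders. Run it from the newly deployed manager and repeat for each of the 3 existing managers:
# scp -r /repository/<version> root@<existing manager IP>:/repository/
After copying, verify ownership and permissions as described in the first option (uuc:grepodir, mode 770).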
Run "/opt/vmware/proton-tomcat/bin/reposync_helper.sh" on all the 3 existing managers, not the newly deployed one.
Option: Advanced LB (AVI):
2024-03-19T09:41:34.557Z INFO RepoSyncThread-1710841232019 RepoSyncFileHelper 85527 SYSTEM [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Command to get server info for https://#.#.#.#:443/repository/21.1.2-9124/Alb_controller/ovf/controller.cert returned result CommandResultImpl [commandName=null, pid=1677285, status=SUCCESS, errorCode=0, errorMessage=null, commandOutput=HTTP/1.1 404 Not Found
2024-03-19T09:42:08.746Z INFO RepoSyncThread-1710841232019 RepoSyncFileHelper 85527 SYSTEM [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Command to get server info for https://#.#.#.#:443/repository/22.1.6-9191/Alb_controller/ovf/controller-disk1.vmdk returned result CommandResultImpl [commandName=null, pid=1677876, status=SUCCESS, errorCode=0, errorMessage=null, commandOutput=HTTP/1.1 404 Not Found
/var/log/proton/nsxapi.log
2024-05-29T14:32:15.898Z INFO http-nio-127.0.0.1-7440-exec-23 RepoSyncServiceImpl 117206 SYSTEM [nsx@6876 comp="nsx-manager" level="INFO" reqId="<UUID>" subcomp="manager" username="uproton"] Starting Repository sync process, current result is RepoSyncResult [nodeId=<NODE UUID>, status=FAILED, statusMessage=, failureMessage=Unable to connect to File /repository/21.1.2-9124/Alb_controller/ovf/controller.ovf on source #.#.#.#. Please verify that file exists on source and install-upgrade service is up., errorCode=21057, percentage=0.0]
Recreate the missing directory and extract the ALB controller OVA into the repository:
# mkdir /repository/21.1.2-9124/Alb_controller/ovf
# tar -xvf /image/Controller.ova -C /repository/21.1.2-9124
controller.ovf
controller.mf
controller.cert
controller-disk1.vmdk
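Optionally confirm where the controller files landed; the 404 errors above expect them under /repository/21.1.2-9124/Alb_controller/ovf/:
# find /repository/21.1.2-9124 -name 'controller*'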
Then set proper permissions and ownership by executing:
/opt/vmware/proton-tomcat/bin/reposync_helper.sh
Alternate option for ALB controller ova file if the customer does not intend to use ALB:
The ALB controller file check can be bypassed during Repo sync by resetting the AlbControllerVmFabricModule values to default following the below steps:
# rm -rf /repository/21.1.2-9124
GET https://<nsx-manager-ip>/api/v1/fabric/modules
>>>> note the ID present in section: "fabric_module_name" : "AlbControllerVmFabricModule",
GET https://<nsx-manager-ip>/api/v1/fabric/modules/<alb_fabric_id>
PUT https://<nsx-manager-ip>/api/v1/fabric/modules/<alb_fabric_id> with the header "Content-Type: application/json" and the following request body:
{
"fabric_module_name" : "AlbControllerVmFabricModule",
"current_version" : "1.0",
"deployment_specs" : [ {
"fabric_module_version" : "1.0",
"versioned_deployment_specs" : [ {
"host_version" : "",
"service_vm_ovf_url" : [ "ALB_CONTROLLER_OVF" ],
"host_type" : "ESXI"
} ]
} ],
"source_authentication_mode" : "NO_AUTHENTICATION",
"disk_provisioning" : "THIN",
"resource_type" : "FabricModule",
"id" : "######-####-####-####-##########",
"display_name" : "######-####-####-####-##########
",
"_revision" : 1
}'
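A sketch of the PUT call using curl, assuming admin credentials and that the request body above has been saved to a local file named body.json (a hypothetical file name):
curl -k -u admin -X PUT -H "Content-Type: application/json" -d @body.json https://<nsx-manager-ip>/api/v1/fabric/modules/<alb_fabric_id>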
If all options still fail to bring REPO_SYNC to a SUCCESS state, try the workaround below:
1. Reset Upgrade Plan on ALL 3 Manager nodes
2. Check the upgrade status using API
3. Delete the upgrade plan from ALL 3 managers and the VIP IP:
DELETE https://<NSX_MGR1>/api/v1/upgrade-mgmt/plan
DELETE https://<NSX_MGR2>/api/v1/upgrade-mgmt/plan
DELETE https://<NSX_MGR3>/api/v1/upgrade-mgmt/plan
DELETE https://<NSX_MGR(VIP-IP)>/api/v1/upgrade-mgmt/plan
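The same DELETE calls expressed as curl commands, assuming admin credentials (run once per manager and once against the VIP):
curl -k -u admin -X DELETE https://<NSX_MGR1>/api/v1/upgrade-mgmt/plan
curl -k -u admin -X DELETE https://<NSX_MGR2>/api/v1/upgrade-mgmt/plan
curl -k -u admin -X DELETE https://<NSX_MGR3>/api/v1/upgrade-mgmt/plan
curl -k -u admin -X DELETE https://<NSX_MGR(VIP-IP)>/api/v1/upgrade-mgmt/plan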
4. Confirm /repository only displays the Current Version of NSX on each manager node
5. Copy (WinSCP) the target version MUB file to the /image directory on ALL Managers
6. Extract MUB files on ALL Manager nodes
# cd /image
# tar -xf VMware-NSX-upgrade-bundle-<version>.mub
This will create a new file with the same name and .tar.gz extension.
7. Extract tar.gz to /repository
# tar -xzf /image/VMware-NSX-upgrade-bundle-<version>.tar.gz -C /repository
8. Change permissions of extracted bundle in /repository on ALL manager nodes
# chmod -R 777 /repository/4.1.1.0.0.22224312
9. Set proper permissions and ownership of the /repository files by executing the following:
/opt/vmware/proton-tomcat/bin/reposync_helper.sh
10. From the UI, resolve the REPO_SYNC on the orchestrator node: System -> Appliances -> View Details, click Resolve for REPO_SYNC, and wait for this to complete.
11. Once completed, press Resolve for each of the other 2 Managers (an API alternative is sketched below).
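If the UI is unavailable, repo sync can also be requested per node through the API. This is an assumption-based sketch, not part of the original steps; confirm the endpoint is available in your NSX version (<node-id> is the manager node UUID):
curl -k -u admin -X POST "https://<nsx-manager-ip>/api/v1/cluster/<node-id>/repo_sync?action=sync"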
12. Clean up the downloaded MUB file and extracted tar.gz file from /image:
rm -f /image/VMware-NSX-upgrade-bundle-<version>.mub
rm -f /image/VMware-NSX-upgrade-bundle-<version>.tar.gz
rm -f /image/VMware-NSX-upgrade-bundle-<version>.tar.gz.sig
If you are contacting Broadcom support about this issue, please provide the following:
Handling Log Bundles for offline review with Broadcom support