/var/log/proton/nsxapi.log
<timestamp> INFO RepoSyncThread-1707748646882 RepoSyncServiceImpl 4841 SYSTEM [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Starting Repo sync thread RepoSyncThread-12345678964321
<timestamp> INFO RepoSyncThread-1707748646882 RepoSyncFileHelper 4841 SYSTEM [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Command to get server info for https://#.#.#.#:443/repository/4.1.1.0.0.22224312/HostComponents/rhel77_x86_64_baremetal_server/upgrade.sh returned result CommandResultImpl [commandName=null, pid=2227086, status=SUCCESS, errorCode=0, errorMessage=null, commandOutput=HTTP/1.1 404 Not Found
<timestamp> INFO RepoSyncThread-1707748646882 RepoSyncFileHelper 4841 SYSTEM [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Command to check if remote file exists for https://#.#.#.#:443/repository/4.1.1.0.0.22224312/Manager/vmware-mount/libvixMntapi.so.1 returned result CommandResultImpl [commandName=null, pid=2228965, status=SUCCESS, errorCode=0, errorMessage=null, commandOutput=HTTP/1.1 404 Not Found
<timestamp> ERROR RepoSyncThread-1707748646882 RepoSyncServiceImpl 4841 SYSTEM [nsx@6876 comp="nsx-manager" errorCode="MP21057" level="ERROR" subcomp="manager"] Unable to start repository sync operation. See logs for more details.
"Upgrade-coordinator upgrade failed. Error - Repository Sync status is not success on node <node IP>."<timestamp> NSX_Manager NSX 98866 SYSTEM [nsx@6876 comp="nsx-manager" errorCode="MP30487" level="ERROR" subcomp="upgrade-coordinator"] Repository sync is not successful on <Managers IPs>. Please ensure Repository Sync Status is successful on all MP cluster nodes.2024-02-24T12:00:52.800Z NSX_Manager NSX 98866 SYSTEM [nsx@6876 comp="nsx-manager" errorCode="MP30040" level="ERROR" subcomp="upgrade-coordinator"] Error while updating upgrade-coordinator due to error Repository Sync status is not success on node <Managers IPs>. Please ensure Repository Sync status is success on all MP nodes before proceeding..
<timestamp> INFO RepoSyncResultTsdbListener-2-1 RepoSyncResultTsdbListener 5032 SYSTEM [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] perform FullSync in RepoSyncResultTsdbListener, repoSyncResultMsg managed_resource {
}
status: REPO_SYNC_STATUS_FAILED
status_message {
}
failure_message {
value: "Unable to connect to File /repository/4.2.1.0.0.24304122/Manager/dry-run/dry_run.py on source <Manager IP>. Please verify that file exists on source and install-upgrade service is up."
}
error_code: 21057
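These entries can be located on an affected manager by searching the proton log for the error code shown above, for example:
# grep -E "errorCode=21057|error_code: 21057|MP21057" /var/log/proton/nsxapi.log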
VMware NSX-T Data Center
VMware NSX
This is a known issue impacting VMware NSX. It is caused by missing files in the /repository directory on the NSX Manager appliances.
This issue is resolved for upgrades from VMware NSX 4.2.0 to higher versions.
Workaround:
Warning: this procedure involves the use of the "rm" command, which irreversibly removes files from the system.
Ensure backups are taken and the restore passphrase is known before proceeding.
Identifying the issue:
On each VMware NSX Manager Appliance, check which directories are present in the /repository directory:
As the root user, run: ls -l /repository
You may see any of the three results below:
drwxrwx--- 7 uuc grepodir 4096 <date> 4.1.0.0.0.21332672
drwxrwx--- 7 uuc grepodir 4096 <date> 4.1.1.0.0.22224312
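To compare all three managers at once, a loop such as the following can be used (replace the placeholder IPs; this assumes root SSH between the managers is permitted):
# for mgr in <manager-1-ip> <manager-2-ip> <manager-3-ip>; do echo "== ${mgr} =="; ssh root@${mgr} 'ls -l /repository'; done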
Based on the above results, complete one or more of the options below:
Option: Correcting user and group permissions recursively for the /repository directory after copying (scp) it from a known good source manager.
The owner of the entire /repository directory, including all subdirectories and files, should be user uuc and group grepodir.
The permissions should be rwxrwx--- (770).
This was not the case when the directory was copied with scp to the newly replaced manager(s).
To set the correct user, group, and permissions, run the following commands at the CLI of each replacement manager.
Copy the /repository directory to the new manager.
Open an SSH session to the known good host.
#scp -r /repository <remote User>@<IP of Remote Server>:/
Example command:
#scp -r /repository [email protected]:/
This command copies the /repository directory recursively to the root directory (/) of host A.B.C.D.
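As a quick sanity check that the copy completed, compare the overall size of /repository on the source and destination managers; the totals should be close to identical:
# du -sh /repository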
Now the user, group, and permissions will need to be checked and corrected.
This will recursively set the user and group:
#chown -R uuc:grepodir /repository
This will recursively set the required permissions:
#chmod -R 770 /repository
For example, the dry_run.py "Unable to connect to File" error shown in the symptoms above was corrected by setting these attributes.
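To confirm the ownership and permissions were applied everywhere, a check such as the following can be run on each manager; an empty result means every file and directory already has the expected owner, group, and mode:
# find /repository \( ! -user uuc -o ! -group grepodir -o ! -perm 770 \) -ls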
Check that the REPO_SYNC FAIL state has been cleared.
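One way to confirm this from the shell is to verify on each manager that the file named in the failure message is now present, for example (adjust the path to the file reported in your environment):
# ls -l /repository/4.2.1.0.0.24304122/Manager/dry-run/dry_run.py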
Option: Deploy MUB file in /repository:
Identify the orchestrator node by running the following command from the NSX Manager CLI:
nsx-mngr> get service install-upgrade
Service name: install-upgrade
Service state: stopped
Enabled on: #.#.#.# <<< orchestrator node
Download the VMware-NSX-upgrade-bundle-<version>.mub file matching the version missing in the repository, copy it to the /image directory of the orchestrator node, then extract it:
# cd /image
# tar -xf VMware-NSX-upgrade-bundle-<version>.mub
Remove the incomplete version directory from /repository:
# rm -rf /repository/4.1.1.0.0.22224312
Extract the tar.gz file into /repository:
# tar -xzf /image/VMware-NSX-upgrade-bundle-<version>.tar.gz -C /repository
Run the repository sync helper script:
# /opt/vmware/proton-tomcat/bin/reposync_helper.sh
Remove the extracted bundle files from /image:
rm -f /image/VMware-NSX-upgrade-bundle-<version>.mub
rm -f /image/VMware-NSX-upgrade-bundle-<version>.tar.gz
rm -f /image/VMware-NSX-upgrade-bundle-<version>.tar.gz.sig
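After the extraction, confirm that the files previously reported as 404 Not Found in nsxapi.log are now present, for example (adjust the path to the files named in your log):
# ls -l /repository/4.1.1.0.0.22224312/HostComponents/rhel77_x86_64_baremetal_server/upgrade.sh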
Option: Deploy OVA file in /repository:
Download the nsx-unified-appliance-<version>.ova file following these instructions: Download Broadcom products and software. The downloaded version should match the version missing in the repository, as identified in the 'Identifying the issue' section above.
Deploy the OVA as a new appliance, then copy its /repository/<version> directory to all 3 existing managers missing the directory (an example scp command follows these steps).
Run /opt/vmware/proton-tomcat/bin/reposync_helper.sh on all 3 existing managers, not the newly deployed one.
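An example of the copy, run from the newly deployed appliance once for each existing manager (the user and IP are placeholders); afterwards apply the same chown and chmod commands from the permissions option above:
# scp -r /repository/<version> <remote User>@<existing manager IP>:/repository/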
Option: Advanced LB (AVI):
2024-03-19T09:41:34.557Z INFO RepoSyncThread-1710841232019 RepoSyncFileHelper 85527 SYSTEM [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Command to get server info for https://#.#.#.#:443/repository/21.1.2-9124/Alb_controller/ovf/controller.cert returned result CommandResultImpl [commandName=null, pid=1677285, status=SUCCESS, errorCode=0, errorMessage=null, commandOutput=HTTP/1.1 404 Not Found
2024-03-19T09:42:08.746Z INFO RepoSyncThread-1710841232019 RepoSyncFileHelper 85527 SYSTEM [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Command to get server info for https://#.#.#.#:443/repository/22.1.6-9191/Alb_controller/ovf/controller-disk1.vmdk returned result CommandResultImpl [commandName=null, pid=1677876, status=SUCCESS, errorCode=0, errorMessage=null, commandOutput=HTTP/1.1 404 Not Found
/var/log/proton/nsxapi.log
2024-05-29T14:32:15.898Z INFO http-nio-127.0.0.1-7440-exec-23 RepoSyncServiceImpl 117206 SYSTEM [nsx@6876 comp="nsx-manager" level="INFO" reqId="<UUID>" subcomp="manager" username="uproton"] Starting Repository sync process, current result is RepoSyncResult [nodeId=<NODE UUID>, status=FAILED, statusMessage=, failureMessage=Unable to connect to File /repository/21.1.2-9124/Alb_controller/ovf/controller.ovf on source #.#.#.#. Please verify that file exists on source and install-upgrade service is up., errorCode=21057, percentage=0.0]
Copy the ALB controller OVA (Controller.ova) to /image on the affected manager, then recreate the missing directory and extract the required files:
# mkdir -p /repository/21.1.2-9124/Alb_controller/ovf
# tar -xvf /image/Controller.ova -C /repository/21.1.2-9124/Alb_controller/ovf controller.ovf controller.mf controller.cert controller-disk1.vmdk
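Confirm the required ALB files are now in the directory the repository sync checks (the path matches the 404 entries above):
# ls -l /repository/21.1.2-9124/Alb_controller/ovf/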
Run the repository sync helper script:
# /opt/vmware/proton-tomcat/bin/reposync_helper.sh
Alternate option for ALB controller ova file if the customer does not intend to use ALB:
The ALB controller file check can be bypassed during Repo sync by resetting the AlbControllerVmFabricModule values to default following the below steps:
# rm -rf /repository/21.1.2-9124
GET https://<nsx-manager-ip>/api/v1/fabric/modules >>>> note the ID present in the section with "fabric_module_name" : "AlbControllerVmFabricModule"
GET https://<nsx-manager-ip>/api/v1/fabric/modules/<alb_fabric_id>
PUT https://<nsx-manager-ip>/api/v1/fabric/modules/<alb_fabric_id> with the header "Content-Type: application/json" and the following body:
{
  "fabric_module_name" : "AlbControllerVmFabricModule",
  "current_version" : "1.0",
  "deployment_specs" : [ {
    "fabric_module_version" : "1.0",
    "versioned_deployment_specs" : [ {
      "host_version" : "",
      "service_vm_ovf_url" : [ "ALB_CONTROLLER_OVF" ],
      "host_type" : "ESXI"
    } ]
  } ],
  "source_authentication_mode" : "NO_AUTHENTICATION",
  "disk_provisioning" : "THIN",
  "resource_type" : "FabricModule",
  "id" : "######-####-####-####-##########",
  "display_name" : "######-####-####-####-##########",
  "_revision" : 1
}
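These API calls can also be made from the command line; the following is a sketch using curl with basic authentication (the admin account, manager IP, fabric module ID, and the alb_fabric_module.json file holding the body above are placeholders; the _revision value in the body must match the value returned by the GET):
# curl -k -u admin https://<nsx-manager-ip>/api/v1/fabric/modules
# curl -k -u admin https://<nsx-manager-ip>/api/v1/fabric/modules/<alb_fabric_id>
# curl -k -u admin -X PUT -H "Content-Type: application/json" -d @alb_fabric_module.json https://<nsx-manager-ip>/api/v1/fabric/modules/<alb_fabric_id>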
If you are contacting Broadcom support about this issue, please provide the following:
Handling Log Bundles for offline review with Broadcom support