After replacing Managers or while running Upgrade prechecks, Repo_Sync is Failed

Article ID: 322436

Products

VMware NSX, VMware Avi Load Balancer

Issue/Introduction

    • After 1 or more NSX Managers are deployed/redeployed, REPO_SYNC is in Failed state
    • NSX Manager log /var/log/proton/nsxapi.log

      2024-02-24T12:00:26.882Z  INFO RepoSyncThread-1707748646882 RepoSyncServiceImpl 4841 SYSTEM [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Starting Repo sync thread RepoSyncThread-12345678964321
      2024-02-24T12:00:32.208Z  INFO RepoSyncThread-1707748646882 RepoSyncFileHelper 4841 SYSTEM [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Command to get server info for https://#.#.#.#:443/repository/4.1.1.0.0.22224312/HostComponents/rhel77_x86_64_baremetal_server/upgrade.sh returned result CommandResultImpl [commandName=null, pid=2227086, status=SUCCESS, errorCode=0, errorMessage=null, commandOutput=HTTP/1.1 404 Not Found
      2024-02-24T12:00:11.583Z  INFO RepoSyncThread-1707748646882 RepoSyncFileHelper 4841 SYSTEM [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Command to check if remote file exists for https://#.#.#.#:443/repository/4.1.1.0.0.22224312/Manager/vmware-mount/libvixMntapi.so.1 returned result CommandResultImpl [commandName=null, pid=2228965, status=SUCCESS, errorCode=0, errorMessage=null, commandOutput=HTTP/1.1 404 Not Found
      2024-02-24T12:00:11.583Z ERROR RepoSyncThread-1707748646882 RepoSyncServiceImpl 4841 SYSTEM [nsx@6876 comp="nsx-manager" errorCode="MP21057" level="ERROR" subcomp="manager"] Unable to start repository sync operation.See logs for more details.
    • While preparing for an upgrade, the Check Upgrade Readiness UI shows an error:
      "Upgrade-coordinator upgrade failed. Error - Repository Sync status is not success on node <node IP>."
      "Repository sync is not complete"
    • NSX Manager log /var/log/syslog

      2024-02-24T12:00:52.800Z NSX_Manager NSX 98866 SYSTEM [nsx@6876 comp="nsx-manager" errorCode="MP30487" level="ERROR" subcomp="upgrade-coordinator"] Repository sync is not successful on <Managers IPs>. Please ensure Repository Sync Status is successful on all MP cluster nodes.

      2024-02-24T12:00:52.800Z NSX_Manager NSX 98866 SYSTEM [nsx@6876 comp="nsx-manager" errorCode="MP30040" level="ERROR" subcomp="upgrade-coordinator"] Error while updating upgrade-coordinator due to error Repository Sync status is not success on node <Managers IPs>. Please ensure Repository Sync status is success on all MP nodes before proceeding..
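
To see at a glance which files the repo sync could not find, the 404 entries shown in the nsxapi.log excerpt above can be filtered out of the log. The following is a minimal sketch, not part of the documented procedure: the function names list_missing_repo_files and list_missing_repo_versions are hypothetical, and the sketch assumes the log line format shown in the excerpt.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not shipped with NSX); they assume the log format shown above.

# Print each /repository/... path that a RepoSyncFileHelper entry reported as "404 Not Found".
list_missing_repo_files() {
    local log_file="${1:-/var/log/proton/nsxapi.log}"
    grep 'RepoSyncFileHelper' "$log_file" \
        | grep '404 Not Found' \
        | grep -o '/repository/[^ ]*' \
        | sort -u
}

# Reduce those paths to the distinct version directories they belong to.
list_missing_repo_versions() {
    list_missing_repo_files "$1" | cut -d/ -f3 | sort -u
}
```

Run as root on a Manager, list_missing_repo_versions with no argument reads the default log path and prints one version directory name per line, which is the version to look for in /repository.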

Environment

  • VMware NSX 4.1.0
  • VMware NSX-T 3.2.x

Cause

  • This is a known issue affecting VMware NSX. It is caused by files missing from the /repository directory on the NSX Managers.

Resolution

This issue is resolved in VMware NSX 4.2.0

Workaround:

Warning: This procedure involves the use of the "rm" command, which irreversibly removes files from the system.
Ensure backups are taken and the restore passphrase is known before proceeding.


Identifying the issue:

On each VMware NSX Manager appliance, check which directories are present in the /repository directory.
As the root user, run: ls -l /repository
You may see one of the following three results:

  • If the environment has been upgraded, expect to see both a "from" and a "to" version directory: one directory named after the previous VMware NSX version and one named after the current VMware NSX version, for example:
    • drwxrwx--- 7 uuc grepodir 4096 <date> 4.1.0.0.0.21332672
    • drwxrwx--- 7 uuc grepodir 4096 <date> 4.1.1.0.0.22224312
       
  • If the environment has not been upgraded, expect to see only a "from" version directory, that is, a directory named after the current VMware NSX version, for example:
    • drwxrwx--- 7 uuc grepodir 4096 <date> 4.1.0.0.0.21332672
  • In some instances, there may be no VMware NSX version directory in the repository.
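
The directory check above can be scripted when several Managers need to be compared. This is a minimal sketch, assuming a Bash shell as root; list_repo_versions and has_repo_version are hypothetical helper names, and the version strings are the examples used in this article.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for inspecting the repository; not part of NSX.

# Print the name of every directory directly under the repository (default: /repository).
list_repo_versions() {
    local repo_dir="${1:-/repository}"
    local entry
    for entry in "$repo_dir"/*/; do
        [ -d "$entry" ] || continue   # skip the unexpanded glob when nothing matches
        basename "$entry"
    done
}

# Succeed only if the given version directory exists in the repository.
has_repo_version() {
    local version="$1"
    local repo_dir="${2:-/repository}"
    [ -d "$repo_dir/$version" ]
}
```

For example, has_repo_version 4.1.1.0.0.22224312 || echo "to-version directory missing" reports a missing directory without listing the whole repository.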


Based on the results above, complete one or more of the options below:

  1. If the environment was freshly deployed and not upgraded, and the "from" VMware NSX directory is missing, complete the steps in 'Option: Deploy OVA file in /repository' below.
  2. If the environment was upgraded and the "from" version directory is missing, complete the steps in 'Option: Deploy MUB file in /repository' below.
  3. If the environment was upgraded and the "to" VMware NSX directory is missing, complete the steps in 'Option: Deploy MUB file in /repository' below.
  4. If the environment was upgraded and both the "to" and "from" VMware NSX directories are missing, complete the steps in both 'Option: Deploy MUB file in /repository' and 'Option: Deploy OVA file in /repository' below.
  5. If the required files are present for both the "to" and "from" versions but were replaced incorrectly, only the permissions may be missing. In this case, follow 'Option: Deploy MUB file in /repository' below from step 8 onwards.

Option: Deploy MUB file in /repository:

  1. Download the VMware-NSX-upgrade-bundle-<version>.mub MUB file by following these instructions: Download Broadcom products and software
       The downloaded version should match the version reported as not found in the logs; in this example, 4.1.1.0.0.22224312.
  2. To identify the Orchestrator node, log into any Manager as admin and run: 

    nsx-mngr> get service install-upgrade
    Service name:      install-upgrade
    Service state:     stopped
    Enabled on:        #.#.#.#   <<< orchestrator node
  3. Copy the downloaded MUB file to the /image directory of the orchestrator node.
  4. As the root user, extract the MUB file on the orchestrator node:

    # cd /image
    # tar -xf VMware-NSX-upgrade-bundle-<version>.mub
  5. This will create a new file with the same name and .tar.gz extension.
  6. Delete the directory for your current version under /repository.
    In this example, the system runs 4.1.1:

    # rm -rf /repository/4.1.1.0.0.22224312

  7. Extract the tar.gz file to /repository:

    # tar -xzf /image/VMware-NSX-upgrade-bundle-<version>.tar.gz -C /repository

  8. Set proper permissions and ownership of the /repository files by executing the following:

    /opt/vmware/proton-tomcat/bin/reposync_helper.sh

  9. In the UI, resolve REPO_SYNC on the orchestrator node: go to System -> Appliances -> View Details, click Resolve for REPO_SYNC, and wait for the operation to complete.
  10. Once completed, click Resolve for each of the other 2 Managers.
  11. Clean up the downloaded MUB file and the extracted tar.gz file from /image:

    rm -f /image/VMware-NSX-upgrade-bundle-<version>.mub
    rm -f /image/VMware-NSX-upgrade-bundle-<version>.tar.gz
    rm -f /image/VMware-NSX-upgrade-bundle-<version>.tar.gz.sig
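
Because step 6 deletes the existing version directory irreversibly, it may be worth confirming that the archive produced in steps 4 and 5 exists and is readable before running rm -rf. The following is a minimal sketch of such a pre-check, assuming Bash and the standard tar utility; verify_bundle_archive is a hypothetical helper, not an NSX tool.

```shell
#!/usr/bin/env bash
# Hypothetical pre-check; run it before deleting anything under /repository.

# Succeed only if the archive exists and tar can list its contents as a gzip archive.
verify_bundle_archive() {
    local archive="$1"
    if [ ! -f "$archive" ]; then
        echo "missing: $archive" >&2
        return 1
    fi
    if ! tar -tzf "$archive" >/dev/null 2>&1; then
        echo "unreadable: $archive" >&2
        return 1
    fi
    echo "ok: $archive"
}
```

Chaining the check in front of the delete with && means the rm -rf in step 6 only runs when the extracted tar.gz file passed the check.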


Option: Deploy OVA file in /repository:

  1. Download the nsx-unified-appliance-<version>.ova OVA file by following these instructions: Download Broadcom products and software. The downloaded version should match the version missing from the repository, as identified in the 'Identifying the issue' section above.
  2. Deploy this Manager as a separate appliance in vCenter and do not connect it to the cluster.
  3. From this newly deployed Manager, copy the /repository/<version> directory to each of the 3 existing Managers that is missing the directory.
  4. As the root user, run the command /opt/vmware/proton-tomcat/bin/reposync_helper.sh on all 3 existing Managers, not on the newly deployed one.
  5. In the UI, resolve REPO_SYNC on the orchestrator node: go to System -> Appliances -> View Details, click Resolve for REPO_SYNC, and wait for the operation to complete.
  6. Resolve the REPO_SYNC failure on the other 2 nodes from the System -> Appliances page and wait for the operation to complete.
  7. Once REPO_SYNC is working, the newly deployed Manager can be powered off and deleted.
     

Option: Advanced LB (AVI):

  • It is possible that this same issue can be caused if NSX ALB files are missing from the repository.
    This typically occurs if NSX ALB was deployed at one time but later removed. If a user manually deletes the ALB files from the repository, for example to free disk space, the sync can fail. The logs will explicitly refer to ALB files, for example:

2024-03-19T09:41:34.557Z  INFO RepoSyncThread-1710841232019 RepoSyncFileHelper 85527 SYSTEM [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Command to get server info for https://#.#.#.#:443/repository/21.1.2-9124/Alb_controller/ovf/controller.cert returned result CommandResultImpl [commandName=null, pid=1677285, status=SUCCESS, errorCode=0, errorMessage=null, commandOutput=HTTP/1.1 404 Not Found
2024-03-19T09:42:08.746Z  INFO RepoSyncThread-1710841232019 RepoSyncFileHelper 85527 SYSTEM [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Command to get server info for https://#.#.#.#:443/repository/22.1.6-9191/Alb_controller/ovf/controller-disk1.vmdk returned result CommandResultImpl [commandName=null, pid=1677876, status=SUCCESS, errorCode=0, errorMessage=null, commandOutput=HTTP/1.1 404 Not Found

/var/log/proton/nsxapi.log

2024-05-29T14:32:15.898Z INFO http-nio-127.0.0.1-7440-exec-23 RepoSyncServiceImpl 117206 SYSTEM [nsx@6876 comp="nsx-manager" level="INFO" reqId="<UUID>" subcomp="manager" username="uproton"] Starting Repository sync process, current result is RepoSyncResult [nodeId=<NODE UUID>, status=FAILED, statusMessage=, failureMessage=Unable to connect to File /repository/21.1.2-9124/Alb_controller/ovf/controller.ovf on source #.#.#.#. Please verify that file exists on source and install-upgrade service is up., errorCode=21057, percentage=0.0]

  1. Identify the NSX ALB version; in the example above, it is 21.1.2.
  2. Download the NSX ALB Controller OVA from the VMware Customer Connect portal and copy it to the orchestrator node.
  3. Create the directory if it does not exist:

    # mkdir -p /repository/21.1.2-9124/Alb_controller/ovf

  4. Extract the OVA files into the directory that the log entries above refer to:

    # tar -xvf /image/Controller.ova -C /repository/21.1.2-9124/Alb_controller/ovf

  5. Ensure the following 4 files are present:

     controller.ovf
     controller.mf
     controller.cert
     controller-disk1.vmdk

  6. Set proper permissions and ownership of the /repository files by executing the following:

    /opt/vmware/proton-tomcat/bin/reposync_helper.sh

  7. In the UI, resolve REPO_SYNC on the orchestrator node: go to System -> Appliances -> View Details and click Resolve for REPO_SYNC.
  8. Once completed, repeat for each of the other 2 Managers.
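
The file check in step 5 can be scripted so that nothing is overlooked before reposync_helper.sh is run. The following is a minimal sketch, assuming Bash; check_alb_ovf_files is a hypothetical helper, and the four file names are the ones listed in step 5.

```shell
#!/usr/bin/env bash
# Hypothetical check for step 5; the four file names are the ones listed in this article.

# Print any of the four expected controller files missing from the given directory.
# Returns 0 when all are present, 1 otherwise.
check_alb_ovf_files() {
    local ovf_dir="$1"
    local name
    local missing=0
    for name in controller.ovf controller.mf controller.cert controller-disk1.vmdk; do
        if [ ! -f "$ovf_dir/$name" ]; then
            echo "missing: $ovf_dir/$name"
            missing=1
        fi
    done
    return "$missing"
}
```

With the example version from this article, the call would be check_alb_ovf_files /repository/21.1.2-9124/Alb_controller/ovf, which matches the path that the log entries report as not found.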

Alternate option for ALB controller ova file if the customer does not intend to use ALB:

The ALB controller file check can be bypassed during repo sync by resetting the AlbControllerVmFabricModule values to their defaults, using the steps below:

  1. Remove the ALB directory from /repository using:

        # rm -rf /repository/21.1.2-9124

  2. Get the ALB fabric module ID with the API call below:
    • GET https://<nsx-manager-ip>/api/v1/fabric/modules
      Note the ID in the entry containing "fabric_module_name" : "AlbControllerVmFabricModule".
  3. Get the ALB details with the below API call:
     
    • GET https://<nsx-manager-ip>/api/v1/fabric/modules/<alb_fabric_id>
  4. Reset the values of 'AlbControllerVmFabricModule' using the below PUT API call:

    • PUT https://<nsx-manager-ip>/api/v1/fabric/modules/<alb_fabric_id> with the header "Content-Type: application/json" and the following body:
      {
          "fabric_module_name" : "AlbControllerVmFabricModule",
          "current_version" : "1.0",
          "deployment_specs" : [ {
            "fabric_module_version" : "1.0",
            "versioned_deployment_specs" : [ {
              "host_version" : "",
              "service_vm_ovf_url" : [ "ALB_CONTROLLER_OVF" ],
              "host_type" : "ESXI"
            } ]
          } ],
          "source_authentication_mode" : "NO_AUTHENTICATION",
          "disk_provisioning" : "THIN",
          "resource_type" : "FabricModule",
          "id" : "######-####-####-####-##########",
          "display_name" : "######-####-####-####-##########",
          "_revision" : 1
      }
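
The API calls above can be issued with any REST client. As one possible illustration, the sketch below wraps them in curl. It rests on several assumptions that are not part of this article: it is run from a workstation that has curl, it uses basic authentication as the admin user (curl prompts for the password), and it passes -k to accept a self-signed Manager certificate. The function names are hypothetical. The body file should contain the JSON shown above, with the id, display_name and _revision values taken from the GET response of step 3, since a PUT with a stale _revision is likely to be rejected.

```shell
#!/usr/bin/env bash
# Hypothetical wrappers around the API calls above; not an official NSX tool.
# Assumptions: basic authentication as admin and -k for a self-signed certificate.

# Build the fabric modules URL; with a module ID, build the URL of that single module.
fabric_modules_url() {
    local manager="$1"
    local module_id="${2:-}"
    if [ -n "$module_id" ]; then
        echo "https://$manager/api/v1/fabric/modules/$module_id"
    else
        echo "https://$manager/api/v1/fabric/modules"
    fi
}

# GET the module list (one argument) or a single module (two arguments).
get_fabric_modules() {
    curl -k -u admin -X GET "$(fabric_modules_url "$@")"
}

# PUT the JSON body stored in a file to the given module.
put_fabric_module() {
    local manager="$1" module_id="$2" body_file="$3"
    curl -k -u admin -X PUT \
        -H "Content-Type: application/json" \
        -d @"$body_file" \
        "$(fabric_modules_url "$manager" "$module_id")"
}
```

Keeping the body in a file rather than on the command line avoids shell quoting problems with the nested JSON.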

 

Additional Information

If you are contacting Broadcom support about this issue, please provide the following:

 

  • The current version of NSX.
  • The version being upgraded to.
  • The state of REPO_SYNC on all three Managers.

Handling Log Bundles for offline review with Broadcom support