NSX Upgrade Bundle failing for NSX Upgrade due to full Root partition and REPO_SYNC Failure


Article ID: 418288


Products

VMware NSX

Issue/Introduction

During an NSX upgrade, the upgrade process may fail due to a combination of root partition exhaustion on NSX Manager nodes and REPO_SYNC failures between Managers.
Symptoms may include:

  • Upgrade failing to progress past Repository Sync

  • / partition on one or more Managers at 100% utilization

  • Manager services failing to start after reboot

  • Upgrade failing with REPO SYNC errors, such as:

     
    Unable to connect to File/repository/<version>/Manager/vmware-mount/libgobject-2.0.so.0 on source manager <node>. Please verify file exists on the source and install-upgrade service is up.
  • Unable to resolve REPO_SYNC from the NSX UI

  • Large or duplicated files found in / directory

  • Alarms on one or more NSX Managers report 100% disk usage for the / partition

  • Running df -h as root on the Manager node confirms 100% usage of the / partition

These conditions prevent the NSX upgrade coordinator from validating and distributing the upgrade bundle to all Manager nodes.
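
The disk-usage symptom can also be checked in a script. The following is a minimal sketch, assuming a POSIX-style df is available from the root shell of the Manager; the function names usage_pct and is_full are hypothetical helpers, not NSX commands.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; not part of NSX.

# Print the usage percentage (integer, no % sign) of the filesystem
# that contains the given path. Defaults to the root partition.
usage_pct() {
    local path="${1:-/}"
    # -P forces POSIX output: one line per filesystem, capacity in column 5
    df -P "$path" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Return success (0) when usage is at or above the threshold (default 100).
is_full() {
    local path="${1:-/}" threshold="${2:-100}"
    [ "$(usage_pct "$path")" -ge "$threshold" ]
}
```

For example, is_full / && echo "root partition is full" prints the message only when / is at 100%.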

Environment

VMware NSX 4.1.x

Cause

The / partition on one or more Manager nodes is at 100% usage because large, unnecessary files are stored in incorrect locations, including:

  • Duplicated HostComponents subdirectories

  • Misplaced upgrade bundle content under / directory

  • Large .vmdk files copied accidentally into / or non-repository paths

A full / partition prevents NSX Manager services, including install-upgrade, from operating normally.
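
The file types listed above can be searched for by name. This is a rough sketch, assuming GNU find on the Manager; find_dirs_named and find_vmdk_outside are hypothetical helper names, and treating /repository as the legitimate repository location is an assumption based on the path in the error message above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; not part of NSX.

# List every directory with the given name under a starting point,
# staying on one filesystem (-xdev).
find_dirs_named() {
    local start="$1" name="$2"
    find "$start" -xdev -type d -name "$name" 2>/dev/null
}

# List .vmdk files under a starting point, skipping an excluded subtree.
find_vmdk_outside() {
    local start="$1" exclude="$2"
    find "$start" -xdev -path "$exclude" -prune -o \
        -type f -name '*.vmdk' -print 2>/dev/null
}
```

On a Manager this might be run as find_dirs_named / HostComponents and find_vmdk_outside / /repository. More than one HostComponents result, or any .vmdk result, is a candidate for review, not an automatic candidate for deletion.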

Resolution

Workaround:

Step 1:

1. Run the following commands as root on the affected Managers to list files larger than 100 MB and 1000 MB on the / partition:

 
find / -xdev -type f -size +100M
find / -xdev -type f -size +1000M
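
The find commands above print paths only. A variant that also prints sizes and sorts the largest files first can make the cleanup targets easier to spot. This is a sketch that assumes GNU find (for -printf) and sort; list_large_files is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper for illustration; not part of NSX.

# List regular files larger than min_size under start, largest first.
# Output format: size in bytes, a tab, then the path.
# min_size uses find's -size syntax without the leading "+", e.g. 100M.
list_large_files() {
    local start="${1:-/}" min_size="${2:-100M}"
    find "$start" -xdev -type f -size "+${min_size}" -printf '%s\t%p\n' 2>/dev/null |
        sort -rn
}
```

Calling list_large_files / 100M covers the same files as the first find command above.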

2. Delete unneeded large files that are not part of the active NSX installation (e.g., incorrectly placed .vmdk files or copied HostComponents directories).

rm -rf <filename_or_directory>
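
Because rm -rf is irreversible, it may help to wrap it in a few sanity checks. The following is a minimal sketch of such a wrapper; guarded_remove is a hypothetical helper, and it does not decide whether a file belongs to the active NSX installation, which still has to be confirmed manually.

```shell
#!/usr/bin/env bash
# Hypothetical helper for illustration; not part of NSX.

# Remove a path only after basic sanity checks, printing its size first.
# Refuses an empty argument, "/" itself, and paths that do not exist.
guarded_remove() {
    local target="$1"
    if [ -z "$target" ] || [ "$target" = "/" ]; then
        echo "refusing to remove '$target'" >&2
        return 1
    fi
    if [ ! -e "$target" ] && [ ! -L "$target" ]; then
        echo "no such file or directory: $target" >&2
        return 1
    fi
    du -sh -- "$target"   # show how much space the path occupies
    rm -rf -- "$target"
}
```

The guard against an empty argument matters most in scripts, where an unset variable could otherwise turn the command into rm -rf with no intended target.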

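3. Verify that / partition usage has decreased and that the Manager services have recovered. Run df -h from the root shell (#) and get cluster status from the NSX CLI (>).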

#  df -h

>  get cluster status

Step 2 (only if REPO_SYNC is in a FAILED state on any node):

Follow the KB article below on all Manager nodes:

After replacing Managers or while running Upgrade prechecks, Repo_Sync is Failed – Workaround 1

Step 3:

Retry the upgrade. It should now proceed to the pre-check stage.