VMware Cloud Foundation SDDC-Manager nfs-mount 100% Full

Article ID: 318268

Products

VMware Cloud Foundation
VMware NSX

Issue/Introduction

This article explains how to resolve an SDDC Manager nfs-mount that is 100% full.

  • The NFS repository on SDDC Manager is full or nearly full. Prechecks report that the free space is below the required amount, (OR)
  • An alert is raised for LCM bundle repository datastore usage, for example: only 25 GB free out of 502 GB. This may be reported as a request to expand the NFS share on SDDC Manager.
  • The NFS repository can also fill up because of previously scheduled backups.
  • To verify, run du -shc * on the NFS mount to see which directories use the most space, then clear out the old backups.
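The `du -shc *` check above can also be sketched in Python. This is a minimal illustration, not part of the script attached to this KB. The mount path is taken from the paths shown later in this article, the helper names `dir_size_bytes` and `report_usage` are hypothetical, and the sizes are apparent file sizes, so they only approximate what `du` reports.

```python
import os
import shutil


def dir_size_bytes(path):
    """Sum the sizes of all regular files under path (roughly what `du -s` shows)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            # Skip symlinks so linked files are not counted twice.
            if os.path.islink(full):
                continue
            try:
                total += os.path.getsize(full)
            except OSError:
                # The file vanished or is unreadable; ignore it.
                pass
    return total


def report_usage(mount):
    """Return (percent_used, [(entry_name, size_in_bytes), ...]), largest first."""
    usage = shutil.disk_usage(mount)
    percent_used = 100.0 * usage.used / usage.total
    sizes = []
    for entry in os.scandir(mount):
        if entry.is_dir(follow_symlinks=False):
            size = dir_size_bytes(entry.path)
        else:
            size = entry.stat(follow_symlinks=False).st_size
        sizes.append((entry.name, size))
    sizes.sort(key=lambda item: item[1], reverse=True)
    return percent_used, sizes


if __name__ == "__main__":
    # Assumed mount point, taken from the paths used in this article.
    mount = "/nfs/vmware/vcf/nfs-mount"
    if os.path.isdir(mount):
        percent, sizes = report_usage(mount)
        print(f"{percent:.1f}% used")
        for name, size in sizes:
            print(f"{size / 1024 ** 3:8.2f} GiB  {name}")
```

If the backup directory is at the top of the list, the old backups are the likely cause of the full mount.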

Environment

VMware Cloud Foundation 3.9.x
SDDC Manager 5.x
VMware NSX-T

Cause

  • The existing backup cleanup script does not work with NSX-T. This should be resolved in a future release.
  • According to the retention policy properties, the cron job should delete NSX backup files older than 2 weeks, and no more than 5 LCM backup files should be kept.
  • The retention settings can be checked in the following file:
    • root@sddcmgr-1 [ /nfs/vmware/vcf/nfs-mount/backup/scripts ]# cat nsx_backup_config.properties
      # All properties are mandatory.
      # Property BACKUP.RETENTION.HOURS is used for retention policy. All backup files past
      # 'N' hours will be retained and rest will be deleted in current day.
      # Constraints : Min 2
      BACKUP.RETENTION.HOURS=6
      
      # Property BACKUP.RETENTION.DAYS is used for retention policy. A single latest backup file
      # for a day for 'N' number of days will be retained and rest will be deleted in that day.
      # Example, if there are hourly backups configured and say there are backups for each hour,
      # files will be evo-nsx-******-23_0-****.backupproperties,
      # evo-nsx-******-22_0-****.backupproperties, evo-nsx-******-21_0-****.backupproperties,....
      # In above file list latest would be at 23:00, so file evo-nsx-******-23_0-****
      # will be retained.
      # Constraints : Min 1
      BACKUP.RETENTION.DAYS=7
      
      # Property BACKUP.RETENTION.WEEKS is used for retention policy. A single latest backup
      # file for a week for 'N' number of weeks will be retained and rest will be deleted in
      # that week.
      # Example, if there are hourly backups configured and say there are backups for each hour,
      # files on the last day of the week ie on Sunday will be evo-nsx-******-23_0-Sun***.backupproperties,
      # evo-nsx-******-22_0-Sun***.backupproperties, evo-nsx-******-21_0-Sun***.backupproperties,....
      # In above file list latest would be at 23:00, so file evo-nsx-******-23_0-Sun***
      # If there is no file available on Sunday then previous day will be checked until
      # start day of the week.
      BACKUP.RETENTION.WEEKS=2
      
      # Enable log LEVEL
      # Allowed levels INFO and DEBUG
      BACKUP.LOGGER.LEVEL=INF
  • In this case the backup is taking most of the space in the directory.

As a result, the cron job that cleans up the NSX backup files fails because of an issue in the script nsx_backup_cleaner.py. The backup files accumulate until they fill the NFS share, and they then need to be cleaned up.
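The retention tiers described in the properties file can be sketched as follows. This is one possible reading of the comments in nsx_backup_config.properties, not the logic of the actual cleanup script, which may differ. The function names `parse_properties` and `select_retained` are hypothetical, and the sketch works on timestamps only rather than on real backup file names. Weeks are assumed to run Monday to Sunday, matching the comment that Sunday is the last day of the week.

```python
from datetime import datetime, timedelta


def parse_properties(text):
    """Parse KEY=VALUE lines, ignoring blank lines and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def select_retained(timestamps, now, hours, days, weeks):
    """Return the set of backup timestamps to keep (assumed policy reading)."""
    keep = set()
    latest_per_day = {}
    latest_per_week = {}
    for ts in timestamps:
        age = now - ts
        if age < timedelta(0):
            continue  # ignore timestamps that lie in the future
        # Hourly tier: keep every backup from the last `hours` hours.
        if age < timedelta(hours=hours):
            keep.add(ts)
        # Daily tier: track the latest backup of each calendar day.
        if age < timedelta(days=days):
            day = ts.date()
            latest_per_day[day] = max(ts, latest_per_day.get(day, ts))
        # Weekly tier: track the latest backup of each ISO week (Mon to Sun).
        if age < timedelta(weeks=weeks):
            week = tuple(ts.isocalendar())[:2]
            latest_per_week[week] = max(ts, latest_per_week.get(week, ts))
    keep.update(latest_per_day.values())
    keep.update(latest_per_week.values())
    return keep


if __name__ == "__main__":
    props = parse_properties(
        "BACKUP.RETENTION.HOURS=6\n"
        "BACKUP.RETENTION.DAYS=7\n"
        "BACKUP.RETENTION.WEEKS=2\n"
    )
    now = datetime(2024, 1, 14, 23, 0)
    # Thirty days of hourly backups, newest first.
    hourly = [now - timedelta(hours=h) for h in range(24 * 30)]
    kept = select_retained(
        hourly,
        now,
        int(props["BACKUP.RETENTION.HOURS"]),
        int(props["BACKUP.RETENTION.DAYS"]),
        int(props["BACKUP.RETENTION.WEEKS"]),
    )
    print(len(kept), "of", len(hourly), "backups would be retained")
```

Under this reading, a working cleanup keeps only a small fraction of hourly backups, which is why a failing cleanup job fills the share quickly.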

Resolution

This is a condition that may occur in a VMware NSX environment.


Workaround:
  • The script attached to this KB works around the issue.
  • Manually run the script with the below command to clear out the existing old backups.

    python nsx_backup_cleaner.py -d /nfs/vmware/vcf/nfs-mount/backup/ -k 5 -l 5

  • To create a cronjob for the task do the following.
    1. Run crontab -e.
    2. Press ‘i’ to enter insert mode.
    3. Add the following line. The * * * * * schedule runs the script every minute.
       * * * * * /usr/bin/python3 /nfs/vmware/vcf/nfs-mount/backup/scripts/nsx_backup_cleaner.py -d /nfs/vmware/vcf/nfs-mount/backup/ -k 5 -l 5
    4. Press Esc, then save and exit with ‘:wq’.
  • Both the script and the cronjob should persist after patches and upgrades.
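After an upgrade, it can be useful to confirm that the cron entry is still in place. The following is a minimal sketch under stated assumptions: the helper names `active_cleaner_jobs` and `read_crontab` are hypothetical, and the check simply looks for the script name from this article in the output of `crontab -l`.

```python
import subprocess

# Script name as used in this article.
CLEANER = "nsx_backup_cleaner.py"


def active_cleaner_jobs(crontab_text):
    """Return crontab lines that invoke the cleaner and are not commented out."""
    jobs = []
    for line in crontab_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        if CLEANER in stripped:
            jobs.append(stripped)
    return jobs


def read_crontab():
    """Return the current user's crontab as text, or '' if none is installed."""
    result = subprocess.run(
        ["crontab", "-l"], capture_output=True, text=True, check=False
    )
    return result.stdout if result.returncode == 0 else ""
```

Calling `active_cleaner_jobs(read_crontab())` as root returns an empty list when the job is missing or commented out.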

 

Note: Once the old backups have been cleared out, best practice is to configure backups to use external storage (e.g., FTP).

Additional Information

Impact/Risks:

  • Before making any change, take a full backup of the system in case of failure.
  • While the NFS mount is full, pre-checks may fail.
  • SDDC Manager may not display data or may stop responding.

 

To disable/remove cronjob do the following.

  • As root, open the crontab with 'crontab -e', then enter insert mode with 'i'.

  • Locate and comment out or delete the line for the job that is no longer needed.
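The comment-out step can be illustrated on the crontab text itself. This sketch only transforms a string and does not modify the installed crontab; the function name `disable_jobs` is hypothetical, and the default marker is the script name used in this article.

```python
def disable_jobs(crontab_text, marker="nsx_backup_cleaner.py"):
    """Return crontab_text with every active line containing marker commented out."""
    out = []
    for line in crontab_text.splitlines(keepends=True):
        already_commented = line.lstrip().startswith("#")
        if marker in line and not already_commented:
            # Prefix with '# ' so cron ignores the entry but it stays visible.
            out.append("# " + line)
        else:
            out.append(line)
    return "".join(out)
```

Commenting the line out instead of deleting it keeps the original entry available if the job needs to be re-enabled later.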

Attachments

nsx_backup_cleaner.py