High disk usage for /var/log on NSX Edge node

Products

VMware NSX

Issue/Introduction

Disk usage on partition /var/log is high on NSX Edge node.
Inspecting disk usage for this partition from the root CLI shows that the lb subfolder uses the most space: du -hd1 /var/log | sort -hr
In turn, a large number of tarball files are present under it: ls /var/log/lb/lbconf-repo/*/lbconf-repo-*.tar.gz
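
The two checks above can be combined into a per-subfolder summary. The following is a minimal sketch, assuming the directory layout shown in this article (one subfolder per instance under /var/log/lb/lbconf-repo, each holding lbconf-repo-*.tar.gz bundles). The function name summarize_lbconf_repo is hypothetical and is not part of NSX.

```shell
# Hypothetical helper (not shipped with NSX): print, for each subfolder of the
# repository, the number of bundles it holds and the size of the subfolder.
# Assumes the layout <repo>/<subfolder>/lbconf-repo-*.tar.gz shown above.
summarize_lbconf_repo() {
    repo="${1:-/var/log/lb/lbconf-repo}"
    for dir in "$repo"/*/; do
        [ -d "$dir" ] || continue
        count=0
        for f in "$dir"lbconf-repo-*.tar.gz; do
            # An unmatched glob stays literal, so check that the file exists.
            if [ -e "$f" ]; then
                count=$((count + 1))
            fi
        done
        # du reports the whole subfolder, not only the bundles.
        size=$(du -sh "$dir" | cut -f1)
        printf '%s\t%s bundles\t%s\n' "${dir%/}" "$count" "$size"
    done
}
```

Calling summarize_lbconf_repo with no argument inspects the default path; a subfolder with a very high bundle count is the one to clean up first.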

There are several subfolders under /var/log/lb, one for each Load Balancer instance. Each has its own log file, lbconf_gen.log.
Log lines similar to the following appear in /var/log/lb/*/lbconf_gen.log:

WARNING lbconf-repo commit failed: Command 'ls -ltr /var/log/lb/lbconf-repo/lbconf-repo-*.tar.gz' returned non-zero exit status 2.
ERROR Traceback (most recent call last):
  File "/opt/vmware/nsx-edge/bin/lbconf_gen.py", line 3307, in commit_lbconf
    self.do_rotate()
  File "/opt/vmware/nsx-edge/bin/lbconf_gen.py", line 3246, in do_rotate
    output = subprocess.check_output('ls -ltr {}/lbconf-repo-*.tar.gz'.format(self.repo_root_path), shell=True)
  File "/usr/lib/python3.10/subprocess.py", line 421, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/usr/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'ls -ltr /var/log/lb/lbconf-repo/lbconf-repo-*.tar.gz' returned non-zero exit status 2.

Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.

Cause

In NSX 4.1.0, a debugging feature was introduced to help diagnose Load Balancer issues. This feature generates debugging data whenever a Load Balancer configuration is updated and stores it under the /var/log/lb/lbconf-repo/ directory.
The script is designed to rotate the debugging data based on a defined retention policy. A defect in the rotation step prevents the script from deleting older bundles, so disk usage keeps growing.
If the Load Balancer configuration is frequently updated, the disk usage grows faster.
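
The traceback in the Issue section shows the rotation step failing on an ls call whose glob is evaluated at the repository root, while the listing command in this article finds the bundles one directory level deeper. The sketch below reproduces only that shell behavior in a scratch directory: an unmatched glob is passed to ls literally, ls exits non-zero, and a caller such as subprocess.check_output would raise CalledProcessError. This is an inference from the paths shown in this article, not a confirmed description of the rotation script; the directory and file names are made up.

```shell
# Scratch directory that imitates the layout: bundles live one level
# below the repository root (names here are illustrative only).
demo=$(mktemp -d)
mkdir -p "$demo/instance-a"
touch "$demo/instance-a/lbconf-repo-0001.tar.gz"

# Glob at the repository root: nothing matches, the pattern is passed to ls
# as a literal file name, and ls fails.
root_status=0
sh -c "ls -ltr $demo/lbconf-repo-*.tar.gz" >/dev/null 2>&1 || root_status=$?

# Glob one level down: the bundle is matched and ls succeeds.
nested_status=0
sh -c "ls -ltr $demo/*/lbconf-repo-*.tar.gz" >/dev/null 2>&1 || nested_status=$?

echo "root glob exit status:   $root_status"
echo "nested glob exit status: $nested_status"
rm -rf "$demo"
```

The first status is non-zero (the log excerpt in this article reports 2) and the second is 0.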

Resolution

This is a known issue impacting VMware NSX.

Workaround: because this feature is used only for debugging, it can be disabled so that it stops creating additional debugging data:

From the root CLI on the NSX Edge, edit the following configuration file: vi /var/log/lb/lbconf-repo/lbconf-config
Change the first column from 1 to 0:
Before:
```
1 102400 1024000 7
```
After:
```
0 102400 1024000 7
```
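
For several Edge nodes, the same change can be made without an interactive editor. The following is a minimal sketch, assuming the file consists of a single line in exactly the format shown above; the function name disable_lbconf_repo and the .bak backup suffix are hypothetical. Only the first column is touched, and the remaining columns are left as they are.

```shell
# Hypothetical helper (not shipped with NSX): set the first column of the
# config line from 1 to 0, keeping a backup copy next to the file.
# Assumes the single-line format shown above.
disable_lbconf_repo() {
    cfg="${1:-/var/log/lb/lbconf-repo/lbconf-config}"
    # Keep the original so the change can be reverted.
    cp -p "$cfg" "$cfg.bak" || return 1
    # Rewrite from the backup; only a leading "1 " on line 1 is replaced.
    sed '1s/^1 /0 /' "$cfg.bak" > "$cfg"
}
```

After running disable_lbconf_repo, verify the result with cat /var/log/lb/lbconf-repo/lbconf-config. If the line already starts with 0, the file content is left unchanged.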
To reclaim disk space, delete some of the oldest bundles.
To list the bundles by modification time (most recent first): ls -lt /var/log/lb/lbconf-repo/*/lbconf-repo-*.tar.gz
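
Deleting the oldest bundles by hand can be tedious when there are many of them. The following is a minimal sketch of one way to keep only the newest bundles in each subfolder, assuming the layout shown in this article and bundle names without newline characters. The function name prune_lbconf_bundles, its arguments, and the dry-run default are hypothetical choices, not NSX features. It prints what it would remove unless the third argument is delete.

```shell
# Hypothetical helper (not shipped with NSX): in every subfolder of the
# repository keep the newest <keep> bundles and remove the rest.
# Usage: prune_lbconf_bundles <repo> <keep> [delete]
prune_lbconf_bundles() {
    repo="$1"
    keep="$2"
    mode="${3:-dry-run}"
    for dir in "$repo"/*/; do
        [ -d "$dir" ] || continue
        # Expand the glob once; skip subfolders that contain no bundles.
        set -- "$dir"lbconf-repo-*.tar.gz
        [ -e "$1" ] || continue
        # ls -t lists newest first; tail skips the first <keep> entries.
        ls -t "$@" | tail -n "+$((keep + 1))" |
        while IFS= read -r bundle; do
            if [ "$mode" = "delete" ]; then
                rm -f -- "$bundle"
            else
                echo "would delete: $bundle"
            fi
        done
    done
}
```

For example, prune_lbconf_bundles /var/log/lb/lbconf-repo 20 lists the bundles beyond the 20 newest in each subfolder, and adding delete as a third argument removes them. The value 20 is only an illustration; choose a number that suits the environment, and review the dry-run output before deleting.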

Additional Information

If this article did not help resolve your issue, you can review the following article for further reference: Troubleshooting disk space related issues on NSX Nodes