NSX Edge disk usage is high when there is a large number of Load Balancers

Products

VMware NSX

Issue/Introduction

In VMware NSX 4.1.2.3, 4.1.2.5, or 4.2.1.4, when more than 400 native Load Balancers are configured in NSX, the NSX Edge node may report high disk usage.
As more load balancers are created, more disk space is consumed on the NSX Edge node.

When you check the disk usage on the Edge node, you may see output similar to the following:

root@<edge-node-1>:/var/lib/docker/overlay2# df -h | head -10
Filesystem Size Used Avail Use% Mounted on
udev 124G 0 124G 0% /dev
tmpfs 38G 96M 38G 1% /run
/dev/sda2 19G 18G 0 100% / <------------ 100% full root partition
tmpfs 188G 2.8G 186G 2% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 188G 0 188G 0% /sys/fs/cgroup
tmpfs 2.0G 0 2.0G 0% /mnt/ids
/dev/mapper/nsx-config 19G 145M 18G 1% /config
/dev/sda1 943M 7.1M 871M 1% /boot

To see which directory is consuming the space:

root@<edge-node-1>:/# du -xah --time --max-depth=3 /var/lib/docker/ | sort | grep G
14G 2025-06-18 11:41 /var/lib/docker/
14G 2025-06-18 11:41 /var/lib/docker/overlay2 <----------This is the directory causing / to be 100% full
4.0K 2025-05-04 02:54 /var/lib/docker/overlay2/l/<UUID>
4.0K 2025-05-04 02:54 /var/lib/docker/overlay2/l/<UUID>
4.0K 2025-05-04 02:54 /var/lib/docker/overlay2/l/<UUID>
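
The two checks above can be combined into one small helper. The following is a minimal sketch, assuming a root shell on the Edge node with standard `df`, `du`, and `awk` available. The function name `root_usage_pct` and the 90 percent threshold are illustrative choices, not values from this KB.

```shell
#!/usr/bin/env bash
# Sketch: report how full the root partition is.
# root_usage_pct is a hypothetical helper name, and the 90% threshold
# is an illustrative assumption, not a value from this KB.

# Read `df -P` output on stdin and print the usage percentage (digits only)
# of the filesystem mounted on "/".
root_usage_pct() {
    awk '$6 == "/" { sub(/%/, "", $5); print $5 }'
}

threshold=90
pct=$(df -P / | root_usage_pct)

case "$pct" in
    ''|*[!0-9]*)
        echo "could not determine root partition usage"
        ;;
    *)
        if [ "$pct" -ge "$threshold" ]; then
            echo "root partition is ${pct}% full"
            # Show the space consumed by the Docker overlay directory, if present.
            if [ -d /var/lib/docker/overlay2 ]; then
                du -sh /var/lib/docker/overlay2
            fi
        else
            echo "root partition is ${pct}% full (below ${threshold}%)"
        fi
        ;;
esac
```

The parsing is kept in its own function so it can be exercised against saved `df -P` output as well as a live node.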

Environment

VMware NSX 4.1.2.3

VMware NSX 4.1.2.5

VMware NSX 4.2.1.4

Cause

This is caused by an issue in the NSX Load Balancer setup script.

Resolution

For NSX versions other than 4.1.2.3, 4.1.2.5, or 4.2.1.4, open a Broadcom Support Request referencing this KB.

For NSX versions 4.1.2.3, 4.1.2.5, and 4.2.1.4, use the workaround below.
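
The version split above can be expressed as a small check. This is a sketch only: `is_affected_version` is a hypothetical helper name, the example version value is a placeholder, and the function expects the four-part version string exactly as listed in this KB.

```shell
#!/usr/bin/env bash
# Sketch: decide which resolution path applies to a given NSX version string.
# is_affected_version is a hypothetical helper name; the version list is
# taken from this KB. Only exact four-part versions are matched.

is_affected_version() {
    case "$1" in
        4.1.2.3|4.1.2.5|4.2.1.4) return 0 ;;
        *) return 1 ;;
    esac
}

version="4.1.2.5"   # example value; substitute the version of your environment
if is_affected_version "$version"; then
    echo "$version: use the workaround in this KB"
else
    echo "$version: open a Broadcom Support Request referencing this KB"
fi
```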

The workaround below causes downtime, so complete the following steps in a maintenance window.

Prerequisites:

You have the apply_LB_fix.sh script, which is available in this KB's Attachments section.
You have SSH access to both the Active and Standby NSX Edge nodes.
You understand that applying this patch causes a brief interruption to active Load Balancer services during the failover process.

Steps:

Upload the Script:
1. Upload the attached apply_LB_fix.sh script to both the Active and Standby Edge nodes.
2. Do not save the script on the root partition (/), as there may be no free space left.
3. Save the script to /tmp instead, since /tmp uses a different storage mapping.
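
To confirm on a given node that the chosen location is not on the full root partition, a check along these lines may help. It is a minimal sketch, assuming a root shell with standard `df` and `awk`; `mount_point_from_df` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: confirm a directory is not backed by the root partition before
# saving the script there. mount_point_from_df is a hypothetical helper name.

# Read `df -P <path>` output on stdin and print the "Mounted on" column
# of the single data row.
mount_point_from_df() {
    awk 'NR == 2 { print $6 }'
}

target=/tmp
mp=$(df -P "$target" 2>/dev/null | mount_point_from_df)

if [ -z "$mp" ]; then
    echo "could not determine the filesystem for $target"
elif [ "$mp" = "/" ]; then
    echo "$target is on the root partition; choose another location"
else
    echo "$target is mounted at $mp, separate from the root partition"
    df -h "$target"   # show the space available there
fi
```
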
Apply Patch on Standby Edge first:
1. Connect to the Standby Edge node via SSH.
2. Execute the apply_LB_fix.sh script:
  1. You may need to make the file executable first:
    - chmod +x /tmp/apply_LB_fix.sh
  2. Run the following command:
    - bash /tmp/apply_LB_fix.sh
3. This script will:
  1. Build a new, patched version of the LB container image.
  2. Trigger an HA failover (Standby becomes Active).
  3. Stop and delete all running LB service containers using the old image.
Refresh LB Containers on the Edge where you ran the script:
1. In the NSX Manager UI, navigate to the now Active Edge (the same one you just ran the script on) under Fabric > Nodes.
2. Enter and then exit NSX Maintenance Mode on the Edge.
  1. This will ensure all LB service containers are recreated using the patched container image.
Verify LB Service Status:
1. Connect to the Edge node via SSH.
2. Run the command:
  - get load-balancers status
3. Verify that all Load Balancers are in the ready state with the standby HA state. It may take a few minutes for the standby state to be reached. Example output:
```
LB-State        : ready
LR-HA-State     : standby
```
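
With many load balancers, checking each entry by eye is error-prone. The sketch below counts entries that do not match the expected value. It assumes the output of `get load-balancers status` has been saved to a text file and that each load balancer appears with `LB-State : <state>` and `LR-HA-State : <state>` lines in the form shown in the example output. `count_mismatch` and `lb_status.txt` are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: count status lines whose value differs from the expected one.
# Assumes the CLI output was saved to a file and uses "KEY : value" lines
# as in the example output. count_mismatch and lb_status.txt are
# hypothetical names.

# Usage: count_mismatch KEY EXPECTED < status-file
count_mismatch() {
    awk -F':' -v key="$1" -v want="$2" '
        {
            k = $1; v = $2
            gsub(/[ \t\r]/, "", k)   # trim padding around the key
            gsub(/[ \t\r]/, "", v)   # trim padding around the value
            if (k == key && v != want) n++
        }
        END { print n + 0 }
    '
}

status_file=lb_status.txt
if [ -r "$status_file" ]; then
    echo "LB-State not ready:       $(count_mismatch LB-State ready < "$status_file")"
    echo "LR-HA-State not standby:  $(count_mismatch LR-HA-State standby < "$status_file")"
else
    echo "status file not found: $status_file"
fi
```

Both counts should reach 0 before moving on to the next Edge node.
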
Apply Patch on the other Edge nodes in the cluster:
1. Repeat the steps above for every other Edge node in the cluster, one at a time.

Do not apply this script to an Edge node unless its disk usage is high and matching log entries are present.

Attachments

apply_LB_fix.sh