This KB article addresses the issue of disk usage on the NFS Server VMs filling up to near or at maximum capacity. For example, in the image below, we notice that the disk usage in the /var/vcap/store directory on the NFS Server VM is at maximum capacity (100%).
A symptom of this issue is that running the df -h command on the nfs_server VM shows disk usage in the /var/vcap/store directory anywhere from 80% to 100%. Stale droplets and stale application packages can contribute to this build-up of disk usage on the NFS Server VM.
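The manual df -h check above can be automated. The helper below is a minimal sketch (not part of the KB): it reads POSIX "df -P" style output on stdin and prints any filesystem at or above a usage threshold, defaulting to the 80% symptom level mentioned above. The function name and threshold default are assumptions for illustration.

```shell
# Hypothetical helper: flag filesystems at or above a usage threshold.
# Reads "df -P" style output from stdin so it can be tested offline.
check_disk_usage() {
  threshold="${1:-80}"
  awk -v t="$threshold" 'NR > 1 {
    use = $5
    sub(/%/, "", use)                      # "95%" -> "95"
    if (use + 0 >= t) printf "%s %s%%\n", $6, use
  }'
}
# usage on the nfs_server VM: df -P | check_disk_usage 80
```

Piping df -P (rather than df -h) keeps the columns machine-parsable, since -h sizes vary in unit suffixes while the Capacity column stays a plain percentage.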
We can also SSH into the nfs_server VM and run the du -h -d 1 /var/vcap/store/shared | sort -rh command to find out which subdirectory in /var/vcap/store is taking up the most space:
1. SSH into nfs_server VM:
bosh -d $(bosh ds --column=name | grep ^cf-) ssh nfs_server
2. Become the root user:
sudo su -
3. Run the du command:
du -h -d 1 /var/vcap/store/shared | sort -rh
If the output of the above du command shows that most of the space is being taken up by /var/vcap/store/shared/cc-droplets and /var/vcap/store/shared/cc-packages, as in the image below, the issue might be attributed to the presence of stale droplets and application packages, which need to be cleared out manually.
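To quantify how much of the shared store those two directories account for, a small helper like the one below can be used. It is a sketch (not from the KB) that assumes "du -k -d 1" output, i.e. size-in-KB followed by the path, read from stdin:

```shell
# Hypothetical helper: total just the cc-droplets and cc-packages
# entries from "du -k -d 1" style output on stdin.
blob_share() {
  awk '/cc-droplets$|cc-packages$/ { kb += $1 }
       END { printf "%d KB in droplets+packages\n", kb }'
}
# usage: du -k -d 1 /var/vcap/store/shared | blob_share
```

Using -k instead of -h keeps all sizes in the same unit so they can be summed directly.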
We have two separate workarounds that we can choose from to resolve this issue. One of them is to run a clean-up Ruby script (expire.rb) on a Cloud Controller VM. See the steps below on how to use the expire.rb script to clean up these stale artifacts:
1. Per the documentation (https://docs.vmware.com/en/VMware-Tanzu-Application-Service/4.0/tas-for-vms/configure-pas.html#configure-file-storage-16), set the maximum droplets and packages per application both to 1, as seen in the image below. We can find this setting by clicking the TAS tile > File Storage; change Maximum valid packages per app and Maximum staged droplets per app to 1, and click the Save button.
NOTE: After clicking the Save button, we need to run an Apply Changes scoped to the TAS tile only.
2. After the Apply Changes has successfully completed, we can SSH into any Cloud Controller VM:
bosh -d $(bosh ds --column=name | grep ^cf-) ssh cloud_controller/0
3. Become the root user:
sudo su -
4. Change into the /tmp directory:
cd /tmp
5. Create an empty expire.rb script file, which will contain the contents of our Ruby script:
touch /tmp/expire.rb
6. Open the /tmp/expire.rb file with vim, copy the contents below into the vim editor, and save the file:
vim /tmp/expire.rb
puts "starting expiring droplets/packages script...."

# Helper to print the current number of staged droplets and ready packages
def self.output_bits_info
  current_droplets = DropletModel.where(state: DropletModel::STAGED_STATE).count
  current_packages = PackageModel.where(state: PackageModel::READY_STATE).count
  puts "Number of droplets: #{current_droplets}"
  puts "Number of packages: #{current_packages}"
end

puts "State before"
output_bits_info

# Expire stale droplets and packages for every app, honoring the per-app
# limits configured in the TAS tile (set to 1 in step 1 above)
AppModel.all.each do |a|
  expirer = BitsExpiration.new
  expirer.expire_droplets!(a)
  expirer.expire_packages!(a)
end

puts "State after"
output_bits_info
7. Run the expire.rb script by piping it into the Cloud Controller console:
cat expire.rb | /var/vcap/jobs/cloud_controller_ng/bin/console
8. When running the script, we may get an output that looks like the image below. Notice the total number of droplets and packages printed out. In this case, we have totals of 8,312 droplets and 11,792 packages. These totals are BEFORE the clean-up is actually run. If the issue is truly due to stale droplets and packages taking up disk space, the number of droplets and packages should decrease after the script finishes running.
9. After the clean-up portion of the script runs and finishes, we may get output similar to what's in the image below. If you get a similar output, press the 'q' key to exit out of the script output:
10. After pressing the 'q' key, we get the output seen in the image below. We notice that the total number of droplets and app packages has decreased to 7,074 and 7,904, respectively. From step 7, we recall that the totals were 8,312 and 11,792. Because the number of droplets and packages has decreased, it is likely that we had a fair number of stale droplets and app packages.
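As a quick sanity check on the example numbers above, the reduction can be computed with plain shell arithmetic (the counts are taken from this example run):

```shell
# Compare the before/after totals printed by the expire.rb run above.
before_droplets=8312; after_droplets=7074
before_packages=11792; after_packages=7904
echo "droplets removed: $((before_droplets - after_droplets))"
echo "packages removed: $((before_packages - after_packages))"
```

This run removed 1,238 droplets and 3,888 packages; your counts will differ depending on how many stale artifacts have accumulated.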
11. We can now check whether the NFS Server VM disk usage has decreased as well. To do this, we SSH into the NFS Server VM and run df -h to check disk space usage:
1. SSH into the nfs_server VM:
bosh -d $(bosh ds --column=name | grep ^cf-) ssh nfs_server
2. Become the root user:
sudo su -
3. Run the df -h command:
df -h
Checking the disk usage, we see that it has decreased significantly:
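If you want to compare the before and after usage numerically rather than by eye, a helper like this sketch can extract just the usage percentage for the store mount from "df -P" style output (the function name and mount point match are assumptions for illustration):

```shell
# Hypothetical helper: print the usage percentage of /var/vcap/store
# from "df -P" style output on stdin.
store_usage() {
  awk '$6 == "/var/vcap/store" { sub(/%/, "", $5); print $5 }'
}
# usage: df -P | store_usage
```

Running this before and after the clean-up gives two plain numbers that are easy to record in a ticket or compare in a script.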