Investigating disk space usage on a virtual appliance command line

Article ID: 433653

Products

VMware SDDC Manager / VCF Installer
VMware vCenter Server

Issue/Introduction

Symptoms that may be seen during virtual appliance operation:

  • Virtual machine begins operating very slowly or erratically
  • Services fail or cannot be restarted
  • Errors in the GUI indicate low disk space
  • Unable to generate support bundles
  • Unable to upload files to the appliance
  • Logs may contain "ENOSPC" or "No space left on device"


The "du" and "df" commands, run in the usual way, may not directly reveal the issue.

Environment

These issues may be seen on a virtual appliance even when the underlying NFS or VMFS filesystem is healthy and has sufficient free space.  Once you have confirmed that:

  1. Log in to the command line of the appliance with an account that has root-level permissions (usually root)
  2. Run the following command to establish a simple-to-use alias:
alias dh="echo 'Size findings'; find -maxdepth 1 ! -name . -exec du -sh '{}' ';' | sed 's/\.\///' | sort -h"


Run the "dh" command.  Example output, here from the root directory (/):

# dh
Size findings
du: cannot read directory './proc/2221/task/2221/net': Invalid argument
du: cannot read directory './proc/2221/net': Invalid argument
du: cannot access './proc/3685/task/426528/fdinfo/253': No such file or directory
du: cannot access './proc/3685/task/493779/fd/253': No such file or directory
du: cannot read directory './proc/1161754/task/1161754/net': Invalid argument
du: cannot read directory './proc/1161754/net': Invalid argument
du: cannot access './proc/1343421': No such file or directory
du: cannot access './proc/1379772': No such file or directory
du: cannot access './proc/1379812/task/1379812/fd/4': No such file or directory
du: cannot access './proc/1379812/task/1379812/fdinfo/4': No such file or directory
du: cannot access './proc/1379812/fd/3': No such file or directory
du: cannot access './proc/1379812/fdinfo/3': No such file or directory
0       bin
0       lib
0       lib64
0       media
0       proc
0       sbin
0       srv
0       sys
4.0K    rpms
12K     .cache
12K     mnt
16K     lost+found
24K     vasecurity
168K    home
448K    tmp
1.8M    dev
2.0M    tftpboot
9.3M    run
30M     root
39M     boot
172M    opt
230M    etc
5.6G    usr
9.8G    var
63G     storage

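NOTE: The "cannot access" and "cannot read directory" errors are typical and do not indicate a problem.  Entries under /proc belong to running processes and can disappear while "du" is reading them.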
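
If the /proc messages are distracting, the alias can be rewritten as a shell function that discards them and accepts a directory argument.  The following is a minimal sketch and not part of the original procedure; the function name "dhq" is hypothetical, and it assumes the same GNU find, du, sed and sort used by the "dh" alias.

```shell
# Hypothetical variant of the "dh" alias: dhq [directory]
# Defaults to the current directory when no argument is given.
dhq() (
    # The body runs in a subshell, so this cd does not change
    # the caller's working directory.
    cd "${1:-.}" || exit 1
    echo 'Size findings'
    # 2>/dev/null hides the transient "cannot access" messages from du.
    find . -maxdepth 1 ! -name . -exec du -sh '{}' ';' 2>/dev/null \
        | sed 's|\./||' | sort -h
)
```

For example, "dhq /var" lists the entries directly under /var, smallest first, without changing directory.  Note that discarding errors also hides genuine permission problems, so the original alias remains the safer default.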

Once you have identified a directory to investigate, use the "cd" command to change into it and re-run the "dh" command.  Repeat until the source of the excessive space usage is identified.  Do not delete files casually: deleting log files that are still in use can destabilize the system or cause unpredictable behavior, requiring a reboot to resolve.
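
The repeated "cd" and "dh" steps can be approximated by a loop that always follows the largest entry.  The following is a sketch only; the function name "drill" is hypothetical, it assumes GNU find, du and sort, and it does not handle file names containing newlines.  It only reads sizes and deletes nothing.

```shell
# Hypothetical helper: drill [directory]
# Starting at the given directory, repeatedly print the largest entry
# and descend into it until a regular file or empty directory is reached.
drill() {
    local dir="${1:-.}" next
    while [ -d "$dir" ]; do
        # Size each direct child in kilobytes so sort -n can order them;
        # the last line is the largest entry, and cut keeps only its path.
        next=$(find "$dir" -mindepth 1 -maxdepth 1 -exec du -sk '{}' ';' 2>/dev/null \
               | sort -n | tail -n 1 | cut -f2-)
        [ -n "$next" ] || break
        # Show the human-readable size of the entry being followed.
        du -sh "$next" 2>/dev/null
        dir="$next"
    done
}
```

Running "drill /storage" prints one line per level, ending at the single largest file along that path.  The largest file is not always the cause, so treat the result as a starting point and confirm it with "dh".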

You can check whether a process still has a file open with the "lsof" command.  Example:

# lsof messages
COMMAND     PID USER   FD   TYPE DEVICE SIZE/OFF   NODE NAME
rsyslogd 297406 root  504w   REG  254,7  1981709 393235 messages

The output above indicates the "messages" file is still in use: the "rsyslogd" process (PID 297406) has it open for writing, shown by the "w" in the FD column.  No output at all indicates the file is not in use.
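
The same check can be scripted using the exit status of "lsof", which is zero when the file is listed and non-zero otherwise.  The following is a minimal sketch; the function name "report_open" is hypothetical, and it assumes "lsof" is installed and run with enough privilege to see all processes.

```shell
# Hypothetical helper: report_open <file>
# Prints whether any process currently has the file open.
report_open() {
    if [ ! -e "$1" ]; then
        echo "$1: no such file"
        return 2
    fi
    # lsof exits 0 when it finds the file open, non-zero when it does not
    # (or when lsof itself fails, which this sketch does not distinguish).
    if lsof -- "$1" >/dev/null 2>&1; then
        echo "$1: in use, leave it in place"
    else
        echo "$1: not open by any process"
    fi
}
```

For example, "report_open messages" run in the directory from the example above would report the file as in use while "rsyslogd" holds it open.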

Resolution

Archived (.gz or .tgz) log files may be needed for troubleshooting.  When possible, copy them off the appliance instead of deleting them.  If you encounter a log file that should be compressed and archived, you can do this with the "tar" command.  Example:

# tar czvf filename.tgz filename.log

In this example, "tar" creates a compressed archive named "filename.tgz" in the same directory.  The original "filename.log" is left in place and continues to occupy space until it is removed separately.
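
Before relying on an archive, it is worth confirming that it can be read back.  The following is a sketch under stated assumptions, not part of the original procedure; the function name "archive_log" is hypothetical.  It creates the archive next to the source file, verifies that the archive lists that file, and leaves the original untouched.

```shell
# Hypothetical helper: archive_log <path/to/file.log>
# Creates <path/to/file>.tgz, verifies it, and prints the archive path.
archive_log() {
    local src="$1" dir name out
    dir=$(dirname -- "$src")
    name=$(basename -- "$src")
    out="$dir/${name%.log}.tgz"
    # -C makes tar store the bare file name instead of the full path.
    tar -czf "$out" -C "$dir" "$name" || return 1
    # Listing the archive confirms it is readable and contains the file.
    tar -tzf "$out" | grep -qxF -- "$name" || return 1
    echo "$out"
}
```

For example, "archive_log /var/log/filename.log" would produce "/var/log/filename.tgz".  Remove the original only after the archive has been copied off the appliance and the file is confirmed not to be in use.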

Additional Information

For investigating space on an ESXi host, please use the following article: Investigating disk space on an ESXi host