Symptoms that may be seen during virtual appliance operation:
The standard "du" and "df" commands may not directly reveal the issue.
These issues may be seen in a virtual appliance environment where the underlying NFS or VMFS filesystem is healthy and has sufficient free space. Once that has been confirmed, define the following alias in a shell on the appliance:
alias dh="echo 'Size findings'; find -maxdepth 1 ! -name . -exec du -sh '{}' ';' | sed 's/\.\///' | sort -h"
Run the "dh" command in the directory you want to examine. Example output from the root directory:
# dh
Size findings
du: cannot read directory './proc/2221/task/2221/net': Invalid argument
du: cannot read directory './proc/2221/net': Invalid argument
du: cannot access './proc/3685/task/426528/fdinfo/253': No such file or directory
du: cannot access './proc/3685/task/493779/fd/253': No such file or directory
du: cannot read directory './proc/1161754/task/1161754/net': Invalid argument
du: cannot read directory './proc/1161754/net': Invalid argument
du: cannot access './proc/1343421': No such file or directory
du: cannot access './proc/1379772': No such file or directory
du: cannot access './proc/1379812/task/1379812/fd/4': No such file or directory
du: cannot access './proc/1379812/task/1379812/fdinfo/4': No such file or directory
du: cannot access './proc/1379812/fd/3': No such file or directory
du: cannot access './proc/1379812/fdinfo/3': No such file or directory
0 bin
0 lib
0 lib64
0 media
0 proc
0 sbin
0 srv
0 sys
4.0K rpms
12K .cache
12K mnt
16K lost+found
24K vasecurity
168K home
448K tmp
1.8M dev
2.0M tftpboot
9.3M run
30M root
39M boot
172M opt
230M etc
5.6G usr
9.8G var
63G storage
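NOTE: The "cannot access" and "cannot read directory" errors are typical and do not indicate a problem. All of them refer to entries under /proc, which is a virtual filesystem and does not consume disk space.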
Once you've identified a directory to investigate, use the "cd" command to change into that directory, and re-run the "dh" command. Repeat until the source of the excessive space usage is identified. Do not delete files casually: deleting log files that are still in use can destabilize the system or cause unpredictable behavior that requires a reboot to resolve.
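The drill-down described above can be sketched as a small shell function. This is a minimal sketch, not part of the appliance: the name "dh_in" is hypothetical, and it assumes GNU "find", "du" and "sort" (the same tools the "dh" alias relies on). It takes the directory as an argument, so no "cd" is needed, and it hides the "du" error messages.

```shell
# Hypothetical helper (not shipped with the appliance): like "dh", but takes
# the directory to examine as an argument and hides du's error messages.
dh_in() {
    dir="${1:-.}"
    echo "Size findings for $dir"
    # -mindepth 1 skips the directory itself; stderr is discarded so that
    # unreadable or vanished entries do not clutter the listing.
    find "$dir" -mindepth 1 -maxdepth 1 -exec du -sh -- '{}' ';' 2>/dev/null | sort -h
}
```

For example, "dh_in /var" followed by "dh_in /var/log" narrows the search one level at a time. The largest entry is always the last line of the output.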
You can check whether a file is still open for writing by a process with the "lsof" command. Example:
# lsof messages
COMMAND     PID USER   FD   TYPE DEVICE SIZE/OFF   NODE NAME
rsyslogd 297406 root  504w   REG  254,7  1981709 393235 messages
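The output above indicates the "messages" file is still in use: the "rsyslogd" process (PID 297406) has it open, and the "w" in the FD column shows it is open for writing. No output at all indicates the file is not in use.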
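The same check can be scripted. The following is a minimal sketch under two assumptions: that "lsof" exits with a non-zero status when no process has the file open, and that it is run as root so that processes owned by other users are visible. The function name "check_before_touching" is hypothetical.

```shell
# Hypothetical helper: report whether any process has the given file open.
# Returns 0 if in use, 1 if not in use, 2 if the check could not be made.
check_before_touching() {
    f="$1"
    if ! command -v lsof >/dev/null 2>&1; then
        echo "lsof not available; cannot check $f"
        return 2
    fi
    # Assumption: lsof exits non-zero when it finds no process using the file.
    if lsof -- "$f" >/dev/null 2>&1; then
        echo "$f is in use"
        return 0
    fi
    echo "$f is not in use"
    return 1
}
```

Treat a return status of 2 the same as "in use": if the check cannot be made, leave the file alone.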
Archived (.gz or .tgz) log files may be needed for troubleshooting. It is advisable to copy those off instead of directly deleting them, when possible. If you encounter a log file that should be compressed/archived, you can perform this with the "tar" command. Example:
# tar czvf filename.tgz filename.log
In this example, a compressed archive named "filename.tgz" is created in the same directory. The original "filename.log" is left in place, so no space is reclaimed until it is removed separately.
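Because "tar" does not remove the file it archives, reclaiming space takes a second step. The sketch below shows one cautious way to combine the two: it removes the original only after the archive has been created and read back successfully. The function name "archive_log" is hypothetical, and the sketch does not check whether the file is in use, so run the "lsof" check first.

```shell
# Hypothetical helper: archive a log file, verify the archive, and only then
# remove the original. Prints the path of the archive on success.
archive_log() {
    log="$1"
    dir=$(dirname -- "$log")
    name=$(basename -- "$log")
    archive="$dir/${name%.log}.tgz"
    # -C stores the file under its bare name instead of its full path.
    tar -C "$dir" -czf "$archive" -- "$name" || return 1
    # Listing the archive reads it end to end; failure means it is unusable.
    tar -tzf "$archive" >/dev/null || return 1
    rm -- "$log"
    echo "$archive"
}
```

Usage: "archive_log filename.log" leaves "filename.tgz" in the same directory and deletes "filename.log". If either "tar" step fails, the original file is left untouched.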
For investigating space on an ESXi host, please use the following article: Investigating disk space on an ESXi host