The disk on one or all of the controller nodes in an NSX cluster fills up, and the affected node enters a disconnected state.
This issue is resolved in VMware NSX Data Center for vSphere 6.4.2 and 6.3.7.
Workaround:
1. Edit the /etc/logrotate.d/rsyslog file and replace reload with restart in both postrotate blocks, as marked below:
/var/log/syslog
{
rotate 5
size 100M
missingok
notifempty
compress
postrotate
reload rsyslog >/dev/null 2>&1 || true <--- REPLACE RELOAD WITH RESTART
endscript
}
/var/log/mail.info
/var/log/mail.warn
/var/log/mail.err
/var/log/mail.log
/var/log/daemon.log
/var/log/kern.log
/var/log/auth.log
/var/log/user.log
/var/log/lpr.log
/var/log/cron.log
/var/log/debug
/var/log/messages
{
rotate 4
weekly
missingok
notifempty
compress
sharedscripts
postrotate
reload rsyslog >/dev/null 2>&1 || true <--- REPLACE RELOAD WITH RESTART
endscript
}
-------
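The edit in step 1 can also be scripted. The following is a minimal sketch, assuming the file still contains the reload rsyslog lines shown above; patch_rsyslog_postrotate and the backup path in the usage line are hypothetical names chosen for illustration, not part of NSX. The backup is written outside /etc/logrotate.d so that logrotate does not pick it up as a second configuration file.

```shell
# Sketch: switch the postrotate action from "reload" to "restart".
# patch_rsyslog_postrotate is a hypothetical helper, not an NSX tool.
patch_rsyslog_postrotate() {
    file="$1"      # logrotate configuration to patch
    backup="$2"    # where to keep an untouched copy

    # Keep the original so the change can be reverted.
    cp -p "$file" "$backup" || return 1

    # Rewrite only lines that begin with "reload rsyslog",
    # preserving any leading indentation.
    sed 's/^\([[:space:]]*\)reload rsyslog/\1restart rsyslog/' "$backup" > "$file"
}

# Usage (as root on the controller node; the backup path is an assumption):
# patch_rsyslog_postrotate /etc/logrotate.d/rsyslog /root/rsyslog.logrotate.orig
```

Review the patched file before the next rotation runs, and restore the backup copy if the result does not match the listing in step 1.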
2. Run this command to identify the biggest file under /var/log:
$ find /var/log -type f -exec ls -l {} \;|sort -k 5n|awk '{size=$5;var[1024**3]="Gb"; var[1024**2]="Mb";var[1024]="Kb"; for (x=1024**3; x>=1024; x/=1024) {if (size >=x){printf "%6.2f %s\t%s\n", size/x,var[x],$9;break}}}'
3. Then delete that file:
$ rm /var/log/filename
4. If df -h still shows /var/log at 100% usage, compare that figure with the output of du -h /var/log. If the two differ, the rsyslog process is still holding the deleted file open, so its disk space has not been released.
5. Get the PID of the rsyslog process and the command line that was used to start it:
$ ps -aux | grep rsyslog
6. List the files opened by the rsyslog process to verify that it still holds the deleted file:
$ ls -l /proc/<PID>/fd | grep deleted
or
$ /usr/sbin/lsof | grep deleted
7. Kill the process:
$ kill -9 <PID>
8. Start the process again by running the command line from the output of step 5:
$ /usr/sbin/rsyslogd -n
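The check in steps 4 to 6 can be sketched as a small shell helper. This is a minimal sketch, assuming a Linux /proc filesystem and the readlink utility; deleted_open_files is a hypothetical function name, not part of NSX or rsyslog. It prints the link target of every file descriptor that the given process still holds on a deleted file.

```shell
# Sketch: list deleted files that a process still holds open.
# deleted_open_files is a hypothetical helper, not an NSX tool.
deleted_open_files() {
    pid="$1"
    for fd in /proc/"$pid"/fd/*; do
        # Each entry is a symlink to the opened file; skip unreadable ones.
        target=$(readlink "$fd" 2>/dev/null) || continue
        case "$target" in
            # The kernel appends " (deleted)" to unlinked files.
            *" (deleted)") printf '%s\n' "$target" ;;
        esac
    done
}

# Usage (replace <PID> with the rsyslog PID from step 5):
# deleted_open_files <PID>
```

If the helper prints nothing for the rsyslog PID, the process is not holding a deleted file, and the gap between df and du has another cause.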