Log files with a .backup extension are seen in NSX appliances

Products

VMware NSX

Issue/Introduction

On NSX Manager and Edge appliances, multiple files with the extension '.backup' appear under /var/log/:

nsx-audit.log-##########.backup
li-syslog-##########.backup
syslog.1.gz-##########.backup
firewallpkt.log.1.gz-##########.backup

In some cases, logrotate failure messages also appear in /var/log/syslog on the NSX appliance:

nsx-manager-01 systemd 1 - - Starting Rotate log files...
nsx-manager-01 logrotate 1647140 - - error: state file /var/lib/logrotate/status is already locked
nsx-manager-01 logrotate 1647140 - - logrotate does not support parallel execution on the same set of logfiles.
nsx-manager-01 systemd 1 - - logrotate.service: Main process exited, code=exited, status=3/NOTIMPLEMENTED
nsx-manager-01 systemd 1 - - logrotate.service: Failed with result 'exit-code'.
nsx-manager-01 systemd 1 - - Failed to start Rotate log files.
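
To check how often this failure has been logged, the lock-contention messages shown above can be counted in the syslog file. This is a minimal sketch: the function name count_logrotate_lock_errors is hypothetical, and the pattern is derived only from the two logrotate messages quoted in this article.

```shell
# Count logrotate lock-contention messages in a syslog file.
# $1: path to the syslog file (defaults to /var/log/syslog, as named in this article).
# The function name is hypothetical; the pattern matches only the two
# logrotate messages quoted above.
count_logrotate_lock_errors() {
    # grep -c prints the number of matching lines. It exits 1 when the
    # count is 0, so "|| true" keeps the function usable under "set -e".
    grep -c -E 'logrotate.*(state file .* is already locked|does not support parallel execution)' \
        "${1:-/var/log/syslog}" || true
}
```

Running count_logrotate_lock_errors with no argument inspects /var/log/syslog. Each failed run logs both messages, so the count is roughly twice the number of failed runs.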

Note: The files listed above are only examples of files known to have a .backup extension added. It can happen to any log file rotated by the logrotate process (/etc/logrotate.conf).
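
Because any rotated log can be affected, it can help to search for the suffix rather than for specific file names. The following is a minimal sketch; the function name list_backup_logs is hypothetical.

```shell
# List every regular file ending in .backup under a directory, sorted by path.
# $1: directory to search (defaults to /var/log).
# The function name is hypothetical.
list_backup_logs() {
    find "${1:-/var/log}" -type f -name '*.backup' -print | sort
}
```

Calling list_backup_logs with no argument searches /var/log and its subdirectories.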

Cause

A race condition can occur when two separate instances of logrotate run concurrently and create duplicate files. As a result, one of the processes adds the .backup extension, or one of them fails completely. This can occur in NSX because two separate logrotate cronjobs can overlap: one runs every minute, the other runs daily.
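
To see which cron definitions invoke logrotate on a given node, the cron files can be searched for the command name. This is only a sketch. The default search locations (/etc/crontab, /etc/cron.d, /etc/cron.daily) are an assumption based on common Linux layouts; this article confirms only /etc/cron.daily/logrotate. The function name find_logrotate_cron_entries is hypothetical.

```shell
# Print every cron line that mentions logrotate, as file:line:content.
# Arguments: files or directories to search. With no arguments, the usual
# cron locations are searched (an assumption, not confirmed for NSX).
# The function name is hypothetical.
find_logrotate_cron_entries() {
    [ "$#" -gt 0 ] || set -- /etc/crontab /etc/cron.d /etc/cron.daily
    # -r recurses into directories, -n adds line numbers, -H always shows the file name.
    # Missing paths are ignored; "|| true" covers the no-match exit status.
    grep -rnH 'logrotate' "$@" 2>/dev/null || true
}
```

If the output shows more than one active entry, those jobs can overlap as described above.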

Resolution

This issue is resolved in VMware NSX 4.2.2, available at Broadcom downloads.
If you are having difficulty finding and downloading software, please review the Download Broadcom products and software KB.

Workaround:

The .backup files can be removed. If they are left in place, disk usage may increase over time, as there is no process to rotate or archive these .backup files.

If you start encountering space issues, remove the .backup files. If you wish to preserve these files before removal, export a support bundle; these log files will be included in the support bundle.
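
To judge whether the .backup files account for a meaningful share of the used space, their sizes can be totalled. This is a minimal sketch with a hypothetical function name, backup_logs_total_bytes. It sums file sizes in bytes, which can differ slightly from the number of disk blocks actually allocated.

```shell
# Print the combined size, in bytes, of all .backup files under a directory.
# $1: directory to search (defaults to /var/log).
# The function name is hypothetical.
backup_logs_total_bytes() {
    # wc -c prints "<bytes> <path>" per file, plus a "total" line when it is
    # given several files; awk skips those total lines and sums the rest.
    find "${1:-/var/log}" -type f -name '*.backup' -exec wc -c {} + |
        awk '$2 != "total" { sum += $1 } END { print sum + 0 }'
}
```

The function prints 0 when no .backup files are present.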

To remove the files, log in to the appliance as the root user and, for each file in /var/log/ that you wish to remove, run:

rm -f <filename>.backup
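
When many files are affected, removing them one at a time is tedious. The sketch below lists the matching files by default and deletes them only when asked to. The function name remove_backup_logs and its "delete" argument are hypothetical conveniences and are not part of NSX.

```shell
# List, or remove, all .backup files under a directory.
# $1: directory to search (defaults to /var/log).
# $2: pass the word "delete" to remove the files; anything else only lists them.
# The function name and the "delete" keyword are hypothetical.
remove_backup_logs() {
    if [ "${2:-}" = "delete" ]; then
        find "${1:-/var/log}" -type f -name '*.backup' -exec rm -f {} +
    else
        find "${1:-/var/log}" -type f -name '*.backup' -print
    fi
}
```

Run remove_backup_logs /var/log first to review the list, then remove_backup_logs /var/log delete once the list looks right.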

If an upgrade is not feasible, follow the steps below to prevent this issue from occurring:

Comment out the logrotate call in the daily cronjob. Log in to each manager/edge node as the root user and run the following commands:

# Back up the file before editing
cp /etc/cron.daily/logrotate /etc/cron.daily/logrotate.bak

# Comment out the logrotate call in the cronjob
sed -i -e 's:^/usr/sbin/logrotate:#/usr/sbin/logrotate:' /etc/cron.daily/logrotate

Before the edit, the file looks like:

...

# this cronjob persists removals (but not purges)
if [ ! -x /usr/sbin/logrotate ]; then
exit 0
fi

/usr/sbin/logrotate /etc/logrotate.conf
EXITVALUE=$?
if [ $EXITVALUE != 0 ]; then
/usr/bin/logger -t logrotate "ALERT exited abnormally with [$EXITVALUE]"

...

After the edit, the file looks like:

...

# this cronjob persists removals (but not purges)
if [ ! -x /usr/sbin/logrotate ]; then
exit 0
fi

#/usr/sbin/logrotate /etc/logrotate.conf    <-- note the hash added to comment out the logrotate call
EXITVALUE=$?
if [ $EXITVALUE != 0 ]; then
/usr/bin/logger -t logrotate "ALERT exited abnormally with [$EXITVALUE]"

...
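
Before changing the file in place, the effect of the substitution can be previewed by running the same sed expression without -i and comparing the result with the original. This is a minimal sketch; the function name preview_logrotate_cron_edit is hypothetical.

```shell
# Show what the workaround's sed expression would change, without editing the file.
# $1: cron script to inspect (defaults to /etc/cron.daily/logrotate).
# The function name is hypothetical.
preview_logrotate_cron_edit() {
    cron_file="${1:-/etc/cron.daily/logrotate}"
    # sed writes the edited text to stdout; diff compares it with the original.
    # diff exits 1 when the inputs differ, so "|| true" keeps the status clean.
    sed -e 's:^/usr/sbin/logrotate:#/usr/sbin/logrotate:' "$cron_file" |
        diff "$cron_file" - || true
}
```

The diff should report a single changed line, the /usr/sbin/logrotate call. No output means the line is already commented out or was not found at the start of a line.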

Note: If the edge or manager is replaced, the default daily cronjob will be restored and the above workaround will need to be applied again.
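
Since a replaced node reverts to the default cronjob, a quick check can confirm whether the workaround is still in place. The sketch below uses a hypothetical function name, logrotate_cron_disabled, and tests only for the uncommented line that the sed command above targets.

```shell
# Exit 0 if the daily cron script has no uncommented logrotate call at the
# start of a line, 1 if such a call is still active, 2 if the file is missing.
# $1: cron script to check (defaults to /etc/cron.daily/logrotate).
# The function name is hypothetical.
logrotate_cron_disabled() {
    cron_file="${1:-/etc/cron.daily/logrotate}"
    [ -f "$cron_file" ] || return 2
    ! grep -q '^/usr/sbin/logrotate' "$cron_file"
}
```

For example, logrotate_cron_disabled || echo "workaround not applied" prints the message on a node where the daily logrotate call is active or the cron file is missing.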

Additional Information

If this KB did not help resolve your issue, you can review the following KB for further troubleshooting steps:

Manager disk usage high alarm

If you are contacting Broadcom support about this issue, please provide the following:

NSX Manager or Edge support bundles, whichever appliance has the space issue.
Text of any error messages seen in the NSX GUI or on the command line that are relevant to the investigation.

Handling Log Bundles for offline review with Broadcom support