NSX-v Controller syslog file rotation process fails resulting in partition filling up

Article ID: 339174

Products

VMware NSX

Issue/Introduction

Symptoms:
  • NSX Controllers show high CPU utilization.
  • CPU utilization graphs in vCenter for the Controller appliance show a linear increase up to the current value.
  • vRealize Network Insight (vRNI) might be in use in the environment.
  • The Controller CLI command ‘show status’ shows /var/log at 100%.
  • The issue is seen in NSX for vSphere 6.3.6 and 6.4.1.
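
The /var/log symptom can also be checked from a root shell on the controller (root access is described under Workaround). The following is a minimal sketch, assuming the standard df, du, awk, and sort utilities are available there; the function names are hypothetical helpers, not part of NSX.

```shell
#!/bin/sh
# Print the used-space percentage (digits only) of the filesystem holding "$1".
usage_percent() {
    # df -P prints one line per filesystem; column 5 is the capacity, e.g. "100%".
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# List up to ten of the largest entries directly under "$1" (sizes in KiB),
# sorted so that the largest entry is printed last.
largest_entries() {
    du -sk "$1"/* 2>/dev/null | sort -n | tail -n 10
}
```

For example, `usage_percent /var/log` followed by `largest_entries /var/log` should show whether auth.log.1 and syslog.1 are the files consuming the space.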


Environment

VMware NSX for vSphere 6.3.x
VMware NSX for vSphere 6.4.x

Cause

vRNI 3.7 and earlier uses a CLI method to pull relevant network information from the NSX Controllers: it logs in to the NSX Controller, runs and parses a CLI command, and logs off again. In addition, as part of normal controller operation, the auth.log and syslog files are continuously updated with log messages from the different actions that happen on the controller. Together, this causes the controller log files in the /var/log directory to keep growing and fill up the controller partition.

Once a log file reaches 32 MB in size, it is rotated to a file with the extension .1, and new messages are expected to be written to fresh auth.log and syslog files. As a result of this issue, messages cannot be written to the new auth.log and syslog files after logrotate runs, which causes the rotated auth.log.1 and syslog.1 files to grow very large and fill the partition.
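
The article does not describe the internals of the failure. One common way this pattern arises on Linux, offered here only as an illustration and as an assumption about the controller, is that a writer which keeps its file open continues writing to the same file after it has been renamed. The sketch below reproduces that behaviour in a temporary directory; `demo_rotation` is a hypothetical name and nothing in it touches /var/log.

```shell
#!/bin/sh
# Illustration only: a writer that keeps its file descriptor open follows the
# file across a rename, so the renamed file grows and the new one stays empty.
demo_rotation() {
    dir=$(mktemp -d)
    exec 3>>"$dir/syslog"              # the "daemon" opens the log once
    echo "before rotation" >&3
    mv "$dir/syslog" "$dir/syslog.1"   # rotation renames the file...
    : > "$dir/syslog"                  # ...and creates a fresh, empty one
    echo "after rotation" >&3          # the writer was never told to reopen
    exec 3>&-                          # close the descriptor
    echo "syslog.1 lines: $(wc -l < "$dir/syslog.1" | tr -d ' ')"
    echo "syslog bytes:   $(wc -c < "$dir/syslog" | tr -d ' ')"
    rm -rf "$dir"
}
```

Calling `demo_rotation` reports two lines in syslog.1 and zero bytes in syslog. Signalling the writer to reopen its files (for example with HUP) is the usual way to make it switch to the new file.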

Note: vRNI is just one example of software that triggers this issue; other VMware or third-party software, or simple scripts that log in to the NSX Controllers, would trigger the same space issue.


Resolution

This issue is resolved in VMware NSX for vSphere 6.4.5 and later.

Workaround:
To work around this issue if you are not able to upgrade:
 

1.  Disable the NSX Controller polling in vRNI (detailed below), or in any other software or scripts that use a CLI method to pull relevant network information from the NSX Controllers. This should prevent the creation of a large volume of log files.
 

Polling of information from the NSX Controllers can be disabled in vRNI with the following steps:

    1. Navigate to the Accounts and Data Sources page under Settings.
    2. Click the Edit data source icon on the right side of the NSX Manager entry.
    3. Deselect the Enable NSX Controller option.
    4. Click Submit.

2. Log in to the controller as root. To switch to the root user on any controller node, you first need the root password for that specific controller. Follow the steps below to get root access to the controller.

            Root login steps for NSX-V Controller nodes:

    • Log in to the NSX Manager as root using KB2149630.

    • Look up the controller ID in the Networking & Security tab of the vSphere (Web) Client, under the controller deployment section (Networking & Security > Installation & Upgrade > Management > NSX Controller Nodes).

    • Execute the following command in the Linux shell of the NSX Manager:

/home/secureall/secureall/sem/WEB-INF/classes/GetNvpApiPassword.sh controller-NN
 
Note: Replace controller-NN with the correct controller ID (for example, controller-12). The last line of the output contains the root password for this controller node.

    • Log in as “admin” via SSH on the controller.
    • Type the following command: debug os-shell
    • Enter the root password that was displayed on the NSX Manager shell. You are now in root mode on the controller.

 

3. Delete or transfer the auth.log.* and syslog.* files from /var/log/ directory periodically. Controller reboot or redeployment is not required.

Back up the rsyslog file to /tmp:

    • cp /etc/logrotate.d/rsyslog /tmp/rsyslog.ORG

Edit the rsyslog file:

    • vi /etc/logrotate.d/rsyslog
    • Replace the following line (it appears twice in the file):

/usr/bin/systemctl reload syslog.service > /dev/null

with

/usr/bin/systemctl kill -s HUP rsyslog.service > /dev/null

    • Save the file with ‘:wq’.
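
The same edit can be made without an interactive editor. This is a minimal sketch, assuming GNU sed (for the -i option) is available on the controller and that the original line reads "systemctl reload syslog.service" as quoted above; `patch_postrotate` is a hypothetical helper name.

```shell
#!/bin/sh
# Replace the reload command with a HUP signal on every matching line of "$1".
# Assumes GNU sed (-i edits the file in place) and the original wording
# "systemctl reload syslog.service"; the rest of each line is left untouched.
patch_postrotate() {
    sed -i 's|systemctl reload syslog\.service|systemctl kill -s HUP rsyslog.service|' "$1"
}
```

After taking the backup, it would be run as `patch_postrotate /etc/logrotate.d/rsyslog`. Because sed processes the file line by line, both occurrences are rewritten in one pass.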

 

Example file after edits:

cat /etc/logrotate.d/rsyslog

    /var/log/syslog
    {
            rotate 56
            size 32M
            create
            missingok
            notifempty
            delaycompress
            compress
            postrotate
                    /usr/bin/systemctl kill -s HUP rsyslog.service > /dev/null
            endscript
    }
      
    /var/log/mail.info  
    /var/log/mail.warn  
    /var/log/mail.err  
    /var/log/mail.log  
    /var/log/daemon.log  
    /var/log/kern.log  
    /var/log/auth.log  
    /var/log/user.log  
    /var/log/lpr.log  
    /var/log/cron.log  
    /var/log/debug  
    /var/log/messages  
    {  
            rotate 4  
            weekly  
            missingok  
            notifempty  
            compress  
            delaycompress  
            sharedscripts  
            postrotate  
                    /usr/bin/systemctl kill -s HUP rsyslog.service > /dev/null  
            endscript  
    }  
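
Before running logrotate, it can help to confirm that both postrotate lines were changed. The check below is a sketch under the assumption that the file contains exactly two postrotate blocks, as in the example above; `check_postrotate` is a hypothetical helper name.

```shell
#!/bin/sh
# Succeed only if "$1" contains exactly two HUP lines and no "reload" lines.
check_postrotate() {
    # grep -c prints the count even when it is 0; "|| true" ignores its exit code.
    hup=$(grep -c 'systemctl kill -s HUP rsyslog\.service' "$1" || true)
    old=$(grep -c 'systemctl reload' "$1" || true)
    [ "$hup" -eq 2 ] && [ "$old" -eq 0 ]
}
```

For example: `check_postrotate /etc/logrotate.d/rsyslog && echo "edit looks complete"`.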

Execute logrotate so that it reads the new configuration:

    • logrotate /etc/logrotate.d/rsyslog

 

Restart the log service and check its status:

    • service rsyslog restart
    • service rsyslog status 
 rsyslog.service - System Logging Service  
       Loaded: loaded (/usr/lib/systemd/system/rsyslog.service; enabled; vendor preset: enabled)  
       Active: active (running) since Fri 2018-10-05 21:16:17 UTC; 2s ago  
         Docs: man:rsyslogd(8)  
               http://www.rsyslog.com/doc/  
     Main PID: 11057 (rsyslogd)  
        Tasks: 6  
       CGroup: /system.slice/rsyslog.service  
               └─11057 /usr/sbin/rsyslogd -n  
    Oct 05 21:16:17 nsx-controller systemd[1]: Starting System Logging Service...  
    Oct 05 21:16:17 nsx-controller systemd[1]: Started System Logging Service.    

 

Once the service restarts, check /var/log to confirm that new log messages are being written to the syslog file.

Change to the /var/log directory and remove or transfer the large rotated files that are consuming the space:

    • cd /var/log  
    • rm syslog.*  
    • rm auth.log.*
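
The cleanup can also be wrapped in a small function that prints each rotated file before deleting it, which leaves a record of what was removed. This is a sketch only; `remove_rotated_logs` is a hypothetical name, and it deletes files, so transfer anything that must be kept first.

```shell
#!/bin/sh
# Remove rotated syslog.* and auth.log.* files in "$1" (default: /var/log),
# listing each file first. The live syslog and auth.log files are not matched
# by these patterns and are left in place.
remove_rotated_logs() {
    dir=${1:-/var/log}
    for f in "$dir"/syslog.* "$dir"/auth.log.*; do
        [ -f "$f" ] || continue   # skip patterns that matched nothing
        ls -l "$f"
        rm -f -- "$f"
    done
}
```

Run as root on the controller, `remove_rotated_logs /var/log` has the same effect as the two rm commands above.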

 

Note: The workaround described above needs to be performed on all controllers, and it persists across reboots. When a controller is redeployed, the same changes need to be applied again on the newly deployed controller(s).