Aria Operations for Networks version 6.11 appliances fail to rotate logs, causing the /var/log partition to grow beyond 90 percent, which leaves the NodeManager service unhealthy and the FlinkContainer service not running.

Article ID: 324415

Products

VMware Aria Operations for Networks

Issue/Introduction

Symptoms:

  1. Service NodeManager is running but not healthy.
  2. Service FlinkContainer is not running.
  3. Partition /var/log exceeds 90% storage utilization.
  4. The logrotate service fails to rotate logs for nginx, syslog, warn, etc.

Environment

VMware Aria Operations for Networks 6.11.0

Cause

After log rotation is performed, a reload call is sent to the respective service so that the service reopens its log file and continues writing to the correct file.
In version 6.11.0 this mechanism is broken. As a result, the log files grow very large and push /var/log usage above 90%.
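
The general mechanism can be reproduced outside the appliance. The sketch below is an illustration only: it uses a temporary directory and a hypothetical file named app.log, not the appliance's actual logrotate configuration. It shows that a writer which is never told to reopen its log keeps appending to the renamed file, while the new file stays empty.

```shell
#!/usr/bin/env bash
# Illustration only: a writer that never reopens its log keeps appending
# to the renamed (rotated) file. All paths are temporary and hypothetical.
demo_rotation_without_reload() {
  local dir rotated current
  dir=$(mktemp -d)
  exec 9>>"$dir/app.log"               # the "service" opens its log once
  echo "before rotation" >&9
  mv "$dir/app.log" "$dir/app.log.1"   # rotation renames the active file
  : > "$dir/app.log"                   # and creates a fresh, empty one
  echo "after rotation" >&9            # no reload: still the old descriptor
  exec 9>&-                            # close the descriptor
  rotated=$(wc -l < "$dir/app.log.1" | tr -d ' ')
  current=$(wc -l < "$dir/app.log" | tr -d ' ')
  rm -rf "$dir"
  echo "rotated=$rotated current=$current"
}

demo_rotation_without_reload   # prints: rotated=2 current=0
```

Both lines end up in the rotated file, which is why the rotated files (for example syslog.1) keep growing when the reload step does not happen.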

Resolution

The Aria Operations for Networks Engineering team is aware of this issue. A patch (6.11.0.P2.1699450092.patch.bundle) for Aria Operations for Networks version 6.11 is available that fixes the log rotation issue.

This issue is fixed in the 6.12.1 release, which is available for download.

The workaround below must be executed on all affected Aria Operations for Networks nodes (platforms and collectors) before applying the patch or upgrading to version 6.12.

1. Open a PuTTY/SSH session to each affected Aria Operations for Networks node (platforms and collectors).

2. Log in with the username support.

3. Execute the command below to switch to the ubuntu user:

ub

4. Run the following commands on each affected node.

cd /var/log

List the directory to identify the size of files such as warn, syslog.1, and auth.log.1 on the affected nodes:

ls -lrth 

5. Look at the last 3 to 4 files in the listing (the most recently modified ones), for example:

-rw-r-----   1 syslog        adm             2.5G Jul 19 16:19 warn
-rw-r-----   1 syslog        adm             4.6G Jul 19 16:19 syslog
-rw-r-----   1 syslog        adm             3.9G Jul 19 16:19 auth.log
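
As an alternative to reading the listing by eye, the oversized files can be picked out by size. The following is a minimal sketch, assuming GNU find as shipped on the Ubuntu-based appliance. The function name large_logs and the 1G default threshold are illustrative choices, not part of the product.

```shell
#!/usr/bin/env bash
# List regular files directly under a directory that exceed a size threshold.
# The function name and defaults are illustrative, not part of the product.
large_logs() {
  local dir=${1:-/var/log} min=${2:-1G}
  # -maxdepth 1 keeps the search to the directory itself (no subdirectories).
  find "$dir" -maxdepth 1 -type f -size +"$min" -exec ls -lh {} +
}

# Example: large_logs /var/log 1G
```

If nothing exceeds the threshold, the function prints nothing.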

6. Execute the commands below to empty (truncate) the oversized log files manually:

sudo dd if=/dev/null of=/var/log/secure
sudo dd if=/dev/null of=/var/log/warn
sudo dd if=/dev/null of=/var/log/syslog.1
sudo dd if=/dev/null of=/var/log/auth.log.1
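
Note that dd if=/dev/null of=FILE creates FILE if it does not already exist. Where that is not wanted, the same truncation can be written as a small loop that skips missing files. This is a sketch under stated assumptions: the helper name truncate_existing is hypothetical, and it must be run as root (for example after sudo su) because the redirection is performed by the calling shell.

```shell
#!/usr/bin/env bash
# Empty each named file in place, skipping names that do not exist.
# Hypothetical helper; run as root on the appliance.
truncate_existing() {
  local f
  for f in "$@"; do
    if [ -f "$f" ]; then
      : > "$f"                         # truncates to zero bytes, like dd if=/dev/null of="$f"
    else
      echo "skip: $f not found" >&2
    fi
  done
}

# Example (as root):
# truncate_existing /var/log/secure /var/log/warn /var/log/syslog.1 /var/log/auth.log.1
```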

7. Remove the warn, syslog.1, and auth.log.1 files from /var/log on the affected nodes.

To do so, execute the commands below:

sudo rm -rf warn
sudo rm -rf syslog.1
sudo rm -rf auth.log.1

8. Delete access.log and access.log.1 from /var/log/nginx on all the nodes.

To do so execute below commands:

sudo su
cd nginx/
ls -lrth
sudo rm -rf access.log
sudo rm -rf access.log.1

9. Restart the syslog and nginx services on all the nodes by executing the commands below:

systemctl restart syslog.service
systemctl restart nginx.service

10. After completing the steps above, execute the command below to validate the usage of /var/log:

df -h

In a cluster setup, execute the command below:

./run_all.sh df -h

The usage of /var/log should now show less than 60%.
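
The 60% check can also be scripted. The sketch below parses the POSIX-format output of df -P. The function names usage_percent and var_log_below are hypothetical, and the 60 default is taken from the threshold mentioned above.

```shell
#!/usr/bin/env bash
# Read `df -P` output on stdin and print the usage of the first
# filesystem line as a bare number (for example 45 for "45%").
usage_percent() {
  awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Succeed when usage of the given mount point is below the limit.
# Hypothetical helper; the defaults are 60 and /var/log.
var_log_below() {
  local limit=${1:-60} mount=${2:-/var/log} pct
  pct=$(df -P "$mount" | usage_percent)
  echo "$mount is at ${pct}%"
  [ "$pct" -lt "$limit" ]
}

# Example: var_log_below 60 /var/log || echo "still above the limit"
```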

11. The P2 patch can now be applied to Aria Operations for Networks 6.11 as described in this Resolution section. If you are upgrading Aria Operations for Networks to version 6.12.0 or 6.12.1, you do not need to apply the P2 patch.

12. Download Aria Operations for Networks Version 6.11 P2 patch from the attachment section in this Knowledge Base Article.


File Name: VMware-AriaOpNetworks.6.11.0.P2.1699450092.patch.bundle
File Size: 668.2 MiB
Checksum Values:

  • MD5SUM: CBE80EB278AD7A351ED84D1B0B7EC933
  • SHA1SUM: 1AD8B3352E1F6606B7E5EF7435E16A2DE23197C8
  • SHA-256: 9444734D2CB24AA5D78E107A676F30D81FFFA533CCEDA9759D817BB29F146781
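
Before uploading, the downloaded bundle can be compared against the SHA-256 value listed above. This is a minimal sketch, assuming the file was downloaded to a Linux machine where the coreutils sha256sum command is available. The helper name verify_sha256 is hypothetical. The comparison ignores letter case because sha256sum prints lowercase digits while this article lists the value in uppercase.

```shell
#!/usr/bin/env bash
# Compare a file's SHA-256 digest with an expected value, ignoring letter case.
# Hypothetical helper; assumes sha256sum (GNU coreutils) is installed.
verify_sha256() {
  local file=$1 expected actual
  expected=$(printf '%s' "$2" | tr 'A-F' 'a-f')
  actual=$(sha256sum "$file" | awk '{ print $1 }')
  if [ "$actual" = "$expected" ]; then
    echo "OK: $file"
  else
    echo "MISMATCH: $file" >&2
    return 1
  fi
}

# Example:
# verify_sha256 VMware-AriaOpNetworks.6.11.0.P2.1699450092.patch.bundle \
#   9444734D2CB24AA5D78E107A676F30D81FFFA533CCEDA9759D817BB29F146781
```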

Note:

If the 6.11.0-P1 patch is already applied, this workaround is not needed. If P2 is the first patch to be applied on top of the 6.11 GA version, use the following steps:

  1. Log in to each appliance VM with the username support.
  2. Execute the following commands:
ub
sudo su
mkdir -p /usr/local/lib/python3.6/dist-packages/cli
ln -sf /usr/local/lib/python3.8/dist-packages/cli/tool_manager_runner.py /usr/local/lib/python3.6/dist-packages/cli/tool_manager_runner.py

  3. Upload the patch bundle from the Aria Operations for Networks GUI.
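
The mkdir and ln commands in step 2 create the link even when the python3.8 source file is absent, which would leave a dangling symlink. A guarded variant is sketched below. The function name link_tool_runner is hypothetical, the default paths are the ones from step 2, and it must be run as root.

```shell
#!/usr/bin/env bash
# Create the compatibility symlink only if the source file exists.
# Hypothetical wrapper around the mkdir/ln commands from step 2; run as root.
link_tool_runner() {
  local src=${1:-/usr/local/lib/python3.8/dist-packages/cli/tool_manager_runner.py}
  local dst=${2:-/usr/local/lib/python3.6/dist-packages/cli/tool_manager_runner.py}
  if [ ! -f "$src" ]; then
    echo "source not found: $src" >&2
    return 1
  fi
  mkdir -p "$(dirname "$dst")"
  ln -sf "$src" "$dst"
  readlink "$dst"                      # print the link target for confirmation
}

# Example (as root): link_tool_runner
```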

Procedure to apply patch bundle via Aria Operations for Networks GUI:

    1. Download the update patch file and save the file on your local system.
    2. Log into the Aria Operations for Networks (vRNI) GUI as an Administrator user with username admin@local
    3. Navigate to Settings > Install and Support > Overview and Updates, then under Product, select Click here
    4. Click Browse to select the locally downloaded patch file and click Upload.
      Notes:
      • When the upload is complete, Aria Operations for Networks shows the Bundle Upload Complete message notification within 2-3 minutes and the bundle processing happens in the background.
      • Until the upload of the package happens, ensure that the session is not closed. If the session ends, you have to restart the upload process.
      • Do not refresh the page after bundle upload, until you see the Update Available message notification.
    5. In the Bundle Available message notification, click View details
      Note: The Aria Operations for Networks Update screen appears. Read the Before you proceed instruction and click Continue.
    6. Wait for the pre-checks to complete, which verifies:
      • the disk space, including the space required for migration
      • the version
      • the NTP sync status
      • the bundle checksum
    7. Click Install Now.
      Note: 
      You can see the approximate time required to complete the update process on your setup.
    8. Once the update process begins, the Aria Operations for Networks Update screen provides the status of the update process.
      Notes:
      • If a node becomes inactive, the update process does not continue. The update will not resume until the node becomes active again.
      • Once the platforms are updated, you can resume your normal Aria Operations for Networks operations even though the collector update happens in parallel. Until the update process is completely over, the Node Version Mismatch detected message is shown on the Install and Support page.
    9. Upon completion of the update process, you see a confirmation message similar to the following:

      "Validate from GUI that all the platform and the collector nodes are updated."

Attachments

VMware-AriaOpNetworks.6.11.0.P2.1699450092.patch.bundle