NSX Manager /var/log partition reaches 100% disk usage due to uncompressed rolled logs

Article ID: 430499

Updated On:

Products

VMware NSX

Issue/Introduction

  • Disk usage of the /var/log partition on one or more NSX Manager nodes becomes very high:

Filesystem                   Size  Used Avail Use% Mounted on
tmpfs                        9.5G  1.3M  9.5G   1% /run
/dev/sda3                     11G  4.2G  5.6G  43% /
tmpfs                         48G  4.5M   48G   1% /dev/shm
tmpfs                        5.0M     0  5.0M   0% /run/lock
/dev/mapper/nsx-repository    31G   12G   19G  39% /repository
/dev/mapper/nsx-tmp          9.6G  169M  9.0G   2% /tmp
/dev/mapper/nsx-secondary     98G  5.8G   88G   7% /nonconfig
/dev/mapper/nsx-var+dump      20G   24K   19G   1% /var/dump
/dev/mapper/nsx-var+log       37G   32G  2.9G  92% /var/log 
/dev/sda1                    942M  7.2M  870M   1% /boot
/dev/mapper/nsx-config__bak   29G  3.0G   25G  11% /config_bak
/dev/mapper/nsx-config        29G  1.4G   27G   6% /config
/dev/mapper/nsx-image         62G   20G   40G  33% /image
tmpfs                        9.5G  8.0K  9.5G   1% /run/user/1007
tmpfs                        9.5G  8.0K  9.5G   1% /run/user/0

  • Inspection of the /var/log partition reveals multiple rolled log files in an uncompressed state.

Example: /var/log/proton# ls -lrt nsxapi*

-rw-r----- 1 uproton uproton 262145339 Feb  8 06:49 nsxapi.60.log
-rw-r----- 1 uproton uproton 262151464 Feb  8 07:04 nsxapi.59.log
-rw-r----- 1 uproton uproton 262144771 Feb  8 07:24 nsxapi.58.log
-rw-r----- 1 uproton uproton 262144117 Feb  8 07:38 nsxapi.57.log
-rw-r----- 1 uproton uproton 262144176 Feb  8 08:21 nsxapi.56.log
-rw-r----- 1 uproton uproton 262144614 Feb  8 08:38 nsxapi.55.log
-rw-r----- 1 uproton uproton 262144068 Feb  8 09:32 nsxapi.54.log
-rw-r----- 1 uproton uproton 262144355 Feb  8 09:48 nsxapi.53.log
-rw-r----- 1 uproton uproton 262144273 Feb  8 10:09 nsxapi.52.log
-rw-r----- 1 uproton uproton 262144149 Feb  8 10:18 nsxapi.51.log
-rw-r----- 1 uproton uproton 262144435 Feb  8 10:45 nsxapi.50.log
-rw-r----- 1 uproton uproton 262145156 Feb  8 11:32 nsxapi.49.log
-rw-r----- 1 uproton uproton 262144092 Feb  8 11:46 nsxapi.48.log
-rw-r----- 1 uproton uproton 262144137 Feb  8 11:57 nsxapi.47.log
-rw-r----- 1 uproton uproton 262144072 Feb  8 12:04 nsxapi.46.log
-rw-r----- 1 uproton uproton 262144142 Feb  8 12:09 nsxapi.45.log
-rw-r----- 1 uproton uproton 262144835 Feb  8 12:18 nsxapi.44.log
-rw-r----- 1 uproton uproton 262144298 Feb  8 12:33 nsxapi.43.log

  • NSX Manager raises the manager_health.manager_disk_usage_high or manager_health.manager_disk_usage_very_high alarm, indicating that /var/log usage has crossed a configured threshold.

CRITICAL NSX 3133 [nsx@4413 comp="nsx-manager" subcomp="node-mgmt" username="root" level="CRITICAL" eventFeatureName="manager_health" eventType="manager_disk_usage_very_high" eventSev="critical" eventState="On" entId="########" logger="nsx_monitoring.clientlibrary.event_source"] At the time this alarm was raised, the disk usage for the Manager node disk partition /var/log reached 90% which is at or above the very high threshold value of 90%.
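The uncompressed rolled logs described above can be located across /var/log with a search like the one below. This is a minimal sketch, not an NSX tool: the function name find_uncompressed_rolled_logs is hypothetical, and it assumes GNU findutils (for -regextype) is available on the Manager node. It matches only the <log_name>.<number>.log pattern, so the active nsxapi.log and any .gz archives are not listed.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not an NSX tool): print rolled logs that were left
# uncompressed, i.e. files named <log_name>.<number>.log such as nsxapi.57.log.
# The active log (nsxapi.log) and *.log.gz archives do not match the pattern.
# Assumes GNU find, which supports -regextype.
find_uncompressed_rolled_logs() {
  local dir="${1:-/var/log}"
  # -regex is matched against the whole path, hence the leading '.*/'.
  find "$dir" -regextype posix-extended -type f \
    -regex '.*/[^/]+\.[0-9]+\.log' | sort
}
```

For example, running find_uncompressed_rolled_logs /var/log/proton on the node shown above would be expected to list the nsxapi.<number>.log files.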

 

Environment

VMware NSX 

Cause

This issue is caused by the manual decompression of rolled NSX log files (e.g., .gz archives) directly within the /var/log directory of the NSX Manager.

When these archives are manually decompressed in place, the resulting '.log' files are no longer recognized by the automated log rotation and re-compression routines. Consequently, these uncompressed files remain on the file system indefinitely and, because each one is much larger than the archive it came from, they consume space until the /var/log partition reaches capacity, which may lead to management plane instability. Manually modifying the log structure in this way is an unsupported administrative action.
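Because the problem is driven by the size difference between an archive and its contents, it can help to estimate how much space decompressing a set of archives would consume before doing so anywhere. A minimal sketch, assuming GNU gzip; the function name estimate_uncompressed_bytes is hypothetical. Each archive is streamed through gzip -dc into wc -c, so no decompressed file is written to disk.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the total number of bytes the given .gz archives
# would occupy once decompressed. Nothing is written to disk.
estimate_uncompressed_bytes() {
  local total=0 f n
  for f in "$@"; do
    # Exit the command substitution with gzip's status (not wc's) so that
    # corrupt or non-gzip input makes the function fail.
    n=$(gzip -dc "$f" | wc -c; exit "${PIPESTATUS[0]}") || return 1
    total=$(( total + n ))
  done
  echo "$total"
}
```

For example, estimate_uncompressed_bytes /var/log/proton/nsxapi.*.log.gz prints the byte count that unzipping those archives in place would add to the partition.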

Resolution

To resolve this issue on an affected NSX Manager node, follow the steps below:

  1. Locate the log repository containing the uncompressed files. For example, if the Proton service logs were uncompressed, navigate to /var/log/proton/.
  2. Identify files following the pattern <log_name>.<number>.log.
    Example: nsxapi.1.log, nsxapi.2.log, through nsxapi.20.log.
  3. Back up these files to an external location if they are needed for audit or troubleshooting, then delete the uncompressed numbered logs.
  4. Confirm the directory is clean. Using the Proton example, only the active nsxapi.log and any valid .gz archives should remain in /var/log/proton/. Ensure no other nsxapi.*.log files exist.
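Steps 1 to 4 can be sketched for a single log directory as below. This is an illustrative script under stated assumptions, not a supported NSX procedure: the function name cleanup_uncompressed_logs and the backup path are hypothetical, bash 4+, GNU find and GNU tar are assumed, and the backup archive should be written somewhere with enough free space and then copied to an external location. It deletes files only after the backup archive has been verified as readable. Review the file list before running it on a production node.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of Resolution steps 1-4 for one log directory.
# Assumes bash 4+, GNU find and GNU tar. BACKUP_FILE is a placeholder path that
# needs enough free space; copy it off the node afterwards.
cleanup_uncompressed_logs() {
  local log_dir="${1:-}" backup_file="${2:-}"
  local -a files=()
  local remaining f

  if [ ! -d "$log_dir" ] || [ -z "$backup_file" ]; then
    echo "usage: cleanup_uncompressed_logs LOG_DIR BACKUP_FILE" >&2
    return 2
  fi

  # Steps 1-2: collect <log_name>.<number>.log names in this directory only.
  # The active log (e.g. nsxapi.log) and *.log.gz archives never match.
  mapfile -t files < <(find "$log_dir" -maxdepth 1 -regextype posix-extended \
    -type f -regex '.*/[^/]+\.[0-9]+\.log' -printf '%f\n' | sort)
  if [ "${#files[@]}" -eq 0 ]; then
    echo "No uncompressed rolled logs found in $log_dir"
    return 0
  fi

  # Step 3: back up the files (compressed), verify the archive is readable,
  # and only then delete the uncompressed numbered logs.
  tar -C "$log_dir" -czf "$backup_file" "${files[@]}" || return 1
  tar -tzf "$backup_file" > /dev/null || return 1
  for f in "${files[@]}"; do
    rm -f -- "$log_dir/$f"
  done

  # Step 4: confirm no <log_name>.<number>.log files remain.
  remaining=$(find "$log_dir" -maxdepth 1 -regextype posix-extended \
    -type f -regex '.*/[^/]+\.[0-9]+\.log' | wc -l)
  if (( remaining > 0 )); then
    echo "WARNING: $remaining uncompressed rolled log(s) still in $log_dir" >&2
    return 1
  fi
  echo "Removed ${#files[@]} file(s); backup saved to $backup_file"
}
```

For the Proton example, the call would look like cleanup_uncompressed_logs /var/log/proton /path/outside/var-log/proton-uncompressed.tar.gz, where the second argument is a placeholder to be replaced with a real location.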

Caution: Do not delete the active log file currently being written to (e.g., nsxapi.log). Only remove the uncompressed historical logs.

Prevention: Do not unzip any log files on the manager under /var/log/. If analysis is needed, copy the log files off the node and unzip them elsewhere.
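Offline analysis can be sketched as follows, run on a separate workstation rather than on the Manager. The host placeholder <nsx-manager>, root SSH access to the node, and the function name extract_logs_locally are assumptions for illustration. gunzip -c writes into a separate directory and leaves the original .gz files untouched.

```shell
#!/usr/bin/env bash
# Run on an analysis workstation, NOT on the NSX Manager. First copy the
# archives off the node, for example (host placeholder, root SSH assumed):
#   scp 'root@<nsx-manager>:/var/log/proton/nsxapi.*.log.gz' ./nsx-logs/
#
# Hypothetical helper: decompress every *.gz in SRC_DIR into DEST_DIR,
# leaving the original archives untouched.
extract_logs_locally() {
  local src_dir="$1" dest_dir="$2" f
  mkdir -p "$dest_dir" || return 1
  for f in "$src_dir"/*.gz; do
    [ -e "$f" ] || continue   # the glob matched nothing
    gunzip -c "$f" > "$dest_dir/$(basename "${f%.gz}")" || return 1
  done
}
```

For example, extract_logs_locally ./nsx-logs ./nsx-logs-extracted produces plain .log files on the workstation for analysis.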

Monitoring: Use the existing alarms manager_health.manager_disk_usage_high and manager_health.manager_disk_usage_very_high to track /var/log usage.
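Alongside the NSX alarms, /var/log usage can be spot-checked from the node's shell. A minimal sketch assuming GNU coreutils df (for --output=pcent); the function names are hypothetical, and the 90% default only mirrors the very-high threshold quoted in the alarm text in the Issue section and is not read from NSX configuration.

```shell
#!/usr/bin/env bash
# Hypothetical spot check of partition usage on a Manager node.
# Assumes GNU coreutils df, which supports --output=pcent.
var_log_usage_pct() {
  local mount="${1:-/var/log}"
  # df prints a "Use%" header, then a value such as " 92%"; keep only digits.
  df --output=pcent "$mount" | tail -n 1 | tr -dc '0-9'
}

# Returns 0 below the threshold, 1 at or above it, 2 if usage cannot be read.
# The 90% default mirrors the very-high threshold quoted in the alarm text.
check_var_log_usage() {
  local mount="${1:-/var/log}" threshold="${2:-90}" pct
  pct=$(var_log_usage_pct "$mount")
  [ -n "$pct" ] || return 2
  if (( pct >= threshold )); then
    echo "ALERT: $mount is ${pct}% used (threshold ${threshold}%)"
    return 1
  fi
  echo "OK: $mount is ${pct}% used (threshold ${threshold}%)"
}
```

On the node from the Issue section (92% used), check_var_log_usage /var/log would report an alert and return 1.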