NSX Manager disk usage high alarm and partition /tmp shows above 90%

Article ID: 393334

Updated On:

Products

VMware NSX

Issue/Introduction

  • You may see an NSX alarm similar to the following:

    Manager Health         Manager Disk Usage High        Medium        <TimeStamp>
    -------------------------------------------------------------------------------
    Description            The disk usage for the Manager node disk partition /tmp has reached 90% which is at or above the high threshold value of 90%
    Recommended Action     Examine the partition with high usage and see if there are any unexpected large files that can be removed.
  • The df -h output shows /tmp partition usage above 90%, but you may not find any files that consume a large amount of space.

    df -h
    
    Filesystem                    Size    Used    Avail    Use%    Mounted on
    tmpfs                         2.4G    1.4M    2.4G     1%    /run
    /dev/sda3                      11G    4.8G    4.9G    50%    /
    tmpfs                          12G    3.8M     12G     1%    /dev/shm
    tmpfs                         5.0M       0    5.0M     0%    /run/lock
    /dev/sda1                     942M    7.1M    870M     1%    /boot
    /dev/mapper/nsx-config__bak    29G     54M     28G     1%    /config_bak
    /dev/mapper/nsx-config         29G     51M     28G     1%    /config
    /dev/mapper/nsx-secondary      98G    614M     93G     1%    /nonconfig
    /dev/mapper/nsx-image          42G    590M     40G     2%    /image
    /dev/mapper/nsx-repository     31G    8.9G     21G    31%    /repository
    /dev/mapper/nsx-var+dump      9.3G     24K    8.8G     1%    /var/dump
    /dev/mapper/nsx-tmp           3.7G    3.3G    225M    94%    /tmp
    /dev/mapper/nsx-var+log        27G    9.2G     17G    37%    /var/log
    tmpfs                         2.4G    4.0K    2.4G     1%    /run/user/1007
    tmpfs                         2.4G    4.0K    2.4G     1%    /run/user/0
    
    du -hsx /tmp/* | sort -rh | head -15 
    
    68K /tmp/hsperfdata_nsx-replicator 
    68K /tmp/hsperfdata_corfu 
    36K /tmp/hsperfdata_uuc 
    36K /tmp/hsperfdata_uproxy 
    36K /tmp/hsperfdata_uproton 
    36K /tmp/hsperfdata_uphc 
    36K /tmp/hsperfdata_ucminv 
    36K /tmp/hsperfdata_nsx-search 
    36K /tmp/hsperfdata_nsx-messaging 
    36K /tmp/hsperfdata_nsx-idps 
    36K /tmp/hsperfdata_nsx-cbm 
    36K /tmp/hsperfdata_nsx 
    8.0K /tmp/systemd-private-29c4de################ac9ab222b1-systemd-timedated.service-Bquhkg 
    8.0K /tmp/systemd-private-29c4de################ac9ab222b1-systemd-resolved.service-bQlOqM 
    8.0K /tmp/systemd-private-29c4de################ac9ab222b1-systemd-logind.service-krDktF
  • The lsof output shows large files under /tmp that have been deleted but are still held open by a process.

    lsof +L1 /tmp
    
    COMMAND       PID           USER   FD   TYPE DEVICE SIZE/OFF NLINK    NODE NAME
    java         3568            uuc   59w   REG  252,6  8388665     0 1253385 /var/log/upgrade-coordinator/upgrade-coordinator.1.log (deleted)
    java         3568            uuc   60w   REG  252,6 10485836     0 1253397 /var/log/upgrade-coordinator/corfu-metrics.1.log (deleted)
    java         4095       nsx-idps  mem-W  REG  252,2    32768     1      43 /tmp/hsperfdata_nsx-idps/4095
    java         4105           uphc  mem-W  REG  252,2    32768     1      44 /tmp/hsperfdata_uphc/4105
    java         4105           uphc   59w   REG  252,6 11534563     0 1114175 /var/log/phonehome-coordinator/phonehome-coordinator.1.log (deleted)
    java         4105           uphc   61w   REG  252,6 10485903     0 1114184 /var/log/phonehome-coordinator/corfu-metrics.1.log (deleted)
    java         4105           uphc   62u   REG  252,6  1048642     0 1114127 /var/log/phonehome-coordinator/spring.1.log (deleted)
    java         4174          corfu  mem-W  REG  252,2    32768     1      41 /tmp/hsperfdata_corfu/4174
    java         4944         ucminv  mem-W  REG  252,2    32768     1      49 /tmp/hsperfdata_ucminv/4944
    java         4944         ucminv   59u   REG  252,6 10485938     0  835659 /var/log/search/search-inventory.1.log (deleted)
    java         4944         ucminv   61w   REG  252,6 31457318     0 1531988 /var/log/cm-inventory/corfu-metrics.1.log (deleted)
    java         4944         ucminv   62w   REG  252,6  2097207     0 1532120 /var/log/cm-inventory/nsx-mp-metrics.1.log (deleted)
    java         4944         ucminv   63w   REG  252,6 92280059     0 1532001 /var/log/cm-inventory/cm-inventory.1.log (deleted)
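
To see how much space the deleted-but-open files hold, the SIZE/OFF column of the lsof +L1 output can be totalled per process. The following is a minimal sketch rather than a documented NSX procedure: summarize_deleted is a hypothetical helper name, and it assumes the default ten-column lsof layout shown above, with file names in the last column.

```shell
#!/bin/sh
# Hypothetical helper: total the bytes held by unlinked regular files per process.
# Reads `lsof +L1` output on stdin and assumes the default column order:
# COMMAND PID USER FD TYPE DEVICE SIZE/OFF NLINK NODE NAME
summarize_deleted() {
    awk '
        # Skip the header; keep regular files whose link count (NLINK) is 0.
        NR > 1 && $5 == "REG" && $8 == 0 { bytes[$2 " " $3] += $7 }
        # Print "PID USER BYTES" for each process that holds deleted files.
        END { for (key in bytes) printf "%s %.0f\n", key, bytes[key] }
    ' | sort -k3,3nr    # largest holder first
}

# Example invocation (requires lsof on the node):
#   lsof +L1 /tmp | summarize_deleted
```

Each output line has the form PID, user, and total bytes, so the process holding the most unreferenced space appears first.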

 

Environment

  • VMware NSX 4.x
  • VMware NSX-T Data Center 3.x

Cause

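A file was rotated, cleaned up, or manually deleted, but the process that had it open never closed it, so its blocks remain allocated and the file still consumes disk space. Because the file no longer has a directory entry, du does not count it, while df still does. Files in this state are typically only identifiable with the lsof command.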
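
The behaviour described here can be reproduced on any Linux shell. The following is an illustrative sketch only, meant for a scratch directory on a test machine rather than the NSX Manager itself; demo_deleted_but_open and the held.log file name are hypothetical.

```shell
#!/bin/sh
# Hypothetical demo: a file stays readable (and allocated) after deletion
# for as long as some process keeps a descriptor open on it.
demo_deleted_but_open() {
    dir=$(mktemp -d) || return 1
    file="$dir/held.log"
    printf 'still here\n' > "$file"

    exec 9< "$file"     # hold a read descriptor, as a long-running service would
    rm -f "$file"       # unlink: only the directory entry is removed

    if [ -e "$file" ]; then echo "path exists"; else echo "path gone"; fi
    cat <&9             # the data is still reachable through descriptor 9

    exec 9<&-           # closing the last descriptor releases the blocks
    rmdir "$dir"
}

demo_deleted_but_open
# prints:
#   path gone
#   still here
```

The space is released only when the last open descriptor is closed, which is why stopping or restarting the owning process frees it.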

Resolution

This is a condition that may occur in a VMware NSX environment.

Workaround:

Reboot the affected NSX Manager node to resolve the issue.
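
After the reboot, the /tmp usage can be checked against the 90% threshold quoted in the alarm. The following is a minimal sketch, assuming POSIX-format df -P output on stdin; usage_percent and below_threshold are hypothetical helper names, not NSX commands.

```shell
#!/bin/sh
# Hypothetical helper: print the usage percentage (digits only) for one
# mount point, reading `df -P` output on stdin.
usage_percent() {
    awk -v mount="$1" '
        NR > 1 && $NF == mount { sub(/%$/, "", $(NF-1)); print $(NF-1) }
    '
}

# Hypothetical helper: succeed only if the mount point is listed and its
# usage is below the given threshold (in percent).
below_threshold() {
    pct=$(usage_percent "$1")
    [ -n "$pct" ] && [ "$pct" -lt "$2" ]
}

# Example invocation:
#   df -P /tmp | below_threshold /tmp 90 && echo "/tmp is below 90%"
```

If the usage is still at or above the threshold after the reboot, re-running lsof +L1 /tmp shows whether deleted files are again being held open.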

Additional Information