Alarm on NSX Manager "Manager Disk usage High"

Products

VMware NSX

Issue/Introduction

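An alarm with the description "The disk usage for the Manager node disk partition / has reached 80% which is at or above the high threshold value of 80%" is seen in the NSX Manager web UI.

Checking disk usage from the root shell of the NSX Manager shows that the /var/log partition is nearly full, and most of that space is used by /var/log/cloudnet: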

root@NSXTMGR:~# df -h
Filesystem Size Used Avail Use% Mounted on
udev 24G 0 24G 0% /dev
tmpfs 4.8G 1.6M 4.8G 1% /run
/dev/sda2 11G 6.4G 3.3G 66% /
tmpfs 24G 50M 24G 1% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 24G 0 24G 0% /sys/fs/cgroup
/dev/sda1 942M 7.1M 870M 1% /boot
/dev/sda3 11G 24K 9.7G 1% /os_bak
/dev/mapper/nsx-config 29G 114M 28G 1% /config
/dev/mapper/nsx-config__bak 29G 24K 28G 1% /config_bak
/dev/mapper/nsx-image 42G 16G 25G 39% /image
/dev/mapper/nsx-repository 31G 9.0G 21G 31% /repository
/dev/mapper/nsx-secondary 98G 3.3G 90G 4% /nonconfig
/dev/mapper/nsx-tmp 3.7G 14M 3.5G 1% /tmp
/dev/mapper/nsx-var+dump 9.3G 24K 8.8G 1% /var/dump
/dev/mapper/nsx-var+log 27G 24G 2.3G 92% /var/log <-- 24G of 27G in use
tmpfs 4.8G 0 4.8G 0% /run/user/1007

root@NSXTMGR:~# du -hsx /var/log/* | sort -rh | head -15
14G /var/log/cloudnet <-- the cloudnet directory is the largest consumer
2.8G /var/log/journal
1.1G /var/log/vmware
1014M /var/log/corfu
821M /var/log/proton
502M /var/log/proxy
475M /var/log/messaging-manager
430M /var/log/async-replicator
378M /var/log/search
289M /var/log/idps-reporting
286M /var/log/site-manager
266M /var/log/cbm
176M /var/log/corfu-nonconfig
175M /var/log/cm-inventory
150M /var/log/stats

The nsx-ccp logs in /var/log/cloudnet are not being rotated: the directory contains no compressed (.gz) files, only uncompressed log files of about 101M each.

root@NSXTMGR:/var/log/cloudnet# ls -lthr
-rw-r----- 1 nsx nsx 101M Jan 23 19:55 nsx-ccp-20250123-195552954.log
-rw-r----- 1 nsx nsx 101M Jan 23 19:56 nsx-ccp-20250123-195602167.log
-rw-r----- 1 nsx nsx 101M Jan 23 19:56 nsx-ccp-20250123-195611827.log
-rw-r----- 1 nsx nsx 101M Jan 23 19:56 nsx-ccp-20250123-195621245.log
-rw-r----- 1 nsx nsx 101M Jan 23 19:56 nsx-ccp-20250123-195631156.log
-rw-r----- 1 nsx nsx 101M Jan 23 19:56 nsx-ccp-20250123-195641663.log
-rw-r----- 1 nsx nsx 101M Jan 23 19:56 nsx-ccp-20250123-195651234.log
-rw-r----- 1 nsx nsx 101M Jan 23 19:57 nsx-ccp-20250123-195700565.log
-rw-r----- 1 nsx nsx 101M Jan 23 19:57 nsx-ccp-20250123-195710860.log
-rw-r----- 1 nsx nsx 101M Jan 23 19:57 nsx-ccp-metrics-20250123-195720149.log
-rw-r----- 1 nsx nsx 101M Jan 23 19:57 nsx-ccp-20250123-195721033.log
-rw-r----- 1 nsx nsx 101M Jan 23 19:57 nsx-ccp-20250123-195730390.log
-rw-r----- 1 nsx nsx 101M Jan 24 01:51 nsx-ccp-20250124-015129679.log
-rw-r----- 1 nsx nsx 101M Jan 24 07:38 nsx-ccp-metrics-20250124-073820148.log
-rw-r----- 1 nsx nsx 101M Jan 24 09:30 nsx-ccp-20250124-093010104.log
-rw-r----- 1 nsx nsx 101M Jan 24 13:31 nsx-ccp-metrics-20250124-133120113.log
-rw-r----- 1 nsx nsx 101M Jan 24 17:21 nsx-ccp-20250124-172148655.log
-rw-r----- 1 nsx nsx 101M Jan 24 19:24 nsx-ccp-metrics-20250124-192420110.log
-rw-r----- 1 nsx nsx 101M Jan 25 01:16 nsx-ccp-20250125-011652028.log
-rw-r----- 1 nsx nsx 101M Jan 25 01:17 nsx-ccp-metrics-20250125-011720086.log
-rw-r----- 1 nsx nsx 101M Jan 25 07:09 nsx-ccp-metrics-20250125-070920173.log
-rw-r----- 1 nsx nsx 101M Jan 25 09:13 nsx-ccp-20250125-091335828.log
-rw-r----- 1 nsx nsx 101M Jan 25 13:02 nsx-ccp-metrics-20250125-130220162.log
-rw-r----- 1 nsx nsx 101M Jan 25 17:12 nsx-ccp-20250125-171216615.log
-rw-r----- 1 nsx nsx 101M Jan 25 18:55 nsx-ccp-metrics-20250125-185520128.log
-rw-r----- 1 nsx nsx 101M Jan 25 23:57 nsx-ccp-20250125-235722385.log
-rw-r----- 1 nsx nsx 101M Jan 26 00:48 nsx-ccp-metrics-20250126-004820113.log
----
----
-rw-r----- 1 nsx nsx 101M Feb 6 03:03 nsx-ccp-20250206-030350197.log
-rw-r----- 1 nsx nsx 101M Feb 6 03:28 nsx-ccp-metrics-20250206-032820152.log
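
The check above (no .gz files among the nsx-ccp logs) can be repeated quickly with a small helper. This is only a sketch: the function name rotation_summary is hypothetical, and it assumes a find that supports -maxdepth (GNU or BSD find).

```shell
# Hypothetical helper, not part of NSX: summarize rotation state of a log directory.
# Usage: rotation_summary DIR
rotation_summary() {
    dir="$1"
    # Count uncompressed .log files located directly in DIR.
    plain=$(find "$dir" -maxdepth 1 -type f -name '*.log' | wc -l)
    # Count compressed .gz archives located directly in DIR.
    gz=$(find "$dir" -maxdepth 1 -type f -name '*.gz' | wc -l)
    # Arithmetic expansion strips any padding that wc adds.
    echo "uncompressed=$((plain)) compressed=$((gz))"
}
```

For example, rotation_summary /var/log/cloudnet reporting compressed=0 together with a large uncompressed count would match the symptom described here.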

Environment

VMware NSX

Cause

The following exception is logged in nsx-ccp.log (under /var/log/cloudnet):

2025-02-06T00:39:51.503Z INFO CCP-######-9919-4fd6-9404-######:worker-1 NettyConnection 1512 - [nsx@6876 comp="nsx-manager" level="INFO" subcomp="ccp"] Closing NettyConnection NettyConnection(NettyChannel(local=10.#.#.#:1235, remote=10.#.#.#:39834), active=false)
2025-02-06T00:39:52.506Z WARN CCP-######-9919-4fd6-9404-#####:boss-0 DefaultChannelPipeline 1512 An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception.
io.netty.channel.unix.Errors$NativeIoException: accept(..) failed: Too many open files

nsx-ccp.log is then continuously flooded with the warning "Too many open files":

io.netty.channel.unix.Errors$NativeIoException: accept(..) failed: Too many open files
io.netty.channel.unix.Errors$NativeIoException: accept(..) failed: Too many open files
io.netty.channel.unix.Errors$NativeIoException: accept(..) failed: Too many open files
io.netty.channel.unix.Errors$NativeIoException: accept(..) failed: Too many open files
io.netty.channel.unix.Errors$NativeIoException: accept(..) failed: Too many open files
io.netty.channel.unix.Errors$NativeIoException: accept(..) failed: Too many open files
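
To gauge how heavily a given log file is flooded, and how close a process is to its file descriptor limit, something like the following may help. It is a sketch under assumptions: the function names are hypothetical, fd_usage relies on the Linux /proc layout, and the PID of the nsx-ccp process has to be looked up separately (for example with ps).

```shell
# Hypothetical helpers, not part of NSX.

# Print the number of lines in LOGFILE that contain the error text.
# Usage: count_fd_errors LOGFILE
count_fd_errors() {
    grep -c 'Too many open files' "$1"
}

# Print the open descriptor count and the soft "Max open files" limit of a PID.
# Usage: fd_usage PID   (Linux only; may need root for another user's process)
fd_usage() {
    pid="$1"
    # Each entry under /proc/PID/fd is one open file descriptor.
    open=$(ls "/proc/$pid/fd" | wc -l)
    # In /proc/PID/limits the 4th field of the "Max open files" row is the soft limit.
    limit=$(awk '/^Max open files/ {print $4}' "/proc/$pid/limits")
    echo "open=$((open)) soft_limit=$limit"
}
```

An open count at or near the soft limit would be consistent with the accept(..) failures shown above.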

Resolution

This is a known issue affecting NSX versions earlier than 4.2. It is fixed in NSX 4.2 and later.

As a temporary workaround, restart the controller service (nsx-ccp).

Note: Clean up the old logs in the "/var/log/cloudnet" folder before restarting the service.

root@NSXTMGR:~# /etc/init.d/nsx-ccp restart
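
The log cleanup mentioned in the note is not spelled out in this article. One possible approach, shown here only as a sketch, is to list the nsx-ccp log files older than a chosen age, review the list, and then delete them. The function name clean_ccp_logs, the file name pattern, and the age threshold are assumptions, and the -delete option requires GNU or BSD find.

```shell
# Hypothetical helper, not part of NSX: list or delete old nsx-ccp log files.
# Usage: clean_ccp_logs DIR DAYS [--delete]
# Without --delete the function only prints the matching files (dry run).
clean_ccp_logs() {
    dir="$1"
    days="$2"
    mode="${3:-}"
    # -mtime +DAYS matches files last modified more than DAYS whole
    # 24-hour periods ago, so recently written logs are left alone.
    if [ "$mode" = "--delete" ]; then
        find "$dir" -maxdepth 1 -type f -name 'nsx-ccp-*.log' -mtime +"$days" -print -delete
    else
        find "$dir" -maxdepth 1 -type f -name 'nsx-ccp-*.log' -mtime +"$days" -print
    fi
}
```

For example, clean_ccp_logs /var/log/cloudnet 7 prints the candidates, and adding --delete removes them. The pattern also matches the nsx-ccp-metrics files seen in the listing above.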

Additional Information

Japanese version of this article: NSX Manager alarm "Manager Disk usage High" (NSX Manager のアラーム「マネージャのディスク使用量が高い」)