An alarm with the description "The disk usage for the Manager node disk partition / has reached 80% which is at or above the high threshold value of 80%" is seen in the NSX Manager web GUI.
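The same alarm can also be confirmed outside the GUI through the NSX alarms API (a hedged example, assuming NSX-T 3.0 or later where GET /api/v1/alarms is available; substitute your own Manager address and credentials):

# List currently open alarms; filter the output for the disk-usage alarm as needed.
curl -k -u admin 'https://<nsx-manager>/api/v1/alarms?status=OPEN'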
root@NSXTMGR:~# df -h
Filesystem                   Size  Used Avail Use% Mounted on
udev                          24G     0   24G   0% /dev
tmpfs                        4.8G  1.6M  4.8G   1% /run
/dev/sda2                     11G  6.4G  3.3G  66% /
tmpfs                         24G   50M   24G   1% /dev/shm
tmpfs                        5.0M     0  5.0M   0% /run/lock
tmpfs                         24G     0   24G   0% /sys/fs/cgroup
/dev/sda1                    942M  7.1M  870M   1% /boot
/dev/sda3                     11G   24K  9.7G   1% /os_bak
/dev/mapper/nsx-config        29G  114M   28G   1% /config
/dev/mapper/nsx-config__bak   29G   24K   28G   1% /config_bak
/dev/mapper/nsx-image         42G   16G   25G  39% /image
/dev/mapper/nsx-repository    31G  9.0G   21G  31% /repository
/dev/mapper/nsx-secondary     98G  3.3G   90G   4% /nonconfig
/dev/mapper/nsx-tmp          3.7G   14M  3.5G   1% /tmp
/dev/mapper/nsx-var+dump     9.3G   24K  8.8G   1% /var/dump
/dev/mapper/nsx-var+log       27G   24G  2.3G  92% /var/log   ----------> Using almost 24GB out of 27GB
tmpfs                        4.8G     0  4.8G   0% /run/user/1007
root@NSXTMGR:~# du -hsx /var/log/* | sort -rh | head -15
14G     /var/log/cloudnet   <------------ Cloudnet directory consuming the most space
2.8G    /var/log/journal
1.1G    /var/log/vmware
1014M   /var/log/corfu
821M    /var/log/proton
502M    /var/log/proxy
475M    /var/log/messaging-manager
430M    /var/log/async-replicator
378M    /var/log/search
289M    /var/log/idps-reporting
286M    /var/log/site-manager
266M    /var/log/cbm
176M    /var/log/corfu-nonconfig
175M    /var/log/cm-inventory
150M    /var/log/stats
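To identify which files inside the cloudnet directory are responsible, the same du/sort pattern can be run one level deeper (a minimal follow-up using only the standard utilities already shown above):

root@NSXTMGR:~# du -hsx /var/log/cloudnet/* | sort -rh | head -10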
The nsx-ccp logs are not being rotated: no compressed (.gz) archives are present in /var/log/cloudnet.
root@NSXTMGR:/var/log/cloudnet# ls -lthr
-rw-r----- 1 nsx nsx 101M Jan 23 19:55 nsx-ccp-20250123-195552954.log
-rw-r----- 1 nsx nsx 101M Jan 23 19:56 nsx-ccp-20250123-195602167.log
-rw-r----- 1 nsx nsx 101M Jan 23 19:56 nsx-ccp-20250123-195611827.log
-rw-r----- 1 nsx nsx 101M Jan 23 19:56 nsx-ccp-20250123-195621245.log
-rw-r----- 1 nsx nsx 101M Jan 23 19:56 nsx-ccp-20250123-195631156.log
-rw-r----- 1 nsx nsx 101M Jan 23 19:56 nsx-ccp-20250123-195641663.log
-rw-r----- 1 nsx nsx 101M Jan 23 19:56 nsx-ccp-20250123-195651234.log
-rw-r----- 1 nsx nsx 101M Jan 23 19:57 nsx-ccp-20250123-195700565.log
-rw-r----- 1 nsx nsx 101M Jan 23 19:57 nsx-ccp-20250123-195710860.log
-rw-r----- 1 nsx nsx 101M Jan 23 19:57 nsx-ccp-metrics-20250123-195720149.log
-rw-r----- 1 nsx nsx 101M Jan 23 19:57 nsx-ccp-20250123-195721033.log
-rw-r----- 1 nsx nsx 101M Jan 23 19:57 nsx-ccp-20250123-195730390.log
-rw-r----- 1 nsx nsx 101M Jan 24 01:51 nsx-ccp-20250124-015129679.log
-rw-r----- 1 nsx nsx 101M Jan 24 07:38 nsx-ccp-metrics-20250124-073820148.log
-rw-r----- 1 nsx nsx 101M Jan 24 09:30 nsx-ccp-20250124-093010104.log
-rw-r----- 1 nsx nsx 101M Jan 24 13:31 nsx-ccp-metrics-20250124-133120113.log
-rw-r----- 1 nsx nsx 101M Jan 24 17:21 nsx-ccp-20250124-172148655.log
-rw-r----- 1 nsx nsx 101M Jan 24 19:24 nsx-ccp-metrics-20250124-192420110.log
-rw-r----- 1 nsx nsx 101M Jan 25 01:16 nsx-ccp-20250125-011652028.log
-rw-r----- 1 nsx nsx 101M Jan 25 01:17 nsx-ccp-metrics-20250125-011720086.log
-rw-r----- 1 nsx nsx 101M Jan 25 07:09 nsx-ccp-metrics-20250125-070920173.log
-rw-r----- 1 nsx nsx 101M Jan 25 09:13 nsx-ccp-20250125-091335828.log
-rw-r----- 1 nsx nsx 101M Jan 25 13:02 nsx-ccp-metrics-20250125-130220162.log
-rw-r----- 1 nsx nsx 101M Jan 25 17:12 nsx-ccp-20250125-171216615.log
-rw-r----- 1 nsx nsx 101M Jan 25 18:55 nsx-ccp-metrics-20250125-185520128.log
-rw-r----- 1 nsx nsx 101M Jan 25 23:57 nsx-ccp-20250125-235722385.log
-rw-r----- 1 nsx nsx 101M Jan 26 00:48 nsx-ccp-metrics-20250126-004820113.log
...
-rw-r----- 1 nsx nsx 101M Feb 6 03:03 nsx-ccp-20250206-030350197.log
-rw-r----- 1 nsx nsx 101M Feb 6 03:28 nsx-ccp-metrics-20250206-032820152.log
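Rotation of these logs is handled by logrotate, so a quick sanity check is to confirm that a rule covering /var/log/cloudnet exists and to dry-run it (a hedged sketch; the exact config file name for the CCP logs can vary between NSX versions):

# Locate the logrotate rule (if any) that references the cloudnet directory.
root@NSXTMGR:~# grep -rl cloudnet /etc/logrotate.d/
# Debug/dry-run mode: shows what logrotate would do without changing any files.
root@NSXTMGR:~# logrotate -d /etc/logrotate.conf 2>&1 | grep -B1 -A3 cloudnet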
-> Failure exception in nsx-ccp.log (/var/log/cloudnet):
2025-02-06T00:39:51.503Z INFO CCP-######-9919-4fd6-9404-######:worker-1 NettyConnection 1512 - [nsx@6876 comp="nsx-manager" level="INFO" subcomp="ccp"] Closing NettyConnection NettyConnection(NettyChannel(local=10.#.#.#:1235, remote=10.#.#.#:39834), active=false)
2025-02-06T00:39:52.506Z WARN CCP-######-9919-4fd6-9404-#####:boss-0 DefaultChannelPipeline 1512 An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception.
io.netty.channel.unix.Errors$NativeIoException: accept(..) failed: Too many open files
nsx-ccp.log is continuously flooded with the warning "Too many open files":
io.netty.channel.unix.Errors$NativeIoException: accept(..) failed: Too many open files
io.netty.channel.unix.Errors$NativeIoException: accept(..) failed: Too many open files
io.netty.channel.unix.Errors$NativeIoException: accept(..) failed: Too many open files
io.netty.channel.unix.Errors$NativeIoException: accept(..) failed: Too many open files
io.netty.channel.unix.Errors$NativeIoException: accept(..) failed: Too many open files
io.netty.channel.unix.Errors$NativeIoException: accept(..) failed: Too many open files
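This error indicates the CCP process has exhausted its open-file-descriptor limit. To confirm, compare the number of descriptors the process currently holds against its limit (a diagnostic sketch; the pgrep pattern 'nsx-ccp' is an assumption and may need adjusting to match the exact process name on your build):

# PID of the CCP process (assumes 'nsx-ccp' appears in its command line).
root@NSXTMGR:~# CCP_PID=$(pgrep -f nsx-ccp | head -1)
# Number of file descriptors currently open by the process.
root@NSXTMGR:~# ls /proc/$CCP_PID/fd | wc -l
# Configured soft/hard limits for open files.
root@NSXTMGR:~# grep 'open files' /proc/$CCP_PID/limits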
This is a known issue affecting NSX versions below 4.2; it is fixed in NSX 4.2 and later.
As a temporary workaround, restart the controller service.
Note: Ensure that the old logs in the /var/log/cloudnet directory are cleaned up before restarting the service; see the cleanup example below.
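A hedged cleanup example (the two-day age cutoff is an assumption; adjust it to your retention needs, and make sure any logs still needed for support analysis have been collected first):

# Preview the nsx-ccp logs older than two days before deleting anything.
root@NSXTMGR:~# find /var/log/cloudnet -name 'nsx-ccp*.log' -mtime +2 -print
# Remove them once the preview looks correct.
root@NSXTMGR:~# find /var/log/cloudnet -name 'nsx-ccp*.log' -mtime +2 -delete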
root@NSXTMGR:~# /etc/init.d/nsx-ccp restart
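After the restart, verify that the service is healthy and that space on /var/log has been reclaimed (get service controller is run from the NSX CLI; enter nsxcli first if you are at the root bash prompt):

NSXTMGR> get service controller
root@NSXTMGR:~# df -h /var/log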