With root access, checking inode usage on the Edge shows that none are available:
On a healthy system:
On an affected Edge:
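The inode check described above can be sketched as follows. This is a minimal sketch, assuming the standard `df -iP` command is available in the Edge root shell and that the affected filesystem is the one holding `/config` (an assumption based on the paths in the logs). The function names `iuse_percent` and `check_inodes` are hypothetical helpers, not part of NSX-T.

```shell
#!/bin/sh
# Hypothetical helper: read `df -iP` output on stdin and print the
# IUse% value of the first data row, without the percent sign.
iuse_percent() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Hypothetical helper: report inode usage for the filesystem holding a path.
# The default path /config is an assumption, not a documented value.
check_inodes() {
    path="${1:-/config}"
    pct=$(df -iP "$path" | iuse_percent)
    if [ "$pct" = "100" ]; then
        echo "$path: no free inodes (IUse% = 100)"
    else
        echo "$path: IUse% = $pct"
    fi
}
```

Calling `check_inodes /config` prints one line per check. A value of 100 corresponds to the "none are available" condition described above.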
In /var/log/syslog* you see lines similar to:
2022-04-22T12:18:02.166Z S3BPKS2BME03 NSX 4752 - [nsx@6876 comp="nsx-edge s2comp="nsx-net" tid="4988" level="WARNING"] StreamConnection[56065 Connecting to unix:///var/run/vmware/nestdb/nestdb-server.sock sid:56065] Couldn't connect to 'unix:///var/run/vmware/nestdb/nestdb-server.sock' (error: 111-Connection refused)
2022-04-22T12:18:02.166Z S3BPKS2BME03 NSX 4752 - [nsx@6876 comp="nsx-edge s2comp="nsx-net" tid="4988" level="WARNING"] StreamConnection[56065 Error to unix:///var/run/vmware/nestdb/nestdb-server.sock sid:-1] Error 111-Connection refused
2022-04-22T12:18:02.166Z S3BPKS2BME03 NSX 4752 - [nsx@6876 comp="nsx-edge s2comp="nsx-rpc" tid="4988" level="WARNING"] RpcConnection[56065 Connecting to unix:///var/run/vmware/nestdb/nestdb-server.sock 0] Couldn't connect to unix:///var/run/vmware/nestdb/nestdb-server.sock (error: 111-Connection refused)
2022-04-22T12:18:02.166Z S3BPKS2BME03 NSX 4752 - [nsx@6876 comp="nsx-edge s2comp="nsx-rpc" tid="4988" level="WARNING"] RpcTransport[2] Unable to connect to unix:///var/run/vmware/nestdb/nestdb-server.sock: 111-Connection refused
2022-04-22T12:18:02.166Z S3BPKS2BME03 NSX 4752 - [nsx@6876 comp="nsx-edge s2comp="nestdb-client" tid="4989" level="WARNING"] NestDbClient: failed to get stub to unix:///var/run/vmware/nestdb/nestdb-server.sock, retrying in 5000 ms…
In /var/log/syslog* you see lines similar to:
2022-04-26T10:28:19.204325-04:00 PC1PKS2BME01 NSX 3481 SYSTEM [nsx@6876 comp="nsx-edge" subcomp="frr-config" username="frr" level="ERROR"] "Failed to execute: rc=1, out=Traceback (most recent call last):#012 File "/usr/lib/frr/frr-reload.py", line 1524, in <module>#012 with open(filename, 'w') as fh:#012OSError: [Errno 28] No space left on device: '/config/vmware/edge/frr/reload-ELSSSB.txt'#012, err=Command '['/usr/lib/frr/frr-reload.py', '--debug', '--reload', '/config/vmware/edge/frr/frrbasecfg.txt']' returned non-zero exit status 1"
In /var/log/rcpm/frr-reload.log you see lines similar to:
2022-04-22 05:24:38,959 WARNING: frr-reload.py failed due to
b'% Nexthop interface cannot be Null0, reject or blackhole\nline 36: Failure to communicate[13] to staticd, line: ip route #.#.#.#/22 blackhole tag 4001 nexthop-vrf default\n\n% Nexthop interface cannot be Null0, reject or blackhole\nline 40: Failure to communicate[13] to staticd, line: ip route #.#.#.#/22 blackhole tag 4001 nexthop-vrf default\n\n' cmds on file /config/vmware/edge/frr/reload-XEZNZH.txt
Note: These log lines in /var/log/rcpm/frr-reload.log also appear on a healthy, working NSX-T Edge. Run through the Resolution below only if all of the following are true: these lines appear, /config/vmware/edge/frr contains a large number of files, the Edge is out of inodes, and the nestdb service is not running.
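The combined condition in this note can be expressed as a small check. This is only a sketch: the function name `matches_issue` and the cutoff `LARGE_FILE_COUNT` are hypothetical, and the article gives no exact threshold beyond saying that a few thousand files is fine. The three inputs are values you collect yourself with the verification steps in this article.

```shell
#!/bin/sh
# Hypothetical cutoff for "a large number of files"; not a documented value.
LARGE_FILE_COUNT=100000

# Hypothetical helper: succeed only when all three conditions hold.
#   $1 = number of reload-* files in /config/vmware/edge/frr
#   $2 = inode usage percentage (IUse%) without the percent sign
#   $3 = nestdb service state as reported by the Edge ("stopped" or other)
matches_issue() {
    file_count="$1"
    iuse_pct="$2"
    nestdb_state="$3"
    [ "$file_count" -ge "$LARGE_FILE_COUNT" ] &&
        [ "$iuse_pct" -ge 100 ] &&
        [ "$nestdb_state" = "stopped" ]
}
```

For example, `matches_issue 1250532 100 stopped` succeeds, while `matches_issue 3000 12 running` fails, which mirrors the healthy case where the frr-reload.log warnings can be ignored.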
In /var/log/rcpm/frr-config.log* you see lines similar to:
2022-04-26T14:35:50Z PC1PKS2BME01 NSX 3481 SYSTEM [nsx@6876 comp="nsx-edge" subcomp="frr-config" username="frr" level="INFO"] "Reading the routing proto from file /var/run/vmware/edge/routing-pb.cfg"
2022-04-26T14:35:51Z PC1PKS2BME01 NSX 3481 SYSTEM [nsx@6876 comp="nsx-edge" subcomp="frr-config" username="frr" level="INFO"] "Inter-SR routing is enabled"
2022-04-26T14:35:51Z PC1PKS2BME01 NSX 3481 SYSTEM [nsx@6876 comp="nsx-edge" subcomp="frr-config" username="frr" level="ERROR"] "Unable to open FRR Config File. error(28): No space left on device"
2022-04-26T14:35:51Z PC1PKS2BME01 NSX 3481 SYSTEM [nsx@6876 comp="nsx-edge" subcomp="frr-config" username="frr" level="ERROR"] "Failed to save the base FRR config, "
2022-04-26T14:35:51Z PC1PKS2BME01 NSX 3481 SYSTEM [nsx@6876 comp="nsx-edge" subcomp="frr-config" username="frr" level="ERROR"] "Failed to copy/tar log files <type 'exceptions.IOError'> [Errno 28] No space left on device: '/config/vmware/edge/frr/frrproto.2022-04-26T10.35.51.624749.cfg'"
2022-04-26T14:35:51Z PC1PKS2BME01 NSX 3481 SYSTEM [nsx@6876 comp="nsx-edge" subcomp="frr-config" username="frr" level="ERROR"] "Failed to remove file /config/vmware/edge/frr/frrproto.2022-04-26T10.35.51.624749.cfg error <type 'exceptions.OSError'>"
2022-04-26T14:35:51Z PC1PKS2BME01 NSX 3481 SYSTEM [nsx@6876 comp="nsx-edge" subcomp="frr-config" username="frr" level="ERROR"] "Error in applying the config to FRR"
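The "No space left on device" errors shown in the log excerpts above can be counted with a short search. This is a minimal sketch, assuming the log files are uncompressed plain text; the function name `count_enospc` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: count lines containing the Errno 28 message
# across all files given as arguments.
# Compressed (.gz) rotations are not decompressed by this sketch.
count_enospc() {
    cat "$@" 2>/dev/null | grep -c 'No space left on device'
}
```

A possible invocation on the Edge would be `count_enospc /var/log/syslog /var/log/rcpm/frr-config.log`. A non-zero count only shows that the error was logged; it does not by itself confirm the inode exhaustion described in this article.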
VMware NSX-T Data Center 3.x
Do not copy and paste the above line into the CLI of the Edge. HTML to text translation may alter the text and make the command non-functional. Type the command manually.
Command output:
Note: These files must be deleted before upgrading to a newer version of NSX-T.
To verify that the frr directory is using an unusually large amount of space:
# cd /config/vmware/edge
# du -hs *
4.0K config.json
4.0K dns
4.9G frr <-- This is usually much smaller, measured in KB.
32K ike
1.4M lb
4.0K mdproxy
8.0K rcpm
12K reverse-proxy
180K waf
To verify there is a large number of files in /config/vmware/edge/frr:
# ls -ltr /config/vmware/edge/frr/reload-* | wc -l
1250532 <-- This is usually much smaller, in the range of a few thousand.
Note: This will always return the number of files inside this folder. A few thousand files is fine.
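With more than a million files, a shell glob such as `reload-*` can exceed the kernel's argument-length limit and make `ls` fail with "Argument list too long". An alternative count that avoids expanding the glob is sketched below, assuming a `find` that supports `-maxdepth` (GNU find does). The function name `count_reload_files` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: count reload-* files directly inside a directory
# without expanding a glob on the command line.
count_reload_files() {
    dir="${1:-/config/vmware/edge/frr}"
    find "$dir" -maxdepth 1 -type f -name 'reload-*' | wc -l | tr -d ' '
}
```

Running `count_reload_files` with no argument counts the files in /config/vmware/edge/frr. The result should be read the same way as the `ls` count: a few thousand is fine.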
To verify whether nestdb is running:
# get services nestdb
# get service nestdb
Service name: nestdb
Service state: stopped