Symptoms:
[nsx@6876 comp="nsx-edge" subcomp="nsx-nestdb" s2comp="nsx-net" tid="1835" level="ERROR" errorCode="NET4"] NetTransport[0] Accept on endpoint 'unix:///var/run/vmware/nestdb/nestdb-server.sock' failed with error 24-Too many open files
 PID USER   PR NI   VIRT    RES   SHR S %CPU %MEM     TIME+ COMMAND
1634 nestdb 20  0 584372 523472 14388 S 96.4  1.6 339:59.99 nestdb-server
root@edge-node:/tmp# lsof +c 0 | awk '{ print $2 " " $1; }' | sort -rn | uniq -c | sort -rn | head -20
150063 1634 nestdb-server
100128 3942 nvpapi.py
5888 2365 python3
5024 7288 datapathd
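On a node that already holds hundreds of thousands of open descriptors, lsof itself can be slow. A similar per-process count can be read directly from /proc. The following is a minimal sketch, assuming a Linux /proc filesystem and a root shell; the function names count_fds and top_fd_users are illustrative and not part of NSX-T.

```shell
#!/bin/sh
# Count the open file descriptors of one process by listing /proc/<pid>/fd.
# Prints 0 if the process no longer exists or the directory is unreadable.
count_fds() {
    { ls "/proc/$1/fd" 2>/dev/null || true; } | wc -l | tr -d ' '
}

# Print "<fd count> <pid> <command>" for every process, highest count first.
# The optional argument limits the number of rows (default 20).
top_fd_users() {
    for dir in /proc/[0-9]*; do
        pid=${dir#/proc/}
        # Skip processes that exited between the glob and the read.
        name=$(cat "$dir/comm" 2>/dev/null) || continue
        echo "$(count_fds "$pid") $pid $name"
    done | sort -rn | sed -n "1,${1:-20}p"
}

top_fd_users 20
```

The totals will usually be lower than the lsof figures above, because lsof also lists entries such as mapped files and working directories, while /proc/<pid>/fd contains only the descriptors that count against the open-file limit.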
Environment:
VMware NSX-T Data Center 3.x
Cause:
The NSX Manager pushes the collector configuration, for example from vRNI, to the Edge nodes.
After the NSX-T Edge node is upgraded to 3.1.3, it expects three pieces of information about the collector: IP, Port Number, and Type.
However, a Manager node that has not yet been upgraded sends only two pieces of information: IP and Port Number.
Because the Type is missing, the NSX-T Edge node continuously retries RPC connections. Each failed attempt leaves a file open, which eventually exhausts the available open files.
The below API call can be used to verify the Collector information:
curl -i -k -u 'admin:<PW>' -H "Content-Type:application/json" -X GET https://<nsx-mgr-ip>/api/v1/global-configs/OperationCollectorGlobalConfig
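The response body can be checked quickly for the presence of a collector type. This is only a sketch: the key name collector_type is an assumption and is not confirmed by this article, so verify it against the API reference for your NSX-T version. The helper name check_collector_type is illustrative.

```shell
#!/bin/sh
# ASSUMPTION: the collector type appears in the response under a key named
# "collector_type". Override TYPE_KEY if your NSX-T version uses another name.
TYPE_KEY="${TYPE_KEY:-collector_type}"

# Read a JSON response body on stdin and report whether the type key is present.
check_collector_type() {
    if grep -q "\"$TYPE_KEY\"[[:space:]]*:"; then
        echo "type present"
    else
        echo "type missing"
    fi
}

# Usage (placeholders as in the curl command above; -s replaces -i so that
# only the JSON body is piped through):
#   curl -s -k -u 'admin:<PW>' -H "Content-Type:application/json" -X GET \
#     https://<nsx-mgr-ip>/api/v1/global-configs/OperationCollectorGlobalConfig \
#     | check_collector_type
```

This is a plain string match rather than a JSON parse, so a response with no collectors configured at all also reports "type missing". Read the full response before drawing conclusions.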
Resolution:
This issue is resolved in NSX-T Data Center 3.1.3. Once the NSX-T management plane is completely upgraded, restart the Edge API server service on the affected NSX-T Edge node to clear any open file descriptors that may have accumulated:
service nsx-edge-api-server restart
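After the restart, it can be useful to confirm that descriptor usage has dropped well below the per-process limit. The following is a minimal sketch, assuming a Linux /proc filesystem, a root shell, and the procps pgrep utility. The function names fd_soft_limit and fd_usage are illustrative, and the process names are taken from the lsof output in the symptoms.

```shell
#!/bin/sh
# Soft "Max open files" limit of a process, read from /proc/<pid>/limits.
# Prints nothing if the process does not exist.
fd_soft_limit() {
    awk '/^Max open files/ { print $4 }' "/proc/$1/limits" 2>/dev/null || true
}

# Report usage for one PID as "<pid>: <open> of <limit> descriptors in use".
fd_usage() {
    open=$( { ls "/proc/$1/fd" 2>/dev/null || true; } | wc -l | tr -d ' ')
    limit=$(fd_soft_limit "$1")
    echo "$1: $open of ${limit:-unknown} descriptors in use"
}

# Check the two processes that held the most open files in the symptoms.
for name in nestdb-server nvpapi.py; do
    for pid in $(pgrep -x "$name" 2>/dev/null); do
        fd_usage "$pid"
    done
done
```

If the count starts climbing again shortly after the restart, the collector configuration is probably still being pushed without a type.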
Workaround:
If there is a long gap between the NSX-T Edge node upgrade and the NSX-T Manager node upgrade and you encounter this issue, disable the collector configuration, for example through the log collection utility in vRNI. Then run the following command on the impacted NSX-T Edge node while logged in as root:
service nsx-edge-api-server restart
Alternatively, before you start the NSX-T upgrade, remove the collector information using the API above. After NSX-T is completely upgraded, re-apply the collector configuration.