[nsx@6876 comp="nsx-edge" subcomp="nsx-nestdb" s2comp="nsx-net" tid="1835" level="ERROR" errorCode="NET4"] NetTransport[0] Accept on endpoint 'unix:///var/run/vmware/nestdb/nestdb-server.sock' failed with error 24-Too many open files
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1634 nestdb 20 0 584372 523472 14388 S 96.4 1.6 339:59.99 nestdb-server
root@edge-node:/tmp# lsof +c 0 | awk '{ print $2 " " $1; }' | sort -rn | uniq -c | sort -rn | head -20
150063 1634 nestdb-server
100128 3942 nvpapi.py
5888 2365 python3
5024 7288 datapathd
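To confirm that the nestdb-server process is approaching its per-process limit, you can also compare its current descriptor count against the limit. This is a generic Linux check (PID 1634 is taken from the top output above; substitute the PID on your node):
ls /proc/1634/fd | wc -l
grep "Max open files" /proc/1634/limits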
The NSX Manager pushes the collector configuration, for example from vRNI, to the Edge nodes.
After upgrading the NSX-T Edge node to 3.1.3, the Edge node expects three pieces of information about the collector: IP, port number, and type.
However, a Manager node that has not yet been upgraded only sends two pieces of information: IP and port number.
Because of this missing piece of information, the NSX-T Edge node continuously retries the RPC connection; each failed attempt opens another file descriptor, eventually exhausting the open-file limit and producing the error above.
The API call below can be used to verify the collector information:
curl -i -k -u 'admin:<PW>' -H "Content-Type:application/json" -X GET https://<nsx-mgr-ip>/api/v1/global-configs/OperationCollectorGlobalConfig
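To make the response easier to inspect, the same request can be piped through a JSON formatter; this is purely a readability aid and changes nothing on the Manager. Check whether each collector entry includes a type in addition to the IP and port:
curl -s -k -u 'admin:<PW>' -H "Content-Type:application/json" -X GET https://<nsx-mgr-ip>/api/v1/global-configs/OperationCollectorGlobalConfig | python3 -m json.tool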
If there is a long gap between the NSX-T Edge node upgrade and the NSX-T Manager node upgrade and you encounter this issue, disable the collector configuration (for example, from the log collection settings within vRNI) and then run the following command on the impacted NSX-T Edge node while logged in as root:
service nsx-edge-api-server restart
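After the restart, you can re-check how many files nestdb-server is holding open; the count should stay well below the limit instead of climbing. For example, using lsof's command-name filter:
lsof -c nestdb-server | wc -l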
Alternatively, before you start the NSX-T upgrade, you can remove the collector information using the API above, and then re-apply the collector configuration once the NSX-T upgrade is complete.
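A rough sketch of that workflow, assuming the usual NSX-T pattern of reading the global config object and writing the modified body back with PUT (the exact payload fields, including the _revision value the PUT requires, depend on your NSX-T version, so review the saved body before pushing it):
curl -s -k -u 'admin:<PW>' -X GET https://<nsx-mgr-ip>/api/v1/global-configs/OperationCollectorGlobalConfig > collector-config.json
# Edit collector-config.json so the collector list is empty, keeping resource_type and _revision intact, then:
curl -i -k -u 'admin:<PW>' -H "Content-Type:application/json" -X PUT -d @collector-config.json https://<nsx-mgr-ip>/api/v1/global-configs/OperationCollectorGlobalConfig
Once the upgrade is complete, push the original collector-config.json back with the same PUT call to restore the collector configuration.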