"Caught MessagingException during host config stage."
xx--yy-zzT09:53:10.272Z NSX 3612 - [nsx@6876 comp="nsx-edge" s2comp="nsx-net" tid="3630" level="WARNING"] StreamConnection[12625 Connecting to unix:///var/run/vmware/nestdb/nestdb-server.sock 0] Couldn't connect to 'unix:///var/run/vmware/nestdb/nestdb-server.sock' (error: 2-No such file or directory)
edge#### xx--yy-zzT09:53:10.272Z NSX 3314 - [nsx@6876 comp="nsx-edge" s2comp="nestdb-client" tid="3402" level="WARNING"] NestDbClient: failed to get stub to unix:///var/run/vmware/nestdb/nestdb-server.sock, retrying in 5000 ms...
edge####
  File "/opt/vmware/nsx-netopa/lib/python/sha/core/profile/_nestdb_client.py", line 277, in _setup_monitor
    NestdbConnCtl.start(self._name)
  File "/opt/vmware/nsx-netopa/lib/python/sha/core/profile/_nestdb_conn.py", line 88, in start
    cls._client.start(name)
  File "/opt/vmware/nsx-netopa/lib/python/sha/core/profile/_nestdb_conn.py", line 38, in start
    self._stub = NsxRpcClient(NestDb_Stub, self._endpoint)
  File "/opt/vmware/nsx-netopa/lib/python/vmware/nsx/rpc/client/client.py", line 105, in __init__
    self._own_connection.Connect(endpoint, sec_ctx=sec_ctx, retry_policy=retry_policy)
  File "/opt/vmware/nsx-netopa/lib/python/vmware/nsx/rpc/client/transport.py", line 224, in Connect
    self._Connect(endpoint, sec_ctx)
  File "/opt/vmware/nsx-netopa/lib/python/vmware/nsx/rpc/client/transport.py", line 233, in _Connect
    self.sock.connect(path)
  File "/opt/vmware/nsx-netopa/lib/python/gevent/_socketcommon.py", line 590, in connect
    self._internal_connect(address)
  File "/opt/vmware/nsx-netopa/lib/python/gevent/_socketcommon.py", line 655, in _internal_connect
    raise _SocketError(result, strerror(result))
FileNotFoundError: [Errno 2] No such file or directory
VMware NSX
This issue is caused by excessive latency on the underlying physical storage backing the Edge VM, preventing internal services from functioning correctly.
Diagnosis:
Run esxtop on the ESXi host that hosts the Edge node and check the storage latency counters for the storage device backing the Edge VM.
Impact: The high storage latency prevents the Edge's internal database (nestdb) from completing read/write operations within the required timeout windows. This causes internal services to crash or stall, which leads to the configuration failure and the control plane disconnection.
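The esxtop check can also be run offline against a batch-mode capture. The following is a minimal sketch, not a supported tool: it assumes the capture is a CSV whose header names the device latency columns with the counter string "Average Driver MilliSec/Command" (verify this against your own capture), and the function names and the 20 ms default threshold are illustrative choices.

```python
import csv
import io

# Assumption: esxtop batch-mode CSV headers label DAVG columns with this
# counter name. Check the header of your own capture before relying on it.
DAVG_COUNTER = "Average Driver MilliSec/Command"


def max_davg_per_device(csv_text, counter=DAVG_COUNTER):
    """Return {column_name: peak latency in ms} for every DAVG column."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    davg_cols = [i for i, name in enumerate(header) if name.endswith(counter)]
    peaks = {header[i]: 0.0 for i in davg_cols}
    for row in reader:
        for i in davg_cols:
            try:
                value = float(row[i])
            except (IndexError, ValueError):
                continue  # skip blank or malformed samples
            peaks[header[i]] = max(peaks[header[i]], value)
    return peaks


def devices_over_threshold(peaks, threshold_ms=20.0):
    """Return the columns whose peak DAVG exceeds threshold_ms."""
    return {name: ms for name, ms in peaks.items() if ms > threshold_ms}
```

A capture can be produced on the ESXi host with, for example, `esxtop -b -d 5 -n 12 > esxtop.csv`. Pass the file contents to `max_davg_per_device` and review any column that `devices_over_threshold` returns.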
The primary resolution requires addressing the storage performance at the infrastructure level. The NSX Edge failure is a symptom, not the root cause.
Validate Storage Health: Contact the Storage or Virtualization team to investigate the backend storage array for latency issues or high load.
Monitor DAVG: Confirm that the Device Average Latency (DAVG) reported on the ESXi host returns to normal operating levels (typically below 10-20 ms).
Automatic Recovery: Once storage latency normalizes, the NSX Edge services should automatically reconnect to the internal database and the Control Plane.
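To confirm that the internal database endpoint is back, one option is to probe the Unix socket named in the log messages. This is a hedged sketch, not an NSX utility: the function name and return strings are made up for illustration, and it assumes the script runs in a context where the socket path is visible and readable.

```python
import errno
import socket

# Socket path taken from the log messages in this article.
NESTDB_SOCK = "/var/run/vmware/nestdb/nestdb-server.sock"


def probe_unix_socket(path=NESTDB_SOCK, timeout=2.0):
    """Try to connect to a Unix domain socket and describe the outcome."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(path)
    except FileNotFoundError:
        return "missing"   # errno 2, the same failure shown in the traceback
    except ConnectionRefusedError:
        return "refused"   # socket file exists but nothing is listening
    except OSError as exc:
        # Any other failure (permissions, timeout, ...) is reported by name.
        return "error: " + errno.errorcode.get(exc.errno, str(exc))
    finally:
        sock.close()
    return "reachable"
```

A result of "missing" corresponds to the "No such file or directory" error in the logs. "reachable" only shows that something accepts connections on the socket, so the Edge's configuration and control plane status should still be checked through the usual NSX tooling.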