get controllers command run on the affected edge node produces the error:

% Failed to get controller list

Host configuration: Caught MessagingException during host config stage. [TN=TransportNode/c5a965c6-####-####-####-17ad46d9b83c]. Reason: MessagingException

2026-01-07T00:00:46.024Z ERROR L2HostConfigTaskExecutor5 TransportNodeAsyncServiceImpl 5143 FABRIC [nsx@6876 comp="nsx-manager" errorCode="MP100" level="ERROR" subcomp="manager"] Caught MessagingException during host config stage. [TN=TransportNode/c5a965c6-####-####-####-17ad46d9b83c]. Reason: MessagingException
com.vmware.nsx.messaging.exceptions.MessagingException: null
at com.vmware.nsx.messaging.rpc.RpcManager.invokeOutgoingRequestTimeoutErrorHandler(RpcManager.java:609) ~[?:?]
at com.vmware.nsx.messaging.rpc.RpcManager.access$700(RpcManager.java:66) ~[?:?]
at com.vmware.nsx.messaging.rpc.RpcManager$RequestMapsCleanupTask.runCleanup(RpcManager.java:1026) ~[?:?]
at com.vmware.nsx.messaging.rpc.RpcManager$RequestMapsCleanupTask.run(RpcManager.java:993) ~[?:?]
at java.util.TimerThread.mainLoop(Timer.java:555) ~[?:1.8.0_382]
at java.util.TimerThread.run(Timer.java:505) ~[?:1.8.0_382]
2026-01-07T00:00:46.024Z INFO L2HostConfigTaskExecutor5 TransportNodeStateServiceImpl 5143 FABRIC [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Incoming Revision: [1024] Obj: [TnStateInternal [id=c5a965c6-####-####-####-17ad46d9b83c, retryCount=0, vmkMigrationFailures=0, revision=1024, stageToStatusMap={HostConfig=TnStageStatus [stageName=HostConfig, status=FAILED, errorCode=8816, errorParams=[c5a965c6-####-####-####-17ad46d9b83c, MessagingException], timeStamp=2026-Jan-07 00.00.46 AM, errorMessage=Caught MessagingException during host config stage. [TN=TransportNode/c5a965c6-####-####-####-17ad46d9b83c]. Reason: MessagingException]}]]

2025-04-24T14:39:39.017Z ######## NSX 1 SYSTEM [nsx@6876 comp="nsx-edge" subcomp="nsxa" s2comp="nestdb" level="ERROR" errorCode="########"] DB is not connected while performing write operation
2025-04-24T14:39:39.004Z ######## nsxa-systemd-helper 7467 - - 2025-04-24T14:39:39Z nsxa 1 nestdb [ERROR] DB is not connected while performing write operation errorCode="########"
2025-04-24T14:39:39.164Z ######## nsxa-systemd-helper 7467 - - 2025-04-24T14:39:39Z nsxa 1 nestdb [ERROR] DB is not connected while performing write operation errorCode="########"

2026-01-07T10:03:46.392Z ######## NSX 1 - [nsx@6876 comp="nsx-edge" s2comp="nsx-net" tid="11" level="INFO"] StreamSocket[1241234 Open f:28 i:1134202767 -> unix:///var/run/vmware/nestdb/nestdb-server.sock] async_connect
2026-01-07T10:03:46.392Z ######## NSX 1 - [nsx@6876 comp="nsx-edge" s2comp="nsx-net" tid="11" level="INFO"] StreamSocket[1241234 Open f:28 i:1134202767 -> unix:///var/run/vmware/nestdb/nestdb-server.sock] on_connect 2-No such file or directory
2026-01-07T10:03:46.392Z ######## NSX 1 - [nsx@6876 comp="nsx-edge" s2comp="nsx-net" tid="11" level="WARNING"] StreamConnection[1241234 Connecting to unix:///var/run/vmware/nestdb/nestdb-server.sock sid:1241234] Couldn't connect to 'unix:///var/run/vmware/nestdb/nestdb-server.sock' (error: 2-No such file or directory)
2026-01-07T10:03:46.392Z ######## NSX 1 - [nsx@6876 comp="nsx-edge" s2comp="nsx-net" tid="11" level="WARNING"] StreamConnection[1241234 Error to unix:///var/run/vmware/nestdb/nestdb-server.sock sid:-1] Error 2-No such file or directory
2026-01-07T10:03:46.392Z ######## NSX 1 - [nsx@6876 comp="nsx-edge" s2comp="nsx-rpc" tid="11" level="WARNING"] RpcConnection[1241234 Connecting to unix:///var/run/vmware/nestdb/nestdb-server.sock 0] Couldn't connect to unix:///var/run/vmware/nestdb/nestdb-server.sock (error: 2-No such file or directory)
2026-01-07T10:03:46.393Z ######## NSX 3057 - [nsx@6876 comp="nsx-edge" s2comp="nsx-net" tid="3221" level="INFO"] StreamSocket[1241287 Init f:-1 i:-1 -> unix:///var/run/vmware/nestdb/nestdb-server.sock] Created
# ps -ef |grep nestdb |grep -v watchdog
3510 3491 994 nestdb 00:06:26 0.0 0.7 237104 293796 /opt/vmware/nsx-nestdb/bin/nestdb-server --schema /opt/vmware/nsx-nestdb/schema/nestdb.schema --database /config/vmware/nsx/nestdb/db --txn_log_size 209715200 --mem_stats_interval 300 --mem_release_interval 86400 --metrics_text_publisher --metrics_rpc_publisher --listen unix:///var/run/vmware/nestdb/nestdb-server.sock --listen ssl-unix:///var/run/vmware/nestdb/nestdb-server-ssl.sock
1156706 3616 994 nestdb 00:00:00 1.9 0.1 37400 99232 /opt/vmware/nsx-nestdb/bin/nestdb-server --schema /opt/vmware/nsx-nestdb/schema/nestdb.schema --database /config/vmware/nsx/nestdb/db --txn_log_size 209715200 --mem_stats_interval 300 --mem_release_interval 86400 --metrics_text_publisher --metrics_rpc_publisher --listen unix:///var/run/vmware/nestdb/nestdb-server.sock --listen ssl-unix:///var/run/vmware/nestdb/nestdb-server-ssl.sock
1156720 3193 994 nestdb 00:00:00 2.1 0.1 37376 99232 /opt/vmware/nsx-nestdb/bin/nestdb-server --schema /opt/vmware/nsx-nestdb/schema/nestdb.schema --database /config/vmware/nsx/nestdb/db --txn_log_size 209715200 --mem_stats_interval 300 --mem_release_interval 86400 --metrics_text_publisher --metrics_rpc_publisher --listen unix:///var/run/vmware/nestdb/nestdb-server.sock --listen ssl-

VMware NSX
Multiple instances of NestDB are started. This causes unpredictable behavior from the perspective of the NestDB clients, as some clients operate on one instance while other clients operate on another.
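The check for duplicate instances can be scripted. The following is a minimal sketch, not a supported tool: the function name count_nestdb_servers is hypothetical, and it assumes a standard procps ps is available in the Edge root shell. It counts lines whose command path ends in nestdb-server and skips any line mentioning watchdog, mirroring the grep -v used above.

```shell
#!/bin/sh
# Hypothetical helper: count nestdb-server processes.
# Input (stdin): one process per line as "PID STAT ARGS...".
# Field 3 is the executable path; lines mentioning "watchdog" are ignored.
count_nestdb_servers() {
    awk '$3 ~ /nestdb-server$/ && $0 !~ /watchdog/ { n++ }
         END { print n + 0 }'
}

# Live usage (assumes procps ps; the "=" suffixes suppress the header row):
#   ps -eo pid=,stat=,args= | count_nestdb_servers
```

A result greater than 1 suggests the condition described in this article.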
The NestDB server startup script, like many other LCP daemons, uses pidof to determine if the process has been started. If it does not detect that the process has started, the startup script launches another instance of the watchdog, which in turn attempts to launch another instance of NestDB.
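This works under normal circumstances. However, on some Linux distributions, including Ubuntu 20.04 (the Ubuntu version on this Edge VM), pidof by default does not return processes that are in the uninterruptible sleep state (D) or the zombie state (Z). If NestDB is in one of these states when the startup script runs, the script concludes that NestDB is not running even though it is.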
An example of logging where NestDB is in the uninterruptible sleep state is shown below:
/var/log/vmware/top-cpu.log:
Tue Sep 05 16:22:17 UTC 2025
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ TGID COMMAND
2##2 nestdb 20 0 83212 24180 14576 D 16.5 0.1 0:00.17 2092 /opt/vmware/nsx-nestdb/bin/nestdb-server --schema /opt/vmware/nsx-nestdb/schema/nestdb.schema --dat+
By default, pidof does not report processes in these states because doing so can cause pidof and the calling scripts to hang. For further context, see the Ubuntu manpage for pidof(8) or "Why is pidof not working".
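The process states that pidof may skip can be read from a process listing. The following is a minimal sketch under stated assumptions: the function name list_blocked_or_zombie is hypothetical, and the input is expected in the "PID STAT ARGS" layout produced by a standard procps ps. It prints the PID and state letter of every matching process whose state begins with D or Z.

```shell
#!/bin/sh
# Hypothetical helper: list processes in uninterruptible sleep (D) or
# zombie (Z) state whose executable field contains the given name.
# Usage: <"PID STAT ARGS" lines> | list_blocked_or_zombie NAME
list_blocked_or_zombie() {
    awk -v name="$1" '
        # $2 is the ps STAT column; its first letter is the process state.
        $2 ~ /^[DZ]/ && index($3, name) > 0 { print $1, substr($2, 1, 1) }
    '
}

# Live usage (assumes procps ps):
#   ps -eo pid=,stat=,args= | list_blocked_or_zombie nestdb-server
```

Any output for nestdb-server means a running instance exists that a default pidof lookup may not report.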
This issue is resolved in VMware NSX 4.2.0, available at Broadcom Downloads.
Workaround:
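To work around this issue, reboot the affected edge node.
The risk can be reduced by keeping the underlying infrastructure and disk healthy, since processes typically enter the uninterruptible sleep state while waiting on I/O.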