Edge Transport node is in 'Failed' state in NSX UI

Article ID: 396722

Products

VMware NSX

Issue/Introduction

Edge Transport node is in 'Failed' state in NSX UI

Communication issues with VMs residing in NSX

Environment

VMware NSX T 4.1.2

Cause

Multiple instances of NestDB are started. This causes unpredictable behavior from the perspective of the NestDB clients, as some clients operate on one instance while other clients operate on another.

The NestDB server startup script, like many other LCP daemons, uses pidof to determine if the process has been started. If it does not detect that the process has started, the startup script launches another instance of the watchdog, which in turn attempts to launch another instance of NestDB.

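This works under normal circumstances. However, on some Linux distributions, including Ubuntu 20.04 (the Ubuntu version on this Edge VM), pidof by default does not return processes that are in the uninterruptible sleep state (D) or the zombie state (Z). If NestDB is in one of these states when the startup script runs, the script concludes that NestDB is not running and starts a second instance.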
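The detection gap described above can be illustrated with a minimal Python sketch. It is a model of the behavior, not the actual NSX startup script: the `Proc` type, `pidof_like`, and `startup_check` are hypothetical names, and the process list is passed in as plain data rather than read from the system.

```python
from dataclasses import dataclass

# States that pidof skips by default on the affected distributions (per the text above).
SKIPPED_STATES = {"D", "Z"}


@dataclass
class Proc:
    pid: int
    name: str
    state: str  # single-letter state as shown by top/ps, e.g. "R", "S", "D", "Z"


def pidof_like(procs, name, include_stuck=False):
    """Return PIDs whose name matches; skip D/Z processes unless include_stuck is set."""
    return [
        p.pid
        for p in procs
        if p.name == name and (include_stuck or p.state not in SKIPPED_STATES)
    ]


def startup_check(procs, name):
    """Hypothetical startup logic: launch only when no instance is detected."""
    return "skip" if pidof_like(procs, name) else "launch"


# NestDB exists but is blocked in uninterruptible sleep (illustrative PID).
procs = [Proc(1234, "nestdb-server", "D")]
print(startup_check(procs, "nestdb-server"))                    # launch
print(pidof_like(procs, "nestdb-server", include_stuck=True))   # [1234]
```

The first call reports "launch" even though an instance already exists, which is how a second NestDB instance ends up running alongside the first.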

An example log entry showing NestDB in the uninterruptible sleep state (D) is below:

/var/log/vmware/top-cpu.log:

Tue Sep 05 16:22:17 UTC 2025
PID   USER    PR  NI    VIRT    RES      SHR    S  %CPU  %MEM     TIME+    TGID COMMAND
2##2 nestdb   20   0   83212  24180  14576  D  16.5   0.1   0:00.17    2092 /opt/vmware/nsx-nestdb/bin/nestdb-server --schema /opt/vmware/nsx-nestdb/schema/nestdb.schema --dat+
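To look for similar entries, a short Python sketch along the following lines could scan top-style rows for NestDB processes in state D or Z. It assumes the column layout shown in the header above; `parse_top_row` and `stuck_nestdb_rows` are hypothetical helper names, and the sample row is copied from the log excerpt as-is, including its masked PID and truncated command.

```python
# Column names taken from the header line of the log excerpt above.
COLUMNS = ["PID", "USER", "PR", "NI", "VIRT", "RES", "SHR", "S",
           "%CPU", "%MEM", "TIME+", "TGID", "COMMAND"]


def parse_top_row(line):
    """Split a row into named columns; COMMAND keeps its embedded spaces."""
    fields = line.split(None, len(COLUMNS) - 1)
    if len(fields) != len(COLUMNS):
        return None  # timestamp lines, blank lines, and anything else
    return dict(zip(COLUMNS, fields))


def stuck_nestdb_rows(lines):
    """Return rows where nestdb-server is in state D or Z."""
    rows = (parse_top_row(line) for line in lines)
    return [
        r for r in rows
        if r and r["S"] in ("D", "Z") and "nestdb-server" in r["COMMAND"]
    ]


sample = [
    "Tue Sep 05 16:22:17 UTC 2025",
    "PID   USER    PR  NI    VIRT    RES      SHR    S  %CPU  %MEM     TIME+    TGID COMMAND",
    "2##2 nestdb   20   0   83212  24180  14576  D  16.5   0.1   0:00.17    2092 "
    "/opt/vmware/nsx-nestdb/bin/nestdb-server --schema "
    "/opt/vmware/nsx-nestdb/schema/nestdb.schema --dat+",
]
for row in stuck_nestdb_rows(sample):
    print(row["S"], row["TGID"], row["USER"])
```

The PID is kept as a string so that masked or otherwise non-numeric values do not break parsing.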

By default, pidof skips processes in these states because trying to inspect them can cause pidof and the calling scripts to hang.
For further context, see the Ubuntu manpage for pidof(8) or "Why is pidof not working".

 

Resolution

This issue is fixed in NSX 4.2.0.

 

Workaround:

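There is no workaround that prevents the issue. Because a process usually enters the uninterruptible sleep state while it is waiting on I/O, the risk can be reduced by keeping the underlying infrastructure and disk healthy.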

To recover, reboot the affected Edge node.

Additional Information

Log entries like the following appear in syslog on the affected Edge node:

2025-04-24T14:39:39.017Z edgenode NSX 1 SYSTEM [nsx@6876 comp="nsx-edge" subcomp="nsxa" s2comp="nestdb" level="ERROR" errorCode="EDG0000057"] DB is not connected while performing write operation
2025-04-24T14:39:39.004Z edgenode nsxa-systemd-helper 7467 - -  2025-04-24T14:39:39Z nsxa 1 nestdb [ERROR] DB is not connected while performing write operation  errorCode="EDG0000057"
2025-04-24T14:39:39.164Z edgenode nsxa-systemd-helper 7467 - -  2025-04-24T14:39:39Z nsxa 1 nestdb [ERROR] DB is not connected while performing write operation  errorCode="EDG0000057"
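As a rough aid for checking an Edge node's syslog for this signature, the following Python sketch extracts the timestamp and hostname from lines carrying the error code seen above. The regular expression is an assumption based only on the three sample entries, and `find_errors` is a hypothetical helper name.

```python
import re

# Leading timestamp, hostname, and an errorCode="EDG..." attribute later in the line.
# This pattern is inferred from the sample entries above, not from a log format spec.
PATTERN = re.compile(r'^(?P<ts>\S+)\s+(?P<host>\S+)\s.*errorCode="(?P<code>EDG\d+)"')


def find_errors(lines, code="EDG0000057"):
    """Return (timestamp, host) pairs for lines that carry the given error code."""
    hits = []
    for line in lines:
        m = PATTERN.search(line)
        if m and m.group("code") == code:
            hits.append((m.group("ts"), m.group("host")))
    return hits


sample = [
    '2025-04-24T14:39:39.017Z edgenode NSX 1 SYSTEM [nsx@6876 comp="nsx-edge" '
    'subcomp="nsxa" s2comp="nestdb" level="ERROR" errorCode="EDG0000057"] '
    'DB is not connected while performing write operation',
    '2025-04-24T14:39:39.004Z edgenode nsxa-systemd-helper 7467 - -  '
    '2025-04-24T14:39:39Z nsxa 1 nestdb [ERROR] DB is not connected while '
    'performing write operation  errorCode="EDG0000057"',
]
for ts, host in find_errors(sample):
    print(ts, host)
```

A match only indicates that nsxa could not write to NestDB; it does not by itself confirm that multiple NestDB instances are running.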