NSX Transport Nodes (Edge/ESXi host) show "Down/Degraded" due to DNS issue

VMware NSX

The status of the ESXi transport node appears as Degraded
The Node Status of the Edges is displayed as Down
You may observe the following alarm related to a failed reverse DNS lookup for the Manager node configuration
Reverse DNS lookup failed for Manager node ######## with IP address ###### and the publish_fqdns flag was set

You may also observe an alarm indicating that the transport node’s control plane connection to the Manager node is down
The Transport node ###### control plane connection to Manager node ##### is down for atleast 3 minutes from the Transport node's point of view.

The get controllers command on ESXi or Edge transport nodes may show no output
# get controllers
Controller IP  Port  SSL  Status  Is Physical Master  Session State  Controller FQDN  Failure Reason

Or, the status appears as Disconnected with the failure reason listed as Maintenance Mode

# get controllers
Controller IP  Port  SSL      Status        Is Physical Master  Session State  Controller FQDN               Failure Reason
x.x.x.x        1235  enabled  not used      false               null           xxxxxxxxxxxxxxxxxxxxxxxxxxxx  MAINTAINANCE_MODE
x.x.x.x        1235  enabled  disconnected  true                down           xxxxxxxxxxxxxxxxxxxxxxxxxxxx  MAINTAINANCE_MODE
x.x.x.x        1235  enabled  not used      false               null           xxxxxxxxxxxxxxxxxxxxxxxxxxxx  MAINTAINANCE_MODE

The logs below appear in /var/log/proton/nsx-api.log on the NSX Manager

2025-07-19T10:20:45.672Z ERROR workerTaskExecutor-1-45 ControllerUtils 5094 FABRIC [nsx@6876 comp="nsx-manager" errorCode="MP2119" level="ERROR" subcomp="manager"] Not sending ControllerInfoMsg for controller ClusterNodeConfigModel/######-####-####-####-####### as reverse DNS lookup for its IP <Edge's IP> failed
2025-07-19T10:20:45.674Z INFO workerTaskExecutor-1-45 DnsLookupProviderImpl 5094 - [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] No cached value for key: <Edge's IP> in fqdnToIpMap/ipToFqdnMap, will try to get data from IpAddressUtils
2025-07-19T10:20:45.674Z INFO workerTaskExecutor-1-45 Utils 5094 - [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] getFqdnFromIp(): invoked with Ip Address <Edge's IP>
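
To see which IP addresses are affected, the log can be scanned for the MP2119 entries shown above. The following is a minimal Python sketch, not a supported tool: the function names are hypothetical, and the pattern assumes the message wording matches the sample entry above, which may differ between NSX versions.

```python
import re
from collections import Counter

# Assumption: the message text matches the MP2119 sample quoted in this article.
MP2119_PATTERN = re.compile(
    r'errorCode="MP2119".*?reverse DNS lookup for its IP (.+?) failed'
)


def failed_reverse_lookups(lines):
    """Count MP2119 reverse DNS failures per IP address in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = MP2119_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def scan_log(path="/var/log/proton/nsx-api.log"):
    """Read the log file at `path` and return the per-IP failure counts."""
    with open(path, errors="replace") as handle:
        return failed_reverse_lookups(handle)
```

Calling scan_log() against a copy of the log returns a Counter keyed by IP address. Any address that appears there should have its PTR record checked on the DNS server.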

On some Edge devices, the controller info file (/etc/vmware/nsx/controller-info.xml) may be empty

The issue occurs when DNS is either misconfigured or unavailable due to a lookup outage. Since this condition is not properly handled in NSX, it results in the loss of controller connectivity.
When DNS lookup fails, NSX continues to send controller messages; however, the message contents remain unpopulated (skipped) because of the failed DNS lookup.

Verify and Fix DNS
- Resolve any DNS issues in the environment
- Ensure that both forward and reverse DNS lookups are properly configured on the DNS server
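
The forward and reverse checks above can be scripted from any machine that uses the same DNS servers as the NSX Managers. The sketch below is illustrative only: check_dns is a hypothetical helper, the FQDN and IP in the usage note are placeholders, and it relies only on the Python standard library socket module.

```python
import socket


def check_dns(fqdn, expected_ip,
              forward=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Return a list of problems for one FQDN/IP pair (empty list if consistent).

    `forward` and `reverse` default to the system resolver and can be
    replaced with stubs for offline testing.
    """
    problems = []
    try:
        resolved_ip = forward(fqdn)
        if resolved_ip != expected_ip:
            problems.append(
                f"forward mismatch: {fqdn} -> {resolved_ip}, expected {expected_ip}"
            )
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        problems.append(f"forward lookup failed for {fqdn}: {exc}")
    try:
        resolved_name = reverse(expected_ip)[0]
        # Compare case-insensitively and ignore a trailing root dot.
        if resolved_name.rstrip(".").lower() != fqdn.rstrip(".").lower():
            problems.append(
                f"reverse mismatch: {expected_ip} -> {resolved_name}, expected {fqdn}"
            )
    except OSError as exc:  # socket.herror is a subclass of OSError
        problems.append(f"reverse lookup failed for {expected_ip}: {exc}")
    return problems
```

For example, check_dns("nsx-mgr-01.example.com", "192.0.2.10") returns an empty list when both lookups agree. Run it once for each Manager node FQDN and IP pair, and resolve any reported problem before moving on to the recovery steps.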
After resolving the DNS issue

a. Recover ESXi Transport Nodes
- Restart the management services on all affected ESXi transport nodes to bring them out of the degraded state:
  services.sh restart
b. Recover Edge Nodes
1. Restart the local controller service on the standby Edge node:
  restart local-controller
2. Place the Edge node in maintenance mode for a few seconds.
3. Exit maintenance mode.
4. Wait until the Edge node status returns to Healthy.
5. For an Edge node with an empty controller-info.xml file (/etc/vmware/nsx/controller-info.xml), copy the file from another healthy Edge node and place it in the same directory.
6. Restart the NSX proxy service on the affected Edge node:
  /etc/init.d/nsx-proxy restart
7. Wait until all nodes report a Healthy state.

If the steps in this KB do not resolve the issue, open a support case with Broadcom Support and select NSX as the product.

Refer to the following KB article, which covers a similar issue:
https://knowledge.broadcom.com/external/article/424895

Handling Log Bundles for offline review with Broadcom support.