<DATE>T17:15:10.555Z Edge NSX 9226 - [nsx@6876 comp="nsx-edge" subcomp="nsx-sha" username="nsx-sha" level="CRITICAL" eventFeatureName="communication" eventType="manager_fqdn_lookup_failure" eventSev="critical" eventState="On"] DNS lookup failed for Manager node <UUID> with FQDN <MANAGER_FQDN> and the publish_fqdns flag was set.
<DATE>T17:55:19.212Z Edge NSX 9226 - [nsx@6876 comp="nsx-edge" subcomp="nsx-sha" username="nsx-sha" level="WARNING" s2comp="metric-collector"] Metric nsx.communication.manager-fqdn-lookup-failure-status last execution <Future at hex_value state=running> not complete, running for 3832996.569242999 secs
<DATE>T17:56:19.273Z Edge NSX 9226 - [nsx@6876 comp="nsx-edge" subcomp="nsx-sha" username="nsx-sha" level="WARNING" s2comp="metric-collector"] Metric nsx.communication.manager-fqdn-lookup-failure-status last execution <Future at hex_value state=running> not complete, running for 3833056.6301390156 secs
<DATE>T17:57:19.341Z Edge NSX 9226 - [nsx@6876 comp="nsx-edge" subcomp="nsx-sha" username="nsx-sha" level="WARNING" s2comp="metric-collector"] Metric nsx.communication.manager-fqdn-lookup-failure-status last execution <Future at hex_value state=running> not complete, running for 3833116.698533103 secs
# systemctl --no-pager status nsx-sha
├─nsx-sha.service
│ ├─ 1095 /bin/sh /opt/vmware/nsx-netopa/bin/sha_watchdog.sh -s nsx-sha -q 100 -t 1000 -b /var/run/vmware/nsx-sha/watchdog-nsx-sha.BG.PID /opt/vmware/nsx-netopa/bin/nsx-sha /var/run/vmware/nsx-sha/watchdog-nsx-sha.BG.PID
│ ├─ 2220 sleep 1
│ ├─ 9226 /opt/vmware/nsx-netopa/libexec/python-3/bin/python3 /opt/vmware/nsx-netopa/bin/agent.py
│ ├─ 9228 /opt/vmware/nsx-netopa/libexec/python-3/bin/python3 /opt/vmware/nsx-netopa/bin/agent.py
│ ├─ 9086 sudo /usr/bin/host <MANAGER_FQDN>
│ └─ 9087 /usr/bin/host <MANAGER_FQDN>
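The process tree above lists `/usr/bin/host` lookups for the Manager FQDN as children of nsx-sha.service. As a minimal sketch, assuming the `systemctl status` output has been captured as text, the following Python helper picks those entries out of the tree. The function name `find_host_lookups` and the sample text are illustrative only and are not part of NSX.

```python
import re

# Matches one process line of `systemctl status` output:
# optional tree glyphs and spaces, then a PID, then the command line.
PROC_RE = re.compile(r"^[\s│├└─]*(?P<pid>\d+)\s+(?P<cmd>.+)$")


def find_host_lookups(status_output):
    """Return (pid, command) for each /usr/bin/host process in the tree."""
    found = []
    for line in status_output.splitlines():
        m = PROC_RE.match(line)
        # Compare whole tokens so unrelated paths that merely contain
        # the string are not reported.
        if m and "/usr/bin/host" in m.group("cmd").split():
            found.append((int(m.group("pid")), m.group("cmd").strip()))
    return found


# Sample text copied from the process tree shown above.
sample_status = """\
├─nsx-sha.service
│ ├─ 2220 sleep 1
│ ├─ 9086 sudo /usr/bin/host <MANAGER_FQDN>
│ └─ 9087 /usr/bin/host <MANAGER_FQDN>
"""

for pid, cmd in find_host_lookups(sample_status):
    print(pid, cmd)
```

A `host` entry that is still present on repeated checks may indicate a lookup that never returned, but this sketch only lists the entries and does not measure how long they have been running.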
nslookup and dig from the root shell of the Edge correctly resolve the NSX Manager IP from the FQDN and vice versa.
VMware NSX 4.x
VMware NSX-T 3.x
The metrics script manager_fqdn_lookup_failure_status.py, which runs on the Edge, is stuck and cannot complete. The logs show its reported running time increasing by about 60 seconds with each one-minute log entry.
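To check a log excerpt for this condition, the metric-collector warnings can be parsed for the metric name and its reported running time. The following is a minimal Python sketch, assuming the log text has been collected into a string. The function name `find_stuck_metrics` and the 300-second default threshold are illustrative assumptions, not NSX settings.

```python
import re

# Pattern for the metric-collector warning shown in the log excerpt.
# \s+ between tokens tolerates excerpts where the line has been wrapped.
STUCK_RE = re.compile(
    r"Metric (?P<metric>\S+) last execution <Future at\s+\S+\s+state=\w+>"
    r" not complete, running for (?P<secs>\d+(?:\.\d+)?) secs"
)


def find_stuck_metrics(log_text, threshold_secs=300.0):
    """Return (metric, seconds) pairs whose run time exceeds threshold_secs.

    threshold_secs is an arbitrary illustrative value.
    """
    hits = []
    for m in STUCK_RE.finditer(log_text):
        secs = float(m.group("secs"))
        if secs > threshold_secs:
            hits.append((m.group("metric"), secs))
    return hits


# Message portion of one warning from the log excerpt.
sample = (
    "Metric nsx.communication.manager-fqdn-lookup-failure-status "
    "last execution <Future at hex_value state=running> not complete, "
    "running for 3832996.569242999 secs"
)

for metric, secs in find_stuck_metrics(sample):
    # 86400 seconds per day
    print(f"{metric}: running for {secs / 86400:.1f} days")
```

Converting the reported seconds to days shows how long the execution has been pending: the values in the excerpt correspond to roughly 44 days.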
This issue is resolved in VMware NSX 3.2.3, available at Broadcom Downloads.
To work around this issue, reboot the Edge:
1. Place the Edge in maintenance mode
System -> Fabric -> Nodes, select the Edge and then Actions -> Enter NSX Maintenance Mode
2. Reboot the Edge
3. Exit maintenance mode
Select the Edge and then Actions -> Exit NSX Maintenance Mode