VMware NSX 4.X is in use and Security Intelligence (formerly known as VMware NSX Intelligence) is deployed.
Edge transport nodes report "Unknown" or "Down" in the NSX Manager UI.
The environment has a large number of IP addresses configured in security groups.
Firewall apply times are low (approximately 1 second or less) in /var/log/syslog:
2025-03-27T09:44:14.970Z edge_node NSX 11419 FIREWALL [nsx@6876 comp="nsx-edge" subcomp="datapathd" s2comp="firewall" tname="dp-ipc18" level="INFO"] Firewall apply total: 741 msec wait/done 0/1
The dp-ipc service on the edge node is blocked for 32 seconds or more, as shown in the following log extract from /var/log/syslog on the edge node:
2025-03-27T09:43:01.989Z edge_node NSX 11419 SYSTEM [nsx@6876 comp="nsx-edge" subcomp="datapathd" s2comp="ovs-rcu" tname="urcu2" level="WARN"] blocked 32000 ms waiting for dp-ipc18 to quiesce
Similarly, the following exception appears in /var/log/syslog on the edge node:
"[nsx@6876 comp="nsx-edge" subcomp="nsx-sha" username="nsx-sha" level="INFO" s2comp="fork-executor-2"] Exception caught when running cmd ...(omitted)..."
NOTE: The preceding log excerpts are only examples. Date, time and environmental variables may vary depending on your environment.
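The two datapathd log patterns above can be searched for with a short script. The following is a minimal sketch, assuming the syslog lines match the wording of the examples shown ("blocked N ms waiting for dp-ipcN to quiesce" and "Firewall apply total: N msec"). The function names summarize and summarize_file are illustrative and not part of any NSX tooling.

```python
import re

# Patterns derived from the example log lines in this article.
# Assumption: the message wording on your edge node matches these examples.
BLOCKED_RE = re.compile(r"blocked (\d+) ms waiting for (dp-ipc\d+) to quiesce")
APPLY_RE = re.compile(r"Firewall apply total: (\d+) msec")


def summarize(lines):
    """Return (blocked_events, apply_times_ms) found in an iterable of log lines."""
    blocked = []   # list of (thread_name, blocked_ms) tuples
    apply_ms = []  # firewall apply durations in milliseconds
    for line in lines:
        match = BLOCKED_RE.search(line)
        if match:
            blocked.append((match.group(2), int(match.group(1))))
            continue
        match = APPLY_RE.search(line)
        if match:
            apply_ms.append(int(match.group(1)))
    return blocked, apply_ms


def summarize_file(path):
    """Hypothetical helper: run summarize() over a log file such as /var/log/syslog."""
    with open(path, errors="replace") as handle:
        return summarize(handle)
```

For example, summarize_file("/var/log/syslog") returns the blocked events and the apply times. Blocked durations of 32000 ms or more alongside apply times of roughly 1000 msec or less would be consistent with the symptoms described here.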
VMware NSX 4.X
vDefend Firewall
This case is specific to environments which have a large number of IPs in groups and Security Intelligence (formerly known as VMware NSX Intelligence) is deployed.
The NSX SHA plugins edge_fw_per_conn_monitor and tls_stats_monitor pull data every 60 seconds from the edge-appctl commands "fw/show lr ruleset" and "fw/show tlsrulestat". These commands query firewall rules and statistics, and parse the IP addresses from groups.
Because of the large number of IPs, this takes significant time to compute and keeps the dp-ipc thread busy, so it cannot respond to the edge health query from the MP (Management Plane).
To check the IPs, run the following command as root in the edge CLI session: /opt/vmware/nsx-nestdb/bin/nestdb-cli --beautify --cmd "get vmware.nsx.nestdb.ContainerMsg" > /tmp/groups.txt
The output contains "ip_address" entries for each security group, listing every IP address in the group. Use it to identify which groups contain the most IP addresses and to obtain the total number of IP addresses on that edge node.
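A rough total can be obtained by counting the address literals in /tmp/groups.txt. The following is a minimal sketch that makes no assumption about the exact layout of the nestdb-cli output. It simply counts valid IPv4 addresses (optionally with a /prefix) anywhere in the text, so IPv6 entries are not counted and any other dotted-quad values in the file would be included. The function name count_ipv4 is illustrative.

```python
import ipaddress
import re

# Candidate IPv4 literals, optionally followed by a CIDR prefix length.
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}(?:/\d{1,2})?\b")


def count_ipv4(text):
    """Return (total, unique) counts of valid IPv4 addresses or prefixes in text."""
    valid = []
    for token in IPV4_RE.findall(text):
        try:
            # ip_interface accepts both "10.0.0.1" and "10.0.0.1/24".
            ipaddress.ip_interface(token)
        except ValueError:
            continue  # skip look-alikes such as 999.1.1.1
        valid.append(token)
    return len(valid), len(set(valid))
```

For example, reading the file with open("/tmp/groups.txt").read() and passing the result to count_ipv4 gives the total and distinct address counts. Identifying which group holds the most addresses still requires inspecting the file itself, because the per-group structure is not modeled in this sketch.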
To work around this issue, contact VMware Support and note this Article ID (397114) in the problem description.