Edge Nodes Status Down or Unknown in NSX Manager UI when Security Intelligence (formerly known as VMware NSX Intelligence) is in use due to dp-ipc block

Article ID: 397114

Products

VMware vDefend Firewall

Issue/Introduction

VMware NSX 4.X is in use together with Security Intelligence (formerly known as VMware NSX Intelligence).

Edge transport nodes report a status of "Unknown" or "Down" in the NSX Manager UI.

The environment has a large number of IP addresses configured in security groups.

Firewall apply times are low (approximately 1 second or less) in /var/log/syslog:

2025-03-27T09:44:14.970Z edge_node NSX 11419 FIREWALL [nsx@6876 comp="nsx-edge" subcomp="datapathd" s2comp="firewall" tname="dp-ipc18" level="INFO"] Firewall apply total: 741 msec wait/done 0/1
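To review firewall apply times across a whole log file rather than line by line, the messages can be extracted with a short script. The following is a minimal sketch only, not a supported tool: the function name firewall_apply_times is hypothetical, and the pattern assumes the message format matches the sample line above. It can be run against a copy of /var/log/syslog on any workstation with Python 3.

```python
import re

# Assumption: log lines follow the format of the sample excerpt in this article.
APPLY_RE = re.compile(
    r'tname="(?P<thread>[^"]+)".*?Firewall apply total: (?P<msec>\d+) msec'
)

def firewall_apply_times(lines):
    """Return (timestamp, thread, msec) for each 'Firewall apply total' line."""
    results = []
    for line in lines:
        match = APPLY_RE.search(line)
        if match:
            # The timestamp is the first space-separated field of the line.
            timestamp = line.split(" ", 1)[0]
            results.append((timestamp, match.group("thread"), int(match.group("msec"))))
    return results

# Sample line copied from the excerpt above.
sample = (
    '2025-03-27T09:44:14.970Z edge_node NSX 11419 FIREWALL [nsx@6876 comp="nsx-edge" '
    'subcomp="datapathd" s2comp="firewall" tname="dp-ipc18" level="INFO"] '
    'Firewall apply total: 741 msec wait/done 0/1'
)
for timestamp, thread, msec in firewall_apply_times([sample]):
    print(timestamp, thread, msec)
```

To process a real file, pass an open file object instead of the sample list, for example firewall_apply_times(open("syslog")).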

The dp-ipc service on the edge node goes into a blocked state for 32 seconds or more, as shown in the following log extract from /var/log/syslog on the edge node:

2025-03-27T09:43:01.989Z edge_node NSX 11419 SYSTEM [nsx@6876 comp="nsx-edge" subcomp="datapathd" s2comp="ovs-rcu" tname="urcu2" level="WARN"] blocked 32000 ms waiting for dp-ipc18 to quiesce
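The blocked-state warnings can be located in the same way. The sketch below is illustrative only: the function name blocked_events and the min_ms parameter are hypothetical, the default threshold of 32000 ms is taken from the 32-second figure in this article, and the pattern assumes the wording of the sample warning above.

```python
import re

# Assumption: the warning text matches the sample excerpt in this article.
BLOCKED_RE = re.compile(
    r"blocked (?P<ms>\d+) ms waiting for (?P<thread>dp-ipc\d*) to quiesce"
)

def blocked_events(lines, min_ms=32000):
    """Return (timestamp, thread, ms) for dp-ipc waits of at least min_ms."""
    events = []
    for line in lines:
        match = BLOCKED_RE.search(line)
        if match and int(match.group("ms")) >= min_ms:
            # The timestamp is the first space-separated field of the line.
            timestamp = line.split(" ", 1)[0]
            events.append((timestamp, match.group("thread"), int(match.group("ms"))))
    return events

# Sample line copied from the excerpt above.
sample_blocked = (
    '2025-03-27T09:43:01.989Z edge_node NSX 11419 SYSTEM [nsx@6876 comp="nsx-edge" '
    'subcomp="datapathd" s2comp="ovs-rcu" tname="urcu2" level="WARN"] '
    'blocked 32000 ms waiting for dp-ipc18 to quiesce'
)
for event in blocked_events([sample_blocked]):
    print(event)
```

Comparing the timestamps of these events with the times at which the edge node status changed in the NSX Manager UI can help confirm that the two are related.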

The following exception also appears in /var/log/syslog on the edge node:

"[nsx@6876 comp="nsx-edge" subcomp="nsx-sha" username="nsx-sha" level="INFO" s2comp="fork-executor-2"] Exception caught when running cmd ...(omitted)..."

NOTE: The preceding log excerpts are only examples. Date, time and environmental variables may vary depending on your environment.

Environment

VMware NSX 4.X

vDefend Firewall

Cause

This issue is specific to environments that have a large number of IPs in groups and have Security Intelligence (formerly known as VMware NSX Intelligence) deployed.

The NSX SHA plugins edge_fw_per_conn_monitor and tls_stats_monitor run the edge-appctl commands "fw/show lr ruleset" and "fw/show tlsrulestat" every 60 seconds. These commands query firewall rules and statistics and parse the IP addresses from groups.

Because of the large number of IPs, this takes significant time to compute. The dp-ipc thread remains busy during that time and cannot respond to the edge health query from the Management Plane (MP).

To check the IPs, run the following command as root from the edge CLI session:

/opt/vmware/nsx-nestdb/bin/nestdb-cli --beautify --cmd "get vmware.nsx.nestdb.ContainerMsg" > /tmp/groups.txt

The output contains "ip_address" entries for each security group, listing each IP address in the group. Use it to identify which groups have the largest number of IP addresses and to obtain the total number of IP addresses on that edge node.
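For a rough total, the IP addresses in /tmp/groups.txt can be counted with a short script. This is a sketch under stated assumptions, not a supported tool: the exact layout of the nestdb-cli output is not described in this article, so the script does not parse it and simply counts every IPv4-looking token (optionally with a /prefix) anywhere in the text. IPv6 addresses are not counted, and any other dotted numeric strings in the file would be counted by mistake. The function name count_ip_entries is hypothetical.

```python
import re

# Assumption: IP addresses appear in the file as plain dotted-quad text,
# optionally followed by a CIDR prefix such as /24.
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}(?:/\d{1,2})?\b")

def count_ip_entries(text):
    """Return (total_matches, unique_matches) of IPv4-looking tokens in text."""
    matches = IPV4_RE.findall(text)
    return len(matches), len(set(matches))
```

Example usage after copying the file from the edge node: total, unique = count_ip_entries(open("/tmp/groups.txt").read()). A large gap between the two numbers suggests that the same addresses are repeated across several groups.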

Resolution

To work around this issue, contact VMware Support and note this Article ID (397114) in the problem description.