NSX-T Edge node crashes with the following "[WARN] unix:/var/run/vmware/edge/dpd.ctl: receive error: Connection reset by peer"

Article ID: 345835

Products

VMware NSX Networking

Issue/Introduction

Symptoms:
  • You are running NSX-T 3.2.0.x, 3.2.1.x, 3.2.2, or 4.0.x.
  • You are using DFW rules, and some of the rules are applied to a group that contains overlay segments with logical router switchports.
  • On the edge node when you run the command get logical-routers you receive the following WARN alert:
An unexpected error occurred: <date-time> edge-appctl 18819 jsonrpc [WARN] unix:/var/run/vmware/edge/dpd.ctl: receive error: Connection reset by peer
  • SSH to the edge node still works.
  • The dataplane service is stopped on the edge node:
>get service dataplane
<date-time>
Service name:      dataplane
Service state:     stopped
  • The edge log /var/log/kern.log contains the following:
datapathd[32578]: segfault at 8 ip <hex-address> sp <hex-uuid> error 4 in datapathd[<hex-address>+15ed000]
  • The edge log /var/log/syslog contains entries such as:

NSXT-E1 NSX 5266 FIREWALL [nsx@6876 comp="nsx-edge" subcomp="datapathd" s2comp="firewallcp" level="ERROR"] DfwChannel: Failed to update dfw cache due to exception: too many TCP/UDP port: 16

NSXT-E1 datapath-systemd-helper 5197 - - <date-time-1> datapathd 5266 firewallcp [ERROR] DfwChannel: Failed to update dfw cache due to exception: too many TCP/UDP port: 16

NSXT-E1 95ddcdc5d374 3459 - - <date-time-2> datapathd 5266 firewallcp [ERROR] DfwChannel: Failed to update dfw cache due to exception: too many TCP/UDP port: 16

...

NSXT-E1 NSX 3548 - [nsx@6876 comp="nsx-edge" subcomp="node-mgmt" username="root" level="INFO"] Service datapathd coredump at <date-time-3> file /var/log/core/core.datapathd.<epoch-time>.20851.0.9.gz

  • Core dumps for the dataplane service are present on the edge node under /var/log/core/:
core.datapathd.<epoch-time>.20851.0.9.gz
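
The syslog signatures above can be checked for programmatically. The following is a minimal sketch, not a supported tool: it assumes the syslog has already been read into a list of lines, and the function name `scan_syslog` and the two regular expressions are illustrative, derived only from the sample log lines in this article.

```python
import re

# Patterns derived from the sample syslog lines in this article (assumption:
# the message text is stable across the affected releases).
PORT_ERROR = re.compile(
    r"Failed to update dfw cache due to exception: too many TCP/UDP port: (\d+)"
)
COREDUMP = re.compile(r"Service datapathd coredump at .* file (\S+)")


def scan_syslog(lines):
    """Collect the reported port counts and datapathd core file paths."""
    port_counts = []
    core_files = []
    for line in lines:
        match = PORT_ERROR.search(line)
        if match:
            port_counts.append(int(match.group(1)))
        match = COREDUMP.search(line)
        if match:
            core_files.append(match.group(1))
    return {"port_counts": port_counts, "core_files": core_files}
```

If both lists come back non-empty for the same edge node, the log matches the symptoms described here; an empty result does not rule the issue out.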


Environment

VMware NSX-T Data Center

Cause

The CCP (Central Control Plane) incorrectly includes the downlink port in a firewall rule's span when a logical switch or LSP (logical switch port) is used in the rule's Applied To field. As a result, the DFW rule is sent to the edge node in error.
If one of the rules pushed to the edge has parameters the edge cannot handle, such as the excessive TCP/UDP port count reported in the syslog error, the dataplane can crash repeatedly.

Resolution

This issue is resolved in NSX-T 4.1.0 and 3.2.3. In these releases, a DFW rule is no longer incorrectly pushed to the edge node when a group containing a logical router switchport is part of the rule's Applied To field.

In NSX-T 3.2.2, validation was implemented to prevent more than 15 logical switchports from being added. Exceeding the limit results in an alert similar to:
"Number of values (ranges count as 2 values) in a source/destination ports {port count}. It should not exceed 15."


Workaround:
  • To avoid the edge datapath crash, verify that no firewall rule contains more than 15 port values.
  • A port range counts as 2 values. For example, a rule that specifies the range 1-3 counts as 2 ports. If a rule requires more than 15 port values, divide it into multiple rules.
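
The counting rule in the workaround can be sketched as follows. This is an illustrative helper, not part of NSX: it assumes port entries are given as strings such as "443" or "8000-8100", and the names `port_value_count` and `split_port_entries` are hypothetical.

```python
MAX_PORT_VALUES = 15  # limit quoted in the validation message


def entry_cost(entry):
    """A range such as '1-3' counts as 2 values; a single port counts as 1."""
    return 2 if "-" in entry else 1


def port_value_count(entries):
    """Total number of values a rule's port list contributes."""
    return sum(entry_cost(entry) for entry in entries)


def split_port_entries(entries, limit=MAX_PORT_VALUES):
    """Greedily split entries into groups that each stay within the limit."""
    groups = []
    current = []
    used = 0
    for entry in entries:
        cost = entry_cost(entry)
        # Start a new group when adding this entry would exceed the limit.
        if current and used + cost > limit:
            groups.append(current)
            current, used = [], 0
        current.append(entry)
        used += cost
    if current:
        groups.append(current)
    return groups
```

Each returned group corresponds to the port list of one of the smaller rules. For instance, `port_value_count(["1-3"])` returns 2, matching the example in the workaround.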


Additional Information

Impact/Risks:

The dataplane on the edge node is not functional, which impacts the services that depend on it.