NSX-T Load balancer crashes regularly when it has a single port range configured.

Article ID: 312603

Updated On:

Products

VMware NSX

Issue/Introduction

Symptoms:
BGP peering goes down at regular intervals (e.g. roughly every 2 hours) and stays down for 5-10 minutes.
The load balancer crashes and generates core dumps. Segmentation fault errors appear in the Edge node's /var/log/kern.log:
2020-10-28T20:01:47.998432+00:00 edge01.your.domain kernel - - - [255223.841935] traps: dp-fp:2[5212] general protection ip:7fa363f78cc0 sp:7fa35c7a9500 error:0 in libc-2.23.so[7fa363f2a000+1c0000]
2020-10-28T20:01:47.998444+00:00 edge01.your.domain kernel - - - [255223.841964] grsec: Segmentation fault occurred at            (nil) in /opt/vmware/nsx-edge/sbin/datapathd[dp-fp:2:5212] uid/euid:0/0 gid/egid:124/124, parent /opt/vmware/edge/dpd/entrypoint.sh[entrypoint.sh:5101] uid/euid:0/0 gid/egid:124/124
2020-10-28T20:01:48.043Z edge01 NSX 26992 - [nsx@6876 comp="nsx-edge" subcomp="node-mgmt" username="root" level="WARNING"] Core file generated: /var/log/core/core.dp-fp:2.1603915307.5158.0.11.gz
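To check a copy of the Edge log for this signature, a small script along the following lines may help. It is only a sketch: the two patterns are derived from the log excerpt above, the function name scan_edge_log is hypothetical, and other NSX-T builds may word these messages differently.

```python
import re

# Patterns derived from the log excerpt in this article (assumption: other
# builds may format these messages differently).
SEGFAULT_RE = re.compile(r"Segmentation fault occurred at .* in \S*datapathd\[")
CORE_RE = re.compile(r"Core file generated: (\S+)")


def scan_edge_log(lines):
    """Return (segfault_lines, core_files) found in an iterable of log lines.

    scan_edge_log is a hypothetical helper name, not part of any NSX tooling.
    """
    segfaults = []
    cores = []
    for line in lines:
        # Collect datapathd segmentation fault messages.
        if SEGFAULT_RE.search(line):
            segfaults.append(line.rstrip("\n"))
        # Collect the path of any core file reported in the same log.
        match = CORE_RE.search(line)
        if match:
            cores.append(match.group(1))
    return segfaults, cores
```

For example, the result of open("/var/log/kern.log") can be passed directly as the lines argument, since a file object yields one line per iteration.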


Environment

VMware NSX-T Data Center

Cause

This is a known issue in how the Load Balancer interprets port ranges. When a single port range is specified in the configuration, the Load Balancer can misinterpret the information stored in the FW data structure during a lookup. Depending on several factors, including the range defined and the features in use, this can crash the datapath process.

Resolution

Currently there is no resolution.

Workaround:
A temporary workaround is to configure the Virtual Server port range as multiple ranges covering the original single port range, e.g.:
  • Before: 30200~30300
  • After: 30200~30250,30251~30300
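
If many Virtual Servers need the same change, the split can be computed rather than typed by hand. The following is a minimal sketch, assuming the "start~end" notation used in the example above; the helper name split_port_range is hypothetical and is not part of any NSX tool or API, and the resulting string still has to be entered in the Virtual Server configuration manually.

```python
def split_port_range(port_range: str) -> str:
    """Split a single "start~end" range into two adjacent ranges.

    Hypothetical helper; the "~" separator is taken from the example in this
    article and may differ from the syntax your NSX-T interface expects.
    """
    start_text, sep, end_text = port_range.partition("~")
    if not sep:
        raise ValueError(f"not a port range: {port_range!r}")
    start, end = int(start_text), int(end_text)
    if not (1 <= start < end <= 65535):
        raise ValueError(f"invalid port range: {port_range!r}")
    # Split at the midpoint so the two ranges together cover the original
    # range exactly, with no gap and no overlap.
    mid = (start + end) // 2
    return f"{start}~{mid},{mid + 1}~{end}"


print(split_port_range("30200~30300"))  # 30200~30250,30251~30300
```

The midpoint is an arbitrary choice; the workaround only requires that the configured ranges together cover the original range.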