NSX-T edge node dataplane service crash due to PMTU expiry

Article ID: 318296

Updated On:

Products

VMware NSX

Issue/Introduction

This issue affects NSX-T 3.0.x and 3.1.x environments.

IPSec VPN tunnels may go down and BGP connections may drop.

You may see log entries like the following in /var/log/syslog on the NSX Manager:

2021-04-01T15:26:21.966Z FATAL pool-119-thread-1 MonitoringServiceImpl 22013 MONITORING [nsx@6876 alarmId="0849b39f-2efa-####-####-########44c" alarmState="OPEN" comp="nsx-manager" entId="0e2e2ee3-xxxx-xxxx-05bc6e31f8a0" errorCode="MP701099" eventFeatureName="infrastructure_service" eventSev="CRITICAL" eventState="On" eventType="edge_service_status_down" level="FATAL" nodeId="80924d88-96ba-####-####-########a97" subcomp="monitoring"] The service ipsecvpn is down for at least one minute.
...
2021-04-01T11:57:21.955Z xxxxxxxx.local NSX 22013 MONITORING [nsx@6876 alarmId="733a9d52-d5ef-####-####-########cc0" alarmState="OPEN" comp="nsx-manager" entId="2bf06240-xxxx-xxxx-xxxxxxx" errorCode="MP701099" eventFeatureName="infrastructure_service" eventSev="CRITICAL" eventState="On" eventType="edge_service_status_down" level="FATAL" nodeId="80924d88-xxxx-xxxx-754a7a39da97 0" subcomp="monitoring"] The service dataplane is down for at least one minute.
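
To check a saved copy of the NSX Manager syslog for these alarms, the entries can be filtered on the eventType field. The following is a minimal sketch, assuming the entry layout shown above. The function name find_service_down_events is hypothetical and is not part of any NSX tooling.

```python
import re

# Matches alarms of the type shown above and captures the service name
# from the trailing message text.
EVENT_RE = re.compile(
    r'eventType="edge_service_status_down".*?'
    r'The service (?P<service>\S+) is down'
)

def find_service_down_events(lines):
    """Return (timestamp, service) pairs for edge_service_status_down alarms."""
    events = []
    for line in lines:
        match = EVENT_RE.search(line)
        if match:
            # The timestamp is the first whitespace-delimited field of the entry.
            timestamp = line.split(maxsplit=1)[0]
            events.append((timestamp, match.group("service")))
    return events
```

For example, passing an open file handle for a copy of /var/log/syslog returns one pair per matching alarm, which makes it easy to see whether ipsecvpn and dataplane alarms occurred close together.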

Segmentation fault errors may also be seen in kern.log on the NSX-T edge node:

2021-04-01T11:57:21.932385+00:00 xxxxxxxx kernel - - - [14719770.144468] grsec: Segmentation fault occurred at (nil) in /opt/vmware/nsx-edge/sbin/datapathd[dp-fp:0:2849] uid/euid:0/0 gid/egid:124/124, parent /opt/vmware/edge/dpd/entrypoint.sh[entrypoint.sh:2799] uid/euid:0/0 gid/egid:124/124

The dataplane service on the NSX-T edge node has crashed. In /var/log/nvpapi/api_server.log on the edge node, you may see log entries like the following:

2021-04-01 12:55:18,650 1520 napi.root.node.diagnosis_api WARNING Found cores: ['core.dp-fp:1.1617290060.30276.0.11.gz', 'core.dp-fp:1.1617290725.5027.0.11.gz', 'core.dp-fp:0.1615079056.6488.0.8.gz', 'core.dp-fp:1.1616531789.18956.0.11.gz', 'core.dp-fp:1.1616598121.1666.0.11.gz'
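
The core file names in this entry can help establish when each crash occurred. The following is a minimal sketch, assuming that the second dot-separated field after the thread name is a Unix epoch timestamp in seconds. That interpretation is an assumption based on the values shown and is not confirmed by this article. The function name parse_core_name is hypothetical.

```python
import re
from datetime import datetime, timezone

# Assumed layout: core.<thread>.<epoch seconds>.<remaining fields>.gz
# Only the thread name and the assumed epoch field are interpreted here.
CORE_RE = re.compile(r"^core\.(?P<thread>[^.]+)\.(?P<epoch>\d+)\.")

def parse_core_name(name):
    """Return (thread, UTC datetime) for a core file name, or None if it does not match."""
    match = CORE_RE.match(name)
    if match is None:
        return None
    created = datetime.fromtimestamp(int(match.group("epoch")), tz=timezone.utc)
    return match.group("thread"), created
```

Sorting the parsed results by time shows whether the dp-fp cores cluster around the time the alarms were raised.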



Environment

VMware NSX-T Data Center
VMware NSX-T Data Center 3.x

Cause

This issue is caused by the process used to clear the PMTU HMAP. Packets processed while the HMAP is being cleared may end up in an invalid state, which results in a dataplane service crash.

Resolution

This issue is resolved in NSX-T Data Center 3.2.0.
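
When reviewing an inventory of deployments, the version ranges named in this article can be expressed as a simple check. The following is a minimal sketch. The function name is_affected is hypothetical, and it returns True only for the 3.0.x and 3.1.x releases this article names. A False result for a release earlier than 3.0 means only that this article does not cover it.

```python
def is_affected(version):
    """Return True if the version string is in the 3.0.x or 3.1.x range named in this article."""
    parts = version.split(".")
    # Only the major and minor components decide the result.
    major, minor = int(parts[0]), int(parts[1])
    return (major, minor) in {(3, 0), (3, 1)}
```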

Workaround:
If you are unable to upgrade and require a workaround for this issue, contact Broadcom Support and note this Article ID (318296) in the problem description.