NSX-T Edge Node crash during HA failover due to segmentation fault


Article ID: 312624


Products

VMware NSX Networking

Issue/Introduction

Symptoms:

  • Traffic passing through the Edge Node is impacted
  • Dataplane service crashes and generates a core dump in /var/log/core/
  • The Edge Node syslog displays an error message similar to:
2021-12-xxTxx:xx:xx.153Z edge-name.com NSX 12691 - [nsx@6876 comp="nsx-edge" subcomp="node-mgmt" username="root" level="WARNING"] Core file generated: /var/log/core/core.dp-fp:11.1638867190.9350.0.11.gz
  • The Edge Node kern.log displays an error similar to:
2021-12-xxTxx:xx:xx.xx8577+00:00 edge-name.com kernel - - - [ 775.251941] grsec: Segmentation fault occurred at 0000710a40000000 in /opt/vmware/nsx-edge/sbin/datapathd[dp-fp:11:18386] uid/euid:0/0 gid/egid:124/124, parent /opt/vmware/edge/dpd/entrypoint.sh[entrypoint.sh:18226] uid/euid:0/0 gid/egid:124/124
 
  • The backtrace is similar to:

#0 pfsync_get_next_state_upd (hv=4 '\004', np=<synthetic pointer>, up=0x7105ee46ef80) at datapath/pf/pf/if_pfsync.c:1715
#1 pfsync_input (kif=kif@entry=0x7123426e47c0, m=<optimized out>, m@entry=0x71233ff568c0, off=off@entry=0) at datapath/pf/pf/if_pfsync.c:2694
#2 0x000013e0776971f0 in dpdk_pfsync_input (cookie=cookie@entry=0x71273f42f200, pkt=<optimized out>, pkt@entry=0x71233ff56a30) at datapath/pf/pf_glue/glue.c:2942
#3 0x000013e0774ef0cc in firewall_sync_input (m=<optimized out>, cookie=0x71273f42f200) at datapath/firewall_sync.c:1719
#4 firewall_sync_lrouter_input (m=m@entry=0x7105ee46ec80) at datapath/firewall_sync.c:1815
#5 0x000013e0774b3f4f in tunnel_mgmt_input (m=m@entry=0x7105ee46ec80) at datapath/tunnel_mgmt.c:45
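
To check collected Edge Node logs for the two log symptoms above, a short script along the following lines may help. This is a minimal sketch, not a VMware tool: the regular expressions are modeled only on the sample lines in this article, and the function name scan_log_lines is hypothetical. Other NSX-T builds may format these messages differently.

```python
import re

# Patterns modeled on the two sample log lines in this article (assumption:
# other builds may word or format these messages differently).
CORE_RE = re.compile(r"Core file generated: (?P<path>/var/log/core/\S+)")
SEGV_RE = re.compile(
    r"grsec: Segmentation fault occurred at (?P<addr>[0-9a-fA-F]+) "
    r"in (?P<binary>\S+?)\[(?P<thread>[^\]]+)\]"
)


def scan_log_lines(lines):
    """Return (core_paths, segfaults) found in an iterable of log lines.

    core_paths is a list of core file paths; segfaults is a list of
    (binary, thread, fault_address) tuples.
    """
    core_paths, segfaults = [], []
    for line in lines:
        core = CORE_RE.search(line)
        if core:
            core_paths.append(core.group("path"))
        segv = SEGV_RE.search(line)
        if segv:
            segfaults.append(
                (segv.group("binary"), segv.group("thread"), segv.group("addr"))
            )
    return core_paths, segfaults
```

The function accepts any iterable of strings, so an open file handle for a copy of the syslog or kern.log can be passed directly. A match on datapathd together with a core file named core.dp-fp is consistent with the symptoms described here, but the backtrace should still be compared before concluding that this is the same issue.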



Environment

VMware NSX-T Data Center 3.x
VMware NSX-T Data Center

Cause

This issue is caused by corruption in the TLV (type-length-value) packet that carries the HA data for the Edge Firewall/NAT. When the active service router (SR) component syncs its firewall/NAT state with the standby component, an invalid length field in the HA packet can cause the dataplane (DP) service to crash.
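
To illustrate why an invalid length field is dangerous, the following sketch parses a generic TLV buffer and rejects any record whose declared length runs past the end of the data. The record layout (1-byte type, 2-byte big-endian length) and the name parse_tlvs are assumptions chosen for illustration; this is not the actual NSX HA sync format or the datapath code.

```python
import struct

# Hypothetical TLV header: 1-byte type followed by a 2-byte big-endian length.
HEADER = struct.Struct("!BH")


def parse_tlvs(buf: bytes):
    """Parse TLV records, raising ValueError if a length overruns the buffer."""
    records = []
    offset = 0
    while offset < len(buf):
        # A header must fit in the bytes that remain.
        if len(buf) - offset < HEADER.size:
            raise ValueError(f"truncated TLV header at offset {offset}")
        tlv_type, length = HEADER.unpack_from(buf, offset)
        offset += HEADER.size
        # The declared length must not exceed the remaining payload.
        remaining = len(buf) - offset
        if length > remaining:
            raise ValueError(
                f"TLV type {tlv_type} declares {length} bytes, "
                f"only {remaining} remain"
            )
        records.append((tlv_type, buf[offset:offset + length]))
        offset += length
    return records
```

A parser that trusts the length field without a check like this would advance past the end of the received data. In native code, that kind of out-of-bounds access can surface as a segmentation fault.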

Resolution

There is currently no resolution for this issue.

Workaround:
If you encounter this issue, raise a Support Request with VMware and reference this article.