NSX-T Edge Node crash during HA failover due to segmentation fault

Article ID: 312624

Products

VMware NSX

Issue/Introduction

  • Traffic passing through the Edge Node is impacted
  • Dataplane service crashes and generates a core dump in /var/log/core/
  • The Edge Node syslog displays an error message similar to:
2021-12-xxTxx:xx:xx.153Z edge-name.com NSX 12691 - [nsx@6876 comp="nsx-edge" subcomp="node-mgmt" username="root" level="WARNING"] Core file generated: /var/log/core/core.dp-fp:11.1638867190.9350.0.11.gz
  • The Edge Node kern.log displays an error message similar to:
2021-12-xxTxx:xx:xx.xx8577+00:00 edge-name.com kernel - - - [ 775.251941] grsec: Segmentation fault occurred at 0000710a40000000 in /opt/vmware/nsx-edge/sbin/datapathd[dp-fp:11:18386] uid/euid:0/0 gid/egid:124/124, parent /opt/vmware/edge/dpd/entrypoint.sh[entrypoint.sh:18226] uid/euid:0/0 gid/egid:124/124
 
  • The backtrace is similar to:
#0 pfsync_get_next_state_upd (hv=4 '\004', np=<synthetic pointer>, up=0x7105ee46ef80) at datapath/pf/pf/if_pfsync.c:1715
#1 pfsync_input (kif=kif@entry=0x7123426e47c0, m=<optimized out>, m@entry=0x71233ff568c0, off=off@entry=0) at datapath/pf/pf/if_pfsync.c:2694
#2 0x000013e0776971f0 in dpdk_pfsync_input (cookie=cookie@entry=0x71273f42f200, pkt=<optimized out>, pkt@entry=0x71233ff56a30) at datapath/pf/pf_glue/glue.c:2942
#3 0x000013e0774ef0cc in firewall_sync_input (m=<optimized out>, cookie=0x71273f42f200) at datapath/firewall_sync.c:1719
#4 firewall_sync_lrouter_input (m=m@entry=0x7105ee46ec80) at datapath/firewall_sync.c:1815
#5 0x000013e0774b3f4f in tunnel_mgmt_input (m=m@entry=0x7105ee46ec80) at datapath/tunnel_mgmt.c:45
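
The log symptoms above can be checked for with a small script. The following is a minimal sketch, not an official tool: the regular expressions are derived only from the sample messages in this article, the exact wording may differ between NSX versions, and the function name find_signatures is hypothetical.

```python
import re

# Patterns built from the sample log lines in this article (assumption:
# other NSX versions may word these messages differently).
CORE_RE = re.compile(
    r"Core file generated: (?P<path>/var/log/core/core\.dp-fp\S*)"
)
SEGV_RE = re.compile(
    r"grsec: Segmentation fault occurred at (?P<addr>[0-9a-fA-F]+) "
    r"in (?P<binary>\S*datapathd)\["
)


def find_signatures(lines):
    """Return (kind, detail) tuples for lines matching either symptom.

    kind is "core" (detail = core file path) or "segfault"
    (detail = faulting address as logged).
    """
    hits = []
    for line in lines:
        match = CORE_RE.search(line)
        if match:
            hits.append(("core", match.group("path")))
            continue
        match = SEGV_RE.search(line)
        if match:
            hits.append(("segfault", match.group("addr")))
    return hits
```

Pass it any iterable of log lines, for example an open file handle for an exported copy of the Edge Node syslog or kern.log. A match only indicates that the symptoms resemble this article; the backtrace is still needed to confirm the crash is in pfsync_get_next_state_upd.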



Environment

VMware NSX-T Data Center

Cause

The issue is caused by corruption in the TLV (type-length-value) packet that carries HA data for the Edge firewall/NAT. When the active service router (SR) component syncs firewall/NAT state to the standby component, an invalid length field in the HA packet can cause the dataplane (DP) service to crash.
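
To illustrate why an invalid length field is dangerous, the following minimal sketch parses a generic TLV buffer and rejects any record whose declared length overruns the data actually received. The record layout (1-byte type, 2-byte big-endian length) and the name parse_tlvs are assumptions chosen for illustration; this is not the NSX HA sync format and not the datapath code. In a C parser without an equivalent bounds check, a corrupted length can lead to a read past the end of the packet.

```python
import struct

# Hypothetical record header: 1-byte type, 2-byte big-endian length.
# This layout is an assumption for illustration only.
HEADER = struct.Struct("!BH")


def parse_tlvs(buf: bytes):
    """Parse TLV records, rejecting lengths that overrun the buffer."""
    records = []
    offset = 0
    while offset < len(buf):
        remaining = len(buf) - offset
        if remaining < HEADER.size:
            raise ValueError(f"truncated TLV header at offset {offset}")
        tlv_type, length = HEADER.unpack_from(buf, offset)
        offset += HEADER.size
        # The key check: never trust the length field beyond the bytes
        # that were actually received.
        if length > len(buf) - offset:
            raise ValueError(
                f"TLV type {tlv_type} declares {length} bytes, "
                f"only {len(buf) - offset} available"
            )
        records.append((tlv_type, buf[offset:offset + length]))
        offset += length
    return records
```

For example, a buffer built from HEADER.pack(1, 3) + b"abc" parses into a single record, while HEADER.pack(1, 500) + b"abc" raises ValueError instead of reading beyond the buffer.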

Resolution

This is a known issue impacting VMware NSX.

If you believe you have encountered this issue, please open a support case with Broadcom Support and refer to this KB article.

For more information, see Creating and managing Broadcom support cases.