NSX Edge Node Application Crash and Edge Failover Due to Segmentation Fault



Article ID: 377017


Products

VMware NSX

Issue/Introduction

  • The following alarm is triggered in the NSX UI:

"Application on NSX node <edge-node-name> has crashed. The number of core files found is 1. Collect the Support Bundle including core dump files and contact VMware Support team."

 

  • The dataplane service crashes and generates a core dump in /var/log/core/.
  • The dataplane service crash causes edge failover and temporary traffic disruption.
  • The Edge Node logs the following warning:

/var/log/syslog

[TIMESTAMP] [EDGE NODE NAME] NSX 4137004 - [nsx@6876 comp="nsx-edge" subcomp="node-mgmt" username="root" level="WARNING"] Core file generated: /var/log/core/core.dp-fp.gz

 

  • The Edge Node logs the following kernel entry:

/var/log/kern.log

[TIMESTAMP] [EDGE NODE NAME] kernel - - - [2531844.823231] grsec: Segmentation fault occurred at 0000000000008050 in /opt/vmware/nsx-edge/sbin/datapathd[dp-fp:3:7387] uid/euid:0/0 gid/egid:124/124, parent /opt/vmware/edge/dpd/entrypoint.sh[entrypoint.sh:7177] uid/euid:0/0 gid/egid:124/124

 

  • The backtrace is similar to:


#0  0x000018905d91d1fb in lrouter_do_send_ip (m=0x6b4b67d4adc0, ifuid=0, vrfid=vrfid@entry=1, hlen=hlen@entry=20, to_ingress=to_ingress@entry=false,
    nh_l2=nh_l2@entry=0x0) at datapath/lrouter.c:1284
#1  0x000018905d91f8c4 in lrouter_send_ip (m=<optimized out>, ifuid=<optimized out>, vrfid=vrfid@entry=1, hlen=hlen@entry=20,
    to_ingress=to_ingress@entry=false) at datapath/lrouter.c:1337
#2  0x000018905d92a77b in lrouter_dst_unreachable (m=m@entry=0x6b4b707e3f40, vrfid=vrfid@entry=1, code=code@entry=3 '\003', mtu=mtu@entry=0,
    v4=v4@entry=true, src_ip_rewrite=src_ip_rewrite@entry=false) at datapath/lrouter.c:1733
#3  0x000018905d89c980 in ip_local_deliver_demux (out_ifp=0x6b497acdb140, m=<optimized out>) at datapath/ip.c:1031
#4  ip_local_deliver (m=<optimized out>, ifp=0x6b497acdb140) at datapath/ip.c:1073
#5  0x000018905d8a463f in iface_output (is_repl=false, if_type=<optimized out>, if_port=<optimized out>, ifp=0x6b497acdb140, m=0x6b4b707e3f40, rl=0x0,
    v=0x18905ef7ca40 <VLM_ip>) at datapath/iface-impl.h:1068
#6  ip_routing_by_egr_ifuid (m=0x6b4b707e3f40, vrfid=<optimized out>, from_flow_cache=<optimized out>, mpls_label=0, fwd_nh=<optimized out>,
    nh=<optimized out>, egrs=0x0, extra_dec_ttl=0 '\000', is_igmp=false) at datapath/ip.c:2443
#7  0x000018905d8a52c5 in ip_routing (m=0x6b4b707e3f40, vrfid=vrfid@entry=1, fwd_nh=fwd_nh@entry=0, from_flow_cache=from_flow_cache@entry=false,
    nh_selected=nh_selected@entry=-1) at datapath/ip.c:2237
#8  0x000018905d8a6329 in ip_input_fast (m=<optimized out>, m@entry=0x6b4b707e3f40, vrfid=vrfid@entry=1, orig_core=<optimized out>,
    fw_service=fw_service@entry=2 '\002') at datapath/ip.c:1759
#9  0x000018905d8a87bc in ip_input_fast_post_service_cb (m=0x6b4b707e3f40, data=0x6b4b707e4188) at datapath/ip.c:1903
#10 0x000018905d89a4e0 in m_call_process_fct (m=<optimized out>) at datapath/mbuf.h:1539
#11 intercore_drain (lcore_rank=lcore_rank@entry=3) at datapath/intercore.c:62
#12 0x000018905d9655f3 in fpn_main_loop (unused=<optimized out>) at datapath/main-loop.c:749
#13 0x000018905d9015ea in fpn_job_poll (my_cpu_id=<optimized out>) at datapath/job.c:120
#14 0x000018905d8942db in fpn_intel_rte_job_poll (arg=<optimized out>) at datapath/intel-rte.c:2391
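
When triaging a support bundle, it can help to check the collected logs and backtrace against the signatures quoted above. The following is a minimal Python sketch, not a VMware tool: the function names `scan_logs` and `backtrace_matches` are hypothetical, and the patterns are derived only from the sample log lines and backtrace in this article, so other builds may format these entries differently.

```python
import re

# Patterns derived from the sample /var/log/syslog and /var/log/kern.log
# lines quoted in this article (assumption: other builds use the same format).
CORE_FILE_RE = re.compile(r"Core file generated: (/var/log/core/\S+)")
SEGFAULT_RE = re.compile(
    r"grsec: Segmentation fault occurred at \S+ in "
    r"/opt/vmware/nsx-edge/sbin/datapathd\[(dp-fp[^\]]*)\]"
)

# Innermost three frames of the backtrace shown above, innermost first.
SIGNATURE_FRAMES = (
    "lrouter_do_send_ip",
    "lrouter_send_ip",
    "lrouter_dst_unreachable",
)
# Matches gdb frame lines with or without a leading address.
FRAME_RE = re.compile(r"^#\d+\s+(?:0x[0-9a-f]+ in )?(\w+) \(")


def scan_logs(lines):
    """Return (core file paths, crashing datapathd thread ids) found in lines."""
    cores, threads = [], []
    for line in lines:
        core = CORE_FILE_RE.search(line)
        if core:
            cores.append(core.group(1))
        fault = SEGFAULT_RE.search(line)
        if fault:
            threads.append(fault.group(1))
    return cores, threads


def backtrace_matches(bt_text):
    """True if the innermost frames of a gdb backtrace equal SIGNATURE_FRAMES."""
    frames = [
        m.group(1)
        for line in bt_text.splitlines()
        if (m := FRAME_RE.match(line))
    ]
    return tuple(frames[: len(SIGNATURE_FRAMES)]) == SIGNATURE_FRAMES
```

A match on both the log patterns and the innermost frames suggests, but does not prove, that a crash is the one described here. Confirmation still requires analysis of the core dump by VMware Support.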

Environment

VMware NSX 4.x

Cause

This issue can occur when packets that are not destined for the firewall are incorrectly routed to the firewall for processing. The edge datapath process (datapathd) can then terminate with a segmentation fault and restart, which causes edge failover and traffic drops on the edge.

Resolution

This issue is fixed in VMware NSX 4.2.
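
To check an inventory of Edge Nodes against this fix level, a version comparison along the following lines may be useful. This is an illustrative Python sketch: the helper names `parse_version` and `has_fix` are hypothetical, the dotted version format is an assumption, and the code assumes that releases numbered 4.2 or higher carry the fix, which this article does not state explicitly.

```python
FIXED_IN = (4, 2)  # from this article: "fixed in VMware NSX 4.2"


def parse_version(text):
    """Turn a dotted version string such as '4.1.2.3' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def has_fix(version_text):
    """Compare major.minor against FIXED_IN.

    Assumption: any release numbered 4.2 or higher includes the fix.
    """
    return parse_version(version_text)[:2] >= FIXED_IN
```

For example, `has_fix("4.1.2.3")` evaluates to `False` and `has_fix("4.2.0.0")` evaluates to `True`.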