NSX Edge - Application service crash alarm - Segmentation or Signal_ fault

Article ID: 372027

Products

VMware NSX

Issue/Introduction

  • Several services (dataplane, local-controller, nds, dispatcher, dhcp, router, nvapi) stop at the same time. Because the dataplane service stops, a failover is triggered.
  • A kernel defect causes "bad frame in rt_sigreturn" messages to be logged.
  • The following example entries are from /var/log/syslog or /var/log/kern.log on the affected Edge node:

202#-##-##T##:##:##.###Z myhostname.myFQDN kernel - - - [########.######] signal_fault: ### callbacks suppressed

202#-##-##T##:##:##.###Z myhostname.myFQDN kernel - - - [########.######] myservice.py[####] bad frame in rt_sigreturn frame:################ ip:############ sp:############ orax:ffffffffffffffff in mylibraryfile[#################]

202#-##-##T##:##:##.###Z myhostname.myFQDN kernel - - - [########.######] grsec: Segmentation fault occurred at ################ in /opt/vmware/###-api/bin/python/########_api/webserver/myservice.py[myservice.py:####] uid/euid:33/33 gid/egid:33/33, parent /usr/lib/systemd/systemd[systemd:1] uid/euid:0/0 gid/egid:0/0
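To check a node's logs for these three message types, the following minimal Python 3.9+ sketch counts the matching lines. It assumes you can read /var/log/syslog and /var/log/kern.log, or copies of them. The script, its pattern names (signal_fault, bad_frame, grsec_segfault) and its functions are hypothetical illustrations, not a VMware/Broadcom tool. The regular expressions are derived only from the anonymized example entries above and may need adjusting for real logs.

```python
import re
import sys
from typing import Iterable

# Heuristic patterns derived from the example kernel entries in this article;
# not an official signature list (assumption: message text matches the examples).
PATTERNS = {
    "signal_fault": re.compile(r"signal_fault: \d+ callbacks suppressed"),
    "bad_frame": re.compile(r"bad frame in rt_sigreturn"),
    "grsec_segfault": re.compile(r"grsec: Segmentation fault occurred at"),
}


def scan_lines(lines: Iterable[str]) -> dict[str, list[str]]:
    """Group every line that matches a pattern under that pattern's name."""
    hits: dict[str, list[str]] = {name: [] for name in PATTERNS}
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits[name].append(line.rstrip("\n"))
    return hits


def scan_file(path: str) -> dict[str, list[str]]:
    """Scan one log file; undecodable bytes are replaced rather than raising."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return scan_lines(fh)


if __name__ == "__main__":
    # Default to the two log files named in this article if no paths are given.
    for log_path in sys.argv[1:] or ["/var/log/syslog", "/var/log/kern.log"]:
        try:
            results = scan_file(log_path)
        except OSError as exc:
            print(f"{log_path}: cannot read ({exc})")
            continue
        for name, matched in results.items():
            print(f"{log_path}: {name}: {len(matched)} match(es)")
```

A non-zero count for all three names around the same timestamp is consistent with this issue. A match alone does not confirm it.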

Environment

VMware NSX-T Data Center 3.2.3.0.1

VMware NSX 4.1.0

Cause

These crashes occur because the NSX Edge operating system fails to acquire the memory/data that the service needs to continue processing. The memory reference is lost transiently, as the result of a race condition, and is reacquired on the next restart of the affected service, which the operating system performs automatically.

Resolution

This issue is addressed in NSX-T Data Center 3.2.4 and NSX 4.1.1, available from Broadcom downloads.

If you are having difficulty finding and downloading software, please review the Download Broadcom products and software KB.
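To check whether a given version already includes the fix, the following minimal Python sketch compares it against the fixed releases named in the Resolution (3.2.4 for the 3.2.x train, 4.1.1 for the 4.1.x train). The function names are hypothetical. The sketch assumes plain dotted numeric version strings. For release trains this article does not mention it returns None, rather than guessing.

```python
from typing import Optional

# Fixed release per (major, minor) train, taken only from this article's Resolution.
# Trains not listed here are not covered by the article (assumption: unknown).
FIXED_IN = {
    (3, 2): (3, 2, 4),
    (4, 1): (4, 1, 1),
}


def parse_version(version: str) -> tuple[int, ...]:
    """Parse a dotted numeric version such as '3.2.3.0.1' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def includes_fix(version: str) -> Optional[bool]:
    """True if at or above the fixed release for its train, False if below it,
    None if the train is not covered by this article."""
    parts = parse_version(version)
    fixed = FIXED_IN.get(parts[:2])
    if fixed is None:
        return None
    # Tuple comparison is element-wise, so (3, 2, 3, 0, 1) < (3, 2, 4).
    return parts >= fixed
```

For the two versions listed under Environment (3.2.3.0.1 and 4.1.0), includes_fix returns False.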