After an NSX Edge reboot, the nestdb service crashes

Article ID: 390429

Updated On:

Products

VMware NSX

Issue/Introduction

  • The "Application on NSX node has crashed" alarm is raised on versions prior to 4.2.1. No alarm is observed on later versions.
  • The NSX Edge nestdb service crashes and creates a core file. To confirm this as the root user on the Edge:
    • Log in as admin and switch to the root account by running the command "st en".
    • Run the following command to list the crash dumps:
      root@<Edge node name>:~# ls /var/log/core/
      core.nestdb-server.<>.gz
  • On the NSX Edge, /var/log/syslog shows a segmentation fault and the creation of the core file:

<DATE>T21:47:12.753Z Edge kernel - - - [   88.901674] grsec: Segmentation fault occurred at 0000000000000000 in /opt/vmware/nsx-nestdb/bin/nestdb-server[nestdb-server:3154] uid/euid:994/994 gid/egid:1005/1005, parent /opt/vmware/nsx-nestdb/bin/watchdog.sh[watchdog.sh:3120] uid/euid:994/994 gid/egid:1005/1005

<DATE>T21:47:12.601Z Edge NSX 4902 - [nsx@6876 comp="nsx-edge" subcomp="node-mgmt" username="root" level="WARNING"] Core file generated: /var/log/core/core.nestdb-server.<>.gz

  • On the NSX Edge, /var/log/kern.log shows that the Edge booted shortly before the service crash:

<DATE>T21:45:47.442Z Edge kernel - - - [    0.000000] Command line: BOOT_IMAGE=/vmlinuz-5.15.123-nn2-server root=UUID=<UUID> ro audit=1 quiet splash nomodeset nopku rootdelay=90 net.ifnames=1 biosdevname=0 transparent_hugepage=never nosmt kptr_restrict=2 cgroup.memory=nokmem intel_iommu=off numa_gpnodes=-1 hugepagesz=1G hugepages=4 isolcpus=0,1 crashkernel=512-32G:256M,32G-:512M
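The three log signatures above can be pulled out together so that their timestamps are easy to compare. The following is a minimal sketch, not an official NSX tool: the function name nestdb_boot_crash_evidence is hypothetical, and the search patterns are assumptions derived only from the sample log lines in this article, so they may need adjusting for other builds.

```shell
# Hypothetical helper (not part of NSX): print the log lines described in
# this article so the boot, the segmentation fault, and the core file
# notice can be read side by side.
#   $1: path to a syslog file (for example /var/log/syslog)
#   $2: path to a kernel log file (for example /var/log/kern.log)
nestdb_boot_crash_evidence() {
    syslog_file="$1"
    kern_log="$2"

    # Boot marker, as in the kern.log excerpt above.
    grep -F 'Command line: BOOT_IMAGE=' "$kern_log"

    # Segmentation fault reported by the kernel for nestdb-server.
    grep -E 'Segmentation fault occurred.*nestdb-server' "$syslog_file"

    # Core file notice for nestdb-server.
    grep -F 'Core file generated: /var/log/core/core.nestdb-server.' "$syslog_file"
}
```

As root on the Edge, this could be run as nestdb_boot_crash_evidence /var/log/syslog /var/log/kern.log. If the boot line is timestamped shortly before the segmentation fault and core file lines, the crash matches the pattern described in this article.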

Environment

VMware NSX 4.x

Cause

The nestdb service crashes during Edge bootup because it fails to process messages correctly. The service restarts immediately and there is no functional impact.

Resolution

This issue is resolved in VMware NSX 4.2.0 and later, available at Broadcom downloads.

If an application crash alarm is present, the core file must be deleted. Follow the "Application on NSX node has crashed" alarm KB to resolve it.

Additional Information