nsx-nestdb service crashing due to ramdisk full

Article ID: 371820

Products

VMware NSX

Issue/Introduction

  • "Control channel to transport node down" alarms are present in the NSX UI
  • On the affected host transport nodes, the nsx-nestdb service is not running
  • The nsx-nestdb service keeps crashing when it is started manually

    The following log messages appear in /var/run/log/nsx-syslog on the ESXi host:

    2024-06-25T10:20:02.636Z nestdb-server[2102033]: NSX 2102033 - [nsx@6876 comp="nsx-esx" subcomp="nsx-nestdb" tid="2102033" level="ERROR" errorCode="NST0103"] leveldb::DB::Write() failed: IO error: /var/lib/vmware/nsx/nestdb/db/2245366.ldb: No space left on device

    2024-06-25T10:20:02.636Z nestdb-server[2102033]: NSX 2102033 - [nsx@6876 comp="nsx-esx" subcomp="nsx-nestdb" tid="2102033" level="ERROR" errorCode="NST0103"] Encountered IO error, exiting.

  • vobd.log contains ramdisk-full messages similar to the following:

    2024-06-25T10:20:02.609Z: [VisorfsCorrelator] 22651233508780us: [vob.visorfs.ramdisk.full] Cannot extend visorfs file /var/lib/vmware/nsx/nestdb/db/2245366.ldb because its ramdisk (nestdb) is full.

    2024-06-25T10:20:02.609Z: [VisorfsCorrelator] 22651751993159us: [esx.problem.visorfs.ramdisk.full] The ramdisk 'nestdb' is full. As a result, the file /var/lib/vmware/nsx/nestdb/db/2245366.ldb could not be written.


  • A nestdb core dump is generated:

    2024-06-25T10:20:08.387Z: [UserWorldCorrelator] 22651239286956us: [vob.uw.core.dumped] /opt/vmware/nsx-nestdb/bin/nestdb-server(115897219) /var/core/nestdb-server-zdump.000

  • The var or nestdb ramdisk shows as full in the vdf output:

    vdf -h | tail -15
    -----
    Ramdisk         Size   Used  Available  Use%  Mounted on
    root             32M    12M        19M   38%  --
    etc              28M     1M        26M    4%  --
    opt              32M     2M        29M    7%  --
    var              48M    48M         0B  100%  --  <<<<
    tmp             256M     4K       255M    0%  --
    iofilters        32M     0B        32M    0%  --
    shm            1024M     0B      1024M    0%  --
    crx            1024M     0B      1024M    0%  --
    configstore      32M   388K        31M    1%  --
    configstorebkp   32M   388K        31M    1%  --
    hostdstats     2053M    78M      1974M    3%  --
    nestdb          512M   135M       376M   26%  --
    nsx-idps         64M     7M        56M   11%  --


  • The core dump backtrace shows that the crash is caused by an out-of-memory (OOM) condition during the recovery phase:

    Use the "info sharedlibrary" command to see the complete listing.
    Do you need "set solib-search-path" or "set sysroot"?
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/lib64/libthread_db.so.1".
    Core was generated by `no arguments'.
    Program terminated with signal SIGABRT, Aborted.
    #0 0x0000004b6df84e15 in __GI_raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
    56 return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
    (gdb) bt
    #0 0x0000004b6df84e15 in __GI_raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
    #1 0x0000004b6df8628b in __GI_abort () at abort.c:90
    #2 0x0000004b651d42fd in ?? ()
    #3 0x0000000000000028 in ?? ()
    #4 0x0000000000000028 in ?? ()
    #5 0x0000000000000028 in ?? ()
    #6 0x000003495a53c2c0 in ?? ()
    #7 0x0000004b38b23900 in ?? ()
    #8 0x0000004b651d22d6 in ?? ()
    #9 0x00000000000002aa in ?? ()
    #10 0x0000004b651d2321 in ?? ()
    #11 0x0000004b6d18a800 in ?? ()
    #12 0x0000004b24c2f72d in nestdb::oom_handler () at controller/lcp/nestdb/src/cpp/server/nestdb-main.cpp:301
    #13 0x0000004b651d2a1c in ?? ()
    #14 0x0000004b38b239c0 in ?? ()
    #15 0x0000004b24cc4c36 in leveldb::VersionSet::Recover(bool*) ()
    #16 0x0000004b24cb1cb7 in leveldb::DBImpl::Recover(leveldb::VersionEdit*, bool*) ()
    #17 0x0000004b24cb24ea in leveldb::DB::Open(leveldb::Options const&, std::string const&, leveldb::DB**) ()
    #18 0x0000004b24c99a14 in nestdb::DbAccess::Open (this=0x3495a53d010, db_name="/var/lib/vmware/nsx/nestdb/db") at controller/lcp/nestdb/src/cpp/db/db-access.cpp:36
    #19 0x0000004b24c362df in nestdb::Run (argc=<optimized out>, argv=<optimized out>) at controller/lcp/nestdb/src/cpp/server/nestdb-main.cpp:333
    #20 0x0000004b24c2d9ba in main (argc=17, argv=0x3495a53dc20) at controller/lcp/nestdb/src/cpp/server/nestdb-main.cpp:392 
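
The log symptoms listed above can be checked with a short script. The following is a minimal Python sketch, not an official tool. The function names and signature labels are illustrative, and the patterns are derived only from the sample messages in this article, so other builds may log slightly different text.

```python
import re

# Patterns derived from the sample log lines in this article (assumption:
# the same message text is used on the host being checked).
SIGNATURES = {
    "nestdb_io_error": re.compile(r'subcomp="nsx-nestdb".*No space left on device'),
    "ramdisk_full": re.compile(r"\[(?:vob|esx\.problem)\.visorfs\.ramdisk\.full\]"),
    "nestdb_core_dump": re.compile(r"\[vob\.uw\.core\.dumped\].*nestdb-server"),
}


def scan_lines(lines):
    """Return {signature_name: [matching lines]} for an iterable of log lines."""
    hits = {name: [] for name in SIGNATURES}
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                hits[name].append(line.rstrip("\n"))
    return hits


def scan_file(path):
    """Scan one log file; undecodable bytes are replaced instead of raising."""
    with open(path, errors="replace") as handle:
        return scan_lines(handle)
```

Calling scan_file on a copy of the nsx-syslog or vobd.log file returns a dictionary whose non-empty lists indicate which of the symptoms above were found.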

 
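The ramdisk check can be scripted in a similar way. The sketch below parses text in the column layout shown in the vdf -h sample above and lists the ramdisks at or above a usage threshold. The function names and the default threshold of 90% are assumptions chosen for illustration, not values from NSX or ESXi documentation.

```python
def parse_vdf(text):
    """Parse ramdisk rows of `vdf -h` style output into a list of dicts."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # A ramdisk row has at least: name, size, used, available, use%.
        if len(fields) < 5 or not fields[4].endswith("%"):
            continue
        percent = fields[4][:-1]
        if not percent.isdigit():
            continue  # skips the header row, whose fifth column is "Use%"
        rows.append({
            "name": fields[0],
            "size": fields[1],
            "used": fields[2],
            "available": fields[3],
            "use_pct": int(percent),
        })
    return rows


def full_ramdisks(text, threshold=90):
    """Return names of ramdisks whose usage is at or above threshold percent."""
    return [row["name"] for row in parse_vdf(text) if row["use_pct"] >= threshold]
```

With the sample output from this article as input, full_ramdisks returns only var, which matches the row marked with <<<<.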

Environment

VMware NSX-T Data Center

Cause

The nestdb ramdisk is limited to 512 MB. NestDB being unable to grow beyond that limit indicates that the system is under significant memory pressure.

Resolution

If you believe you have encountered this issue, please open a support case with Broadcom Support and refer to this KB article.

For more information, see Creating and managing Broadcom support cases.

Additional Information