NSX-T NestDB database and etc/group file corrupted due to out of memory issue

Article ID: 312646

Products

VMware NSX

Issue/Introduction

  • The NSX-T Edge Node may fail to boot and may enter emergency mode.
  • In the NSX-T UI, the NSX-T Edge Node's configuration state is Failed and connectivity to the NSX-T Manager is down.
  • If the NSX-T Edge Node has already booted successfully, entries similar to the following may appear in the /var/log/syslog.log file:
20##-##-##T##:##:##.076Z NSX-edge01 NSX 2992 - [nsx@6876 comp="nsx-edge" subcomp="nsx-nestdb" tid="2992" level="ERROR" errorCode="NST0103"] leveldb::DB::Write() failed: Corruption: not an sstable (bad magic number)

20##-##-##T##:##:##.076Z NSX-edge01 NSX 3269 - [nsx@6876 comp="nsx-edge" subcomp="nsx-nestdb" tid="3269" level="INFO"] DbAccess error in NestDbServer::GetNextTxnId(): Corruption: not an sstable (bad magic number). txn_id_ not persisted to db.
  • The NestDB service may also be unable to start, which leads to communication issues with NestDB.

Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.
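
To check a log file for the corruption signature shown in the excerpts above, a small shell helper along the following lines may be useful. This is only a sketch: the function name nestdb_corruption_count is hypothetical, and the default path is simply the log file named in this article.

```shell
# Hypothetical helper (not part of NSX-T): count log lines that contain the
# NestDB corruption signature quoted in the excerpts above.
nestdb_corruption_count() {
    # Default to the log file named in this article; pass another path to override.
    log_file="${1:-/var/log/syslog.log}"
    # -F matches the text literally, -c prints the number of matching lines.
    # "|| true" hides grep's non-zero exit status when there are no matches.
    grep -c -F 'Corruption: not an sstable (bad magic number)' "$log_file" 2>/dev/null || true
}
```

Running nestdb_corruption_count with no argument reads the default log file. A count greater than zero suggests that the node is affected by the issue described here. If the file cannot be read, nothing is printed.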

 



Environment

VMware NSX-T Data Center
VMware NSX-T Data Center 4.x
VMware NSX-T Data Center 3.x

Cause

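The NestDB errors are caused by corruption of the NestDB database. As the article title indicates, the corruption follows an out-of-memory condition on the NSX-T Edge Node.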

Resolution

This issue is resolved in NSX-T version 4.1. For build details, see "Build numbers and versions of VMware NSX/NSX-T Data Center".

Workaround:
There are two workarounds for this issue:
1. Replace the NSX-T Edge Node
Deploy a new NSX-T Edge Node and use it to replace the affected NSX-T Edge Node in the Edge cluster.
2. Remove the contents of the DB directory and restart NestDB by running the following commands on the affected NSX-T Edge Node:
service nsx-nestdb stop
/bin/rm -rf /config/vmware/nsx/nestdb/db/*
service nsx-nestdb start
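
The three commands in workaround 2 can be wrapped in a small function that stops at the first failure and refuses to run if the DB directory is missing. The following is a minimal sketch under stated assumptions, not a supported tool: the function name reset_nestdb and the DRY_RUN variable are hypothetical, and the commands themselves are exactly those listed above. By default it only prints what it would run.

```shell
# Hypothetical wrapper around workaround 2 (names are illustrative only).
# DRY_RUN defaults to 1, so the commands are printed rather than executed.
reset_nestdb() {
    # Directory from this article; pass another path as the first argument to override.
    nestdb_dir="${1:-/config/vmware/nsx/nestdb/db}"
    run=""
    if [ "${DRY_RUN:-1}" = "1" ]; then
        run="echo"
    fi
    # Guard: never expand the rm glob against a missing directory.
    if [ ! -d "$nestdb_dir" ]; then
        echo "directory not found: $nestdb_dir" >&2
        return 1
    fi
    # Same three steps as the workaround; abort on the first failure.
    $run service nsx-nestdb stop || return 1
    $run /bin/rm -rf "$nestdb_dir"/* || return 1
    $run service nsx-nestdb start
}
```

Calling reset_nestdb with no arguments prints the commands for review. Setting DRY_RUN=0 executes them on the node. The directory check is there so that an empty or mistyped path cannot turn the rm step into a deletion somewhere else.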