BFD tunnels are down on ESXi hosts and VM networking is impacted

Article ID: 398998

Updated On:

Products

VMware NSX

Issue/Introduction

  • ESXi hosts have some or all BFD tunnels down
  • VMs running on impacted hosts have network connectivity issues
  • On the ESXi host, /var/run/log/nsx-syslog.log reports "No space left on device" errors for nestdb:
    <DATE>T11:20:58.187Z nestdb-server[106666733]: NSX 106666733 - [nsx@6876 comp="nsx-esx" subcomp="nsx-nestdb" tid="106666733" level="ERROR" errorCode="NST0103"] leveldb::DB::Write() failed: IO error: /var/lib/vmware/nsx/nestdb/db/7185661.ldb: No space left on device
    <DATE>T11:20:58.187Z nestdb-server[106666733]: NSX 106666733 - [nsx@6876 comp="nsx-esx" subcomp="nsx-nestdb" s2comp="nsx-rpc" tid="106666733" level="ERROR" errorCode="RPC101"] Exception occurred in service implementation for vmware.nsx.nestdb.NestDb/ConfigureWorkflowTracer: leveldb::DB::Write() failed: IO error: /var/lib/vmware/nsx/nestdb/db/7185661.ldb: No space left on device
  • The nestdb LOG files are consuming more than 300 MB of space:
    ls -lh /var/lib/vmware/nsx/nestdb/db/
    -rw-rw-r--  331M May 26 11:12 LOG
    -rw-rw-r--    40M Mar 24 15:54 LOG.old
  • The nestdb ramdisk is full or nearly full (the Free column shows 0 % or close to it):
    esxcli system visorfs ramdisk list | egrep "Ramdisk|nestdb"
    Ramdisk Name    System  Include in Coredumps  Reserved   Maximum      Used        Peak Used   Free   Reserved Free  Maximum Inodes  Allocated Inodes  Used Inodes  Mount Point
    nestdb          false   false                 32768 KiB   524288 KiB  519792 KiB  524288 KiB    0 %            0 %            8192                32           23  /var/lib/vmware/nsx/nestdb/db
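
The ramdisk check above can also be scripted. The following is a minimal sketch, not an official NSX tool: the function names and the default threshold of 10 are illustrative assumptions, and the parsing assumes the column layout shown above, where sizes print as "<n> KiB" and percentages as "<n> %".

```shell
# Print the "Free" percentage of the nestdb ramdisk, reading the output of
# `esxcli system visorfs ramdisk list` on stdin.
# Assumption: with the layout shown in this article, Free is the 12th
# whitespace-separated field on the nestdb row.
nestdb_free_pct() {
    awk '$1 == "nestdb" { print $12 }'
}

# Succeed (exit status 0) when the nestdb ramdisk free percentage is at or
# below a threshold. The default of 10 is an arbitrary illustrative value.
nestdb_ramdisk_low() {
    free=$(nestdb_free_pct)
    [ -n "$free" ] && [ "$free" -le "${1:-10}" ]
}
```

On a host, this could be used as: esxcli system visorfs ramdisk list | nestdb_ramdisk_low && echo "nestdb ramdisk is nearly full"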

Environment

VMware NSX 4.x

Cause

This issue occurs when the nestdb agent on the host stops functioning because its log files have consumed all available space on the nestdb ramdisk.
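
To gauge how much of the ramdisk the log files account for, their sizes can be summed. This is a minimal sketch: the function name is hypothetical, and the directory defaults to the nestdb path shown in this article.

```shell
# Print the combined size, in bytes, of the nestdb LOG and LOG.old files.
# The directory defaults to the path from this article; pass a different
# directory as the first argument to try the function elsewhere.
nestdb_log_bytes() {
    dir="${1:-/var/lib/vmware/nsx/nestdb/db}"
    total=0
    for f in "$dir/LOG" "$dir/LOG.old"; do
        # Skip files that do not exist (for example, no LOG.old yet).
        if [ -f "$f" ]; then
            size=$(wc -c < "$f")
            total=$((total + size))
        fi
    done
    echo "$total"
}
```

The result can be compared with the Maximum value reported for the nestdb ramdisk (524288 KiB in the example output above).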

Resolution

This issue is resolved in VMware NSX 4.2.1.4 and 4.2.2.1, available at Broadcom downloads.

If you are having difficulty finding and downloading software, please review the Download Broadcom products and software KB.

 

For hosts already impacted, the logs can be cleared and services restarted to resolve the issue:

/etc/init.d/nsx-nestdb stop; rm -f /var/lib/vmware/nsx/nestdb/db/LOG.old; rm -f /var/lib/vmware/nsx/nestdb/db/LOG; /etc/init.d/nsx-nestdb start; sleep 10; /etc/init.d/nsx-cfgagent restart; /etc/init.d/nsx-opsagent restart


For hosts not yet impacted, a proactive reduction of log size can be performed:

/etc/init.d/nsx-nestdb stop; rm -f /var/lib/vmware/nsx/nestdb/db/LOG.old; rm -f /var/lib/vmware/nsx/nestdb/db/LOG; /etc/init.d/nsx-nestdb start


Note: Restarting the NSX services does not impact the dataplane of VMs running on the host.
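
The two command sequences above share the same log cleanup and differ only in the agent restarts for impacted hosts. A minimal sketch that wraps both in one function is shown below. The function name and the NSX_INIT_DIR and NESTDB_DIR variables are hypothetical conveniences, not NSX settings. They exist only so the sequence can be rehearsed against a scratch directory.

```shell
# Hypothetical overrides (not NSX settings); the defaults are the paths
# used in this article.
NSX_INIT_DIR="${NSX_INIT_DIR:-/etc/init.d}"
NESTDB_DIR="${NESTDB_DIR:-/var/lib/vmware/nsx/nestdb/db}"

# clear_nestdb_logs [impacted]
# Stops nestdb, removes LOG.old and LOG, and starts nestdb again.
# With the argument "impacted" it also waits 10 seconds and restarts
# nsx-cfgagent and nsx-opsagent, matching the first command above.
# Like the original one-liners, it continues even if a step fails.
clear_nestdb_logs() {
    "$NSX_INIT_DIR/nsx-nestdb" stop
    rm -f "$NESTDB_DIR/LOG.old" "$NESTDB_DIR/LOG"
    "$NSX_INIT_DIR/nsx-nestdb" start
    if [ "${1:-}" = "impacted" ]; then
        sleep 10
        "$NSX_INIT_DIR/nsx-cfgagent" restart
        "$NSX_INIT_DIR/nsx-opsagent" restart
    fi
}
```

Calling clear_nestdb_logs with no argument corresponds to the proactive cleanup, and clear_nestdb_logs impacted corresponds to the sequence for hosts that are already impacted.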

Additional Information

For additional information, please refer to: