NSX Cluster shows as down and the NSX UI is not accessible.

Article ID: 402282

Products

VMware NSX

Issue/Introduction

  • The NSX UI becomes inaccessible.
  • When the NSX cluster status is checked over SSH as the admin user (for example, with get cluster status), the status shows as DOWN.
  • The NSX UI may show the "Application on NSX node has crashed" alarm.
  • The compactor log /var/log/corfu/corfu-compactor-audit.log may show a Java OutOfMemoryError while the compactor checkpoints the two Corfu database tables with UUID d24c6611-eec5-3ec5-9063-41ebbad3479d and UUID a5af5b7e-0ec5-33d1-8241-c3f3a82e55dc:
    <Timestamp> | INFO  |              Cmpt-chkpter-9000 |     o.c.runtime.view.SMRObject | ObjectBuilder: open Corfu stream monitoring$SummationGenericStatsRecords2 id a5af5b7e-0ec5-33d1-8241-c3f3a82e55dc
    <Timestamp> | INFO  |              Cmpt-chkpter-9000 |  o.c.r.c.PersistedStreamingMap | Cleared RocksDB data on /config/corfu-compactor/compactor_monitoring_SummationGenericStatsRecords2
    <Timestamp> | INFO  |              Cmpt-chkpter-9000 |   o.c.runtime.CheckpointWriter | appendCheckpoint: Started checkpoint for a5af5b7e-0ec5-33d1-8241-c3f3a82e55dc at snapshot Token(epoch=44, sequence=1517977907)
    <Timestamp> | INFO  |              Cmpt-chkpter-9000 |  o.c.r.c.PersistedStreamingMap | Cleared RocksDB data on /config/corfu-compactor/compactor_monitoring_SummationGenericStatsRecords2
    <Timestamp> | INFO  |              Cmpt-chkpter-9000 |  o.c.r.c.PersistedStreamingMap | Cleared RocksDB data on /config/corfu-compactor/compactor_monitoring_SummationGenericStatsRecords2
    <Timestamp> | INFO  |              Cmpt-chkpter-9000 |  o.c.r.c.PersistedStreamingMap | Cleared RocksDB data on /config/corfu-compactor/compactor_monitoring_SummationGenericStatsRecords2
    java.lang.OutOfMemoryError: Java heap space
    Dumping heap to /image/core/compactor_oom.hprof ...
    Heap dump file created [500997995 bytes in 2.304 secs]
    #
    # java.lang.OutOfMemoryError: Java heap space
    # -XX:OnOutOfMemoryError="gzip -f /image/core/compactor_oom.hprof"
    #   Executing /bin/sh -c "gzip -f /image/core/compactor_oom.hprof"...
    Aborting due to java.lang.OutOfMemoryError: Java heap space
    #
    # A fatal error has been detected by the Java Runtime Environment:
    #
    #  Internal Error (debug.cpp:308), pid=2041211, tid=0x00006f53fed34700
    #  fatal error: OutOfMemory encountered: Java heap space
    #
    # JRE version: OpenJDK Runtime Environment (8.0_382-b06) (build 1.8.0_382-b06)
    # Java VM: OpenJDK 64-Bit Server VM (25.382-b06 mixed mode linux-amd64 compressed oops)
    # Core dump written. Default location: /usr/tanuki/bin/core or core.2041211
    #
    # An error report file with more information is saved as:
    # /usr/tanuki/bin/hs_err_pid2041211.log
    #
    # If you would like to submit a bug report, please visit:
    #   https://bell-sw.com/support
    #
    Aborted (core dumped)


    <Timestamp> | INFO  |              Cmpt-chkpter-9000 |     o.c.runtime.view.SMRObject | ObjectBuilder: open Corfu stream monitoring$SummationGenericStatsRecords1 id d24c6611-eec5-3ec5-9063-41ebbad3479d
    <Timestamp> | INFO  |              Cmpt-chkpter-9000 |  o.c.r.c.PersistedStreamingMap | Cleared RocksDB data on /config/corfu-compactor/compactor_monitoring_SummationGenericStatsRecords1
    <Timestamp> | INFO  |              Cmpt-chkpter-9000 |   o.c.runtime.CheckpointWriter | appendCheckpoint: Started checkpoint for d24c6611-eec5-3ec5-9063-41ebbad3479d at snapshot Token(epoch=44, sequence=1518059021)
    <Timestamp> | INFO  |              Cmpt-chkpter-9000 |  o.c.r.c.PersistedStreamingMap | Cleared RocksDB data on /config/corfu-compactor/compactor_monitoring_SummationGenericStatsRecords1
    <Timestamp> | INFO  |              Cmpt-chkpter-9000 |  o.c.r.c.PersistedStreamingMap | Cleared RocksDB data on /config/corfu-compactor/compactor_monitoring_SummationGenericStatsRecords1
    <Timestamp> | INFO  |              Cmpt-chkpter-9000 |  o.c.r.c.PersistedStreamingMap | Cleared RocksDB data on /config/corfu-compactor/compactor_monitoring_SummationGenericStatsRecords1
    java.lang.OutOfMemoryError: Java heap space
    Dumping heap to /image/core/compactor_oom.hprof ...
    Heap dump file created [501027081 bytes in 1.584 secs]
    #
    # java.lang.OutOfMemoryError: Java heap space
    # -XX:OnOutOfMemoryError="gzip -f /image/core/compactor_oom.hprof"
    #   Executing /bin/sh -c "gzip -f /image/core/compactor_oom.hprof"...
    Aborting due to java.lang.OutOfMemoryError: Java heap space
    #
    # A fatal error has been detected by the Java Runtime Environment:
    #
    #  Internal Error (debug.cpp:308), pid=2086293, tid=0x000076f880c4b700
    #  fatal error: OutOfMemory encountered: Java heap space
    #
    # JRE version: OpenJDK Runtime Environment (8.0_382-b06) (build 1.8.0_382-b06)
    # Java VM: OpenJDK 64-Bit Server VM (25.382-b06 mixed mode linux-amd64 compressed oops)
    # Core dump written. Default location: /usr/tanuki/bin/core or core.2086293
    #
    [thread 130809695786752 also had an error]
    # An error report file with more information is saved as:
    # /usr/tanuki/bin/hs_err_pid2086293.log
    #
    # If you would like to submit a bug report, please visit:
    #   https://bell-sw.com/support
    #
    Aborted (core dumped)
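
The log signature above can be checked for with a short script. The following is a minimal sketch, not a supported tool: it assumes the log format matches the excerpts shown in this article, and the function names find_oom_tables and scan_file are illustrative only. It reports which of the two table UUIDs had a checkpoint started immediately before a Java heap space error.

```python
import re

# UUIDs of the two Corfu tables named in this article.
AFFECTED_TABLE_IDS = {
    "d24c6611-eec5-3ec5-9063-41ebbad3479d",
    "a5af5b7e-0ec5-33d1-8241-c3f3a82e55dc",
}

UUID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
)
OOM_MARKER = "java.lang.OutOfMemoryError: Java heap space"


def find_oom_tables(lines):
    """Return affected table UUIDs whose checkpoint start was followed by an OOM.

    Assumption: an OOM line belongs to the most recent
    "Started checkpoint for <uuid>" line seen before it.
    """
    hits = set()
    last_started = None
    for line in lines:
        if "Started checkpoint for" in line:
            match = UUID_RE.search(line)
            last_started = match.group(0) if match else None
        elif OOM_MARKER in line and last_started in AFFECTED_TABLE_IDS:
            hits.add(last_started)
    return hits


def scan_file(path="/var/log/corfu/corfu-compactor-audit.log"):
    """Hypothetical helper: run the check against a log file on disk."""
    with open(path, errors="replace") as handle:
        return find_oom_tables(handle)
```

An empty result does not rule the issue out, because the log may have rotated; a non-empty result only indicates that the entries resemble the excerpts above.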

Cause

The two Corfu database tables monitoring$SummationGenericStatsRecords1 (UUID d24c6611-eec5-3ec5-9063-41ebbad3479d) and monitoring$SummationGenericStatsRecords2 (UUID a5af5b7e-0ec5-33d1-8241-c3f3a82e55dc) are known to contain large transactions, which can cause the Corfu compactor to run out of Java heap space while checkpointing them.

Resolution

This issue is resolved in VMware NSX 4.2.0, available at Broadcom downloads.
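
To compare a deployed version against the fixed release, a check along the following lines may help. This is a sketch under assumptions: it treats the version as a dotted numeric string whose first three components are major, minor, and maintenance, and the names parse_version and has_fix are illustrative, not part of any NSX tooling.

```python
# First release that contains the fix, per this article.
FIXED_IN = (4, 2, 0)


def parse_version(text):
    """Parse the leading dotted-numeric part of a version string into a tuple."""
    parts = []
    for piece in text.strip().split("."):
        if not piece.isdigit():
            break  # stop at the first non-numeric component
        parts.append(int(piece))
    if not parts:
        raise ValueError(f"no numeric version found in {text!r}")
    return tuple(parts)


def has_fix(version_text):
    """Return True if the version is 4.2.0 or later."""
    version = parse_version(version_text)
    # Pad short versions so that "4.2" compares as (4, 2, 0).
    padded = version + (0,) * (3 - len(version))
    return padded[:3] >= FIXED_IN
```

For example, has_fix("4.1.2") evaluates to False and has_fix("4.2.0") evaluates to True.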

If you are having difficulty finding and downloading software, please review the Download Broadcom products and software KB.

If you believe you have encountered this issue, please open a support case with Broadcom Support and refer to this KB article.

For more information, see Creating and managing Broadcom support cases.
