NSX Manager cluster intermittently degraded due to Proton or Compactor running Out Of Memory

Article ID: 377593

Products

VMware NSX

Issue/Introduction

  • The environment has been upgraded from 3.0/3.1 to 3.2.x/4.x.
  • The Proton wrapper log may show out-of-memory errors:

/var/log/proton/proton-tomcat-wrapper.log
21122:STATUS | wrapper | [TIMESTAMP] | The JVM has run out of memory.  Requesting thread dump.
21128:STATUS | wrapper | [TIMESTAMP] | The JVM has run out of memory.  Requesting thread dump.
21137:STATUS | wrapper | [TIMESTAMP] | The JVM has run out of memory.  Requesting thread dump.
21143:STATUS | wrapper | [TIMESTAMP] | The JVM has run out of memory.  Requesting thread dump.

  • The Compactor log may show out-of-memory errors:

/var/log/corfu/corfu-compactor-audit.log
# java.lang.OutOfMemoryError: Java heap space
# -XX:OnOutOfMemoryError="gzip -f /image/core/compactor_oom.hprof"
#   Executing /bin/sh -c "gzip -f /image/core/compactor_oom.hprof"...
Aborting due to java.lang.OutOfMemoryError: Java heap space
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  INVALID (0xe0000000) at pc=0x0000000000000000, pid=14350, tid=[TID]
#  fatal error: OutOfMemory encountered: Java heap space
#
#
# JRE version: OpenJDK Runtime Environment (Zulu 8.55.0.14-SA-linux64) (8.0_301-b02) (build 1.8.0_301-b02)
# Java VM: OpenJDK 64-Bit Server VM (25.301-b02 mixed mode linux-amd64 compressed oops)
# Core dump written. Default location: //core or core.14350

  • The Compactor log shows that the ApiTracker table (UUID ########-####-####-####-#######4297a) has a large number of entries (more than 500K). The example below shows 5 million entries.

/var/log/corfu/corfu-compactor-audit.log
[TIMESTAMP] | INFO  |              Cmpt-chkpter-9000 |   o.c.runtime.CheckpointWriter | appendCheckpoint: completed checkpoint for ########-####-####-####-#######4297a, entries(5000000), cpSize([SIZE]) bytes at snapshot Token(epoch=[EPOCH], sequence=[SEQ .No]) in [TIME TO PROCESS] ms
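
The symptoms above can be checked offline against logs extracted from a support bundle. The following is a minimal Python sketch, not a Broadcom tool: the function names are hypothetical, the match strings are copied from the excerpts in this article, and the 500,000 threshold is taken from the "more than 500K" guidance above. It assumes the checkpoint line format matches the example shown.

```python
import re

# Signatures copied from the log excerpts in this article.
PROTON_OOM = "The JVM has run out of memory"
JAVA_OOM = "java.lang.OutOfMemoryError"

# Matches compactor checkpoint summaries such as:
#   appendCheckpoint: completed checkpoint for <table-uuid>, entries(5000000), ...
CHECKPOINT_RE = re.compile(
    r"appendCheckpoint: completed checkpoint for (?P<table>\S+), "
    r"entries\((?P<entries>\d+)\)"
)

# Assumption: threshold taken from the "more than 500K" guidance above.
ENTRY_THRESHOLD = 500_000


def count_oom_lines(lines):
    """Count lines carrying either out-of-memory signature.

    A single crash can log several matching lines, so this counts
    lines, not distinct out-of-memory events.
    """
    return sum(1 for line in lines if PROTON_OOM in line or JAVA_OOM in line)


def large_checkpoints(lines, threshold=ENTRY_THRESHOLD):
    """Return (table_id, entries) for checkpoints above the threshold."""
    hits = []
    for line in lines:
        match = CHECKPOINT_RE.search(line)
        if match and int(match.group("entries")) > threshold:
            hits.append((match.group("table"), int(match.group("entries"))))
    return hits


def scan_file(path):
    """Scan one log file; returns (oom_line_count, large_checkpoint_list)."""
    with open(path, errors="replace") as handle:
        lines = handle.readlines()
    return count_oom_lines(lines), large_checkpoints(lines)
```

Calling scan_file() on a copy of proton-tomcat-wrapper.log or corfu-compactor-audit.log returns the number of out-of-memory lines and any tables whose checkpoint exceeded the threshold. A table ID ending in 4297a in that list corresponds to the ApiTracker symptom described above.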

 

Environment

VMware NSX-T Data Center 3.2.x
VMware NSX 4.x

Cause

The upgrade caused invalid data to be added to the EntityDeletionMarker table. As a result of this invalid data, the maintenance job that routinely cleans up ApiTracker fails.

Resolution

Workaround:

If this issue is encountered, please open a case with Broadcom Support and reference this KB.

Additional Information

If you contact Broadcom Support about this issue, provide the following:

  • Log bundles from all NSX Managers involved

  • Output of the following commands, run as root over SSH on each NSX Manager:

    • df -h (provide a screenshot)

    • cat /config/corfu/LAYOUT_CURRENT.ds

Handling Log Bundles for offline review with Broadcom support