NSX Manager cluster intermittently degraded due to Proton or Compactor running Out Of Memory

Article ID: 377593

Products

VMware NSX
VMware NSX-T Data Center

Issue/Introduction

  • The environment has been upgraded from NSX-T 3.0/3.1 to 3.2.x or NSX 4.x.
    • This issue may also be observed if the manager nodes were recently rebooted.
  • The NSX Manager Proton wrapper log /var/log/proton/proton-tomcat-wrapper.log may show out of memory messages:

21122:STATUS | wrapper | [TIMESTAMP] | The JVM has run out of memory.  Requesting thread dump.
21128:STATUS | wrapper | [TIMESTAMP] | The JVM has run out of memory.  Requesting thread dump.
21137:STATUS | wrapper | [TIMESTAMP] | The JVM has run out of memory.  Requesting thread dump.
21143:STATUS | wrapper | [TIMESTAMP] | The JVM has run out of memory.  Requesting thread dump.
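
To quickly confirm these entries, a search similar to the following can be used (a sketch, assuming the default log location; zgrep also reads any rotated, gzipped copies):

zgrep -an "The JVM has run out of memory" /var/log/proton/proton-tomcat-wrapper.log*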

  • The Corfu compactor log /var/log/corfu/corfu-compactor-audit.log may show out of memory errors:

# java.lang.OutOfMemoryError: Java heap space
# -XX:OnOutOfMemoryError="gzip -f /image/core/compactor_oom.hprof"
#   Executing /bin/sh -c "gzip -f /image/core/compactor_oom.hprof"...
Aborting due to java.lang.OutOfMemoryError: Java heap space
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  INVALID (0xe0000000) at pc=0x0000000000000000, pid=14350, tid=[TID]
#  fatal error: OutOfMemory encountered: Java heap space
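
A similar search can be used for the compactor log (again a sketch, assuming the default log location and rotation):

zgrep -a "OutOfMemoryError" /var/log/corfu/corfu-compactor-audit.log*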

  • Compactor logs show that the ApiTracker table (UUID ########-####-####-####-#######4297a) contains a large number of entries (more than 500,000). The example below shows 5 million entries.

find . -iname "corfu-compactor-audit.log" | xargs zgrep -a "completed checkpoint for" | grep "4297a" | tail
[TIMESTAMP] | INFO  |              Cmpt-chkpter-9000 |   o.c.runtime.CheckpointWriter | appendCheckpoint: completed checkpoint for ########-####-####-####-#######4297a, entries(5000000), cpSize([SIZE]) bytes at snapshot Token(epoch=[EPOCH], sequence=[SEQ .No]) in [TIME TO PROCESS] ms
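
To list only the entry counts reported for this table over time, the command above can be extended (a sketch, assuming GNU grep with -o support and that it is run from the directory containing the compactor logs):

find . -iname "corfu-compactor-audit.log*" | xargs zgrep -a "completed checkpoint for" | grep "4297a" | grep -o "entries([0-9]*)" | tail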

  • In the NSX Manager log /var/log/proton/nsxapi.log, note the absence of the following messages when running this grep:

zgrep -a "atches for handling deletion of APIs. Total No. of r" nsxapi.*

INFO RealizationServiceMaintenanceExecutor-0 RealizationServiceMaintenanceManager 2083488 SYSTEM [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Created '1' batches for handling deletion of APIs. Total No. of requests = '59', Batch-size = '100'.
INFO RealizationServiceMaintenanceExecutor-0 RealizationServiceMaintenanceManager 2083488 SYSTEM [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Created '1' batches for handling deletion of APIs. Total No. of requests = '13', Batch-size = '100'.

  • A core dump was generated with the file name proton_oom.hprof.gz:

nsx_manager1> get core-dumps
Directory: /image/core
123456     [TIMESTAMP]  proton_oom.hprof.gz

  • There is an open alarm: "Application on NSX node has crashed".

Environment

VMware NSX-T Data Center 3.2.x
VMware NSX 4.x

Cause

The upgrade caused invalid data to be added to the EntityDeletionMarker table.

As a result of this invalid data, the maintenance job that routinely cleans up the ApiTracker table fails. This failure is seen as the absence of the following log messages in /var/log/proton/nsxapi.log:

Created '1' batches for handling deletion of APIs. Total No. of requests = '13', Batch-size = '100'

Resolution

If you believe you have encountered this issue, please open a support case with Broadcom Support and refer to this KB article.

For more information, see Creating and managing Broadcom support cases.

Additional Information

If you are contacting Broadcom support about this issue, please provide the following:

  • Log bundles from all NSX Manager nodes involved.

Handling Log Bundles for offline review with Broadcom support
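
Support bundles can be collected from the NSX UI under System > Support Bundle or, as a sketch that assumes the support-bundle command is available in your NSX CLI release, per node from the admin CLI:

nsx_manager1> get support-bundle file manager1_support.tgz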

Removing the core dump to resolve the alarm:
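
As a sketch only, and assuming Broadcom Support has confirmed that the dump is no longer needed for analysis, the file reported by get core-dumps can be deleted from the root shell of the affected manager, after which the alarm can be resolved from the NSX UI:

rm /image/core/proton_oom.hprof.gz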