Cluster group status in manager becomes unstable due to periodical cbm crash

Article ID: 330574

Updated On: 08-03-2022

Products

VMware NSX

Issue/Introduction

Symptoms:

The status of many cluster groups randomly becomes DEGRADED.
tanuki.log contains many entries like the following, which show that cbm periodically runs out of memory (OOM).

INFO   | jvm 13   | 2020/09/03 03:15:40 | java.lang.OutOfMemoryError: Java heap space
STATUS | wrapper  | 2020/09/03 03:15:40 | The JVM has run out of memory.  Requesting thread dump.
STATUS | wrapper  | 2020/09/03 03:15:40 | Dumping JVM state.
STATUS | wrapper  | 2020/09/03 03:15:40 | The JVM has run out of memory.  Restarting JVM.
INFO   | jvm 13   | 2020/09/03 03:15:40 | Dumping heap to /image/core/cbm_oom.hprof ...

corfu-compactor-audit.log contains an entry like the following, which shows that the corfu compactor also ran out of memory.

2020-09-01T05:15:58.364Z ERROR main FrameworkCorfuCompactor - - [nsx@6876 comp="nsx-manager" errorCode="MP1" level="ERROR" subcomp="corfu-compactor"] Checkpoint failed for framework data with namespace nsx-manager
java.lang.OutOfMemoryError: Java heap space

Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.
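To judge whether the OOM entries recur periodically, it can help to count them per day. The following is a minimal sketch, not a VMware-provided tool: the function name count_oom_per_day is hypothetical, and the regular expression assumes that lines follow the same layout as the tanuki.log excerpt above. The caller supplies the log lines, because the location of tanuki.log is not stated in this article.

```python
import re
from collections import Counter

# Assumed layout, taken from the excerpt in this article:
# INFO   | jvm 13   | 2020/09/03 03:15:40 | java.lang.OutOfMemoryError: Java heap space
OOM_LINE = re.compile(
    r"^\w+\s*\|\s*jvm \d+\s*\|\s*(\d{4}/\d{2}/\d{2}) \d{2}:\d{2}:\d{2}\s*\|\s*"
    r"java\.lang\.OutOfMemoryError"
)


def count_oom_per_day(lines):
    """Return a dict mapping each date (YYYY/MM/DD) to its number of JVM OOM lines."""
    counts = Counter()
    for line in lines:
        match = OOM_LINE.match(line)
        if match:
            counts[match.group(1)] += 1
    return dict(counts)
```

A result with OOM entries on many consecutive days matches the periodic pattern described in the symptoms. Wrapper STATUS lines are ignored on purpose so that each OOM event is counted once.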

Environment

VMware NSX-T Data Center

Cause

The corfu compactor becomes stuck after it runs out of memory. This is a bug in NSX-T 2.5.1 and earlier versions. Because cbm calls the corfu compactor process periodically, a stuck compactor causes cbm to run out of memory as well, so the cbm OOM repeats periodically.

Resolution

This problem is fixed in NSX-T 2.5.2 and later versions.
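The version boundary above can be expressed as a simple comparison. This is a minimal sketch under stated assumptions: the function name is_affected_version is hypothetical, the input is assumed to be a dotted numeric version string, and only the first three components are compared. How the version string is obtained from a manager is not covered here.

```python
def is_affected_version(version: str) -> bool:
    """Return True if the NSX-T version is earlier than 2.5.2 (the fixed release)."""
    # Compare only major.minor.patch; any further components are ignored.
    parts = tuple(int(p) for p in version.split(".")[:3])
    # Pad short versions such as "3.0" with zeros so tuples compare cleanly.
    parts += (0,) * (3 - len(parts))
    return parts < (2, 5, 2)
```

For example, is_affected_version("2.5.1") is True, while is_affected_version("2.5.2") and is_affected_version("3.0") are False.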

Workaround:
Reboot all managers.
