Multiple services are restarted due to java.lang.OutOfMemoryError on multiple Manager nodes within a few minutes

Article ID: 367019

Products

VMware NSX

Issue/Introduction

  • Multiple services, such as proton, ccp, policy, and phc, hit java.lang.OutOfMemoryError and are restarted on multiple Manager nodes within a few minutes.
  • You see java.lang.OutOfMemoryError in the wrapper logs of multiple services on multiple Manager nodes.

/var/log/cloudnet/nsx-ccp-wrapper.errput
INFO   | jvm 1    | 2024/04/25 08:50:17 | java.lang.OutOfMemoryError: Java heap space
STATUS | wrapper  | 2024/04/25 08:50:17 | The JVM has run out of memory.  Requesting thread dump.
STATUS | wrapper  | 2024/04/25 08:50:17 | Dumping JVM state.
INFO   | jvm 1    | 2024/04/25 08:50:17 | Dumping heap to /image/core/ccp_oom.hprof ...

/var/log/proton/proton-tomcat-wrapper.log
INFO   | jvm 1    | 2024/04/25 08:50:49 | java.lang.OutOfMemoryError: Java heap space
STATUS | wrapper  | 2024/04/25 08:50:49 | The JVM has run out of memory.  Requesting thread dump.
STATUS | wrapper  | 2024/04/25 08:50:49 | Dumping JVM state.
INFO   | jvm 1    | 2024/04/25 08:50:49 | Dumping heap to /image/core/proton_oom.hprof ...

/var/log/phonehome-coordinator/phonehome-coordinator-tomcat-wrapper.log
INFO   | jvm 1    | 2024/04/25 08:45:35 | java.lang.OutOfMemoryError: Java heap space
STATUS | wrapper  | 2024/04/25 08:45:35 | The JVM has run out of memory.  Requesting thread dump.
STATUS | wrapper  | 2024/04/25 08:45:35 | Dumping JVM state.
STATUS | wrapper  | 2024/04/25 08:45:35 | The JVM has run out of memory.  Restarting JVM.
INFO   | jvm 1    | 2024/04/25 08:45:35 | Dumping heap to /image/core/phc_oom.hprof ...

  • Some services may temporarily show DEGRADED status in the GUI.
  • Some .hprof heap dump files may be created in /image/core.
  • The Corfu sequence number crossed 2147483647 (the maximum value of a signed 32-bit integer) just before the issue.

    2024-04-25T08:45:18.731Z | DEBUG | LogUnit-16 | o.c.i.LogUnitServer | log write: type: DATA, address: Token(epoch=57, sequence=2147483647), streams: {<UUID>=2147483644, <UUID>}
  • After the services restart, no further OOM errors occur.
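A minimal Python sketch for checking the symptoms above, assuming the wrapper-log line format shown in the excerpts. The function names, regular expressions, and five-minute clustering window are illustrative assumptions, not an NSX tool. The Corfu log path is not given in this article, so `max_corfu_sequence` only takes lines you supply.

```python
import re
from datetime import datetime, timedelta

# Wrapper-log OOM line, as in the excerpts above:
# "INFO   | jvm 1    | 2024/04/25 08:50:17 | java.lang.OutOfMemoryError: Java heap space"
OOM_RE = re.compile(
    r"\|\s*(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})\s*\|\s*java\.lang\.OutOfMemoryError"
)
# Corfu LogUnit write line, e.g. "address: Token(epoch=57, sequence=2147483647)"
SEQ_RE = re.compile(r"Token\(epoch=\d+, sequence=(-?\d+)\)")

INT32_MAX = 2**31 - 1  # 2147483647, the largest signed 32-bit integer

# Wrapper logs named in this article.
WRAPPER_LOGS = [
    "/var/log/cloudnet/nsx-ccp-wrapper.errput",
    "/var/log/proton/proton-tomcat-wrapper.log",
    "/var/log/phonehome-coordinator/phonehome-coordinator-tomcat-wrapper.log",
]


def oom_times(lines):
    """Return the timestamps of java.lang.OutOfMemoryError lines."""
    times = []
    for line in lines:
        m = OOM_RE.search(line)
        if m:
            times.append(datetime.strptime(m.group(1), "%Y/%m/%d %H:%M:%S"))
    return times


def clustered(times, window=timedelta(minutes=5)):
    """True if any two consecutive OOM events are within `window` (assumed 5 min)."""
    times = sorted(times)
    return any(b - a <= window for a, b in zip(times, times[1:]))


def max_corfu_sequence(lines):
    """Return the highest Corfu token sequence found in `lines`, or None."""
    seqs = [int(m.group(1)) for line in lines if (m := SEQ_RE.search(line))]
    return max(seqs) if seqs else None


if __name__ == "__main__":
    all_times = []
    for path in WRAPPER_LOGS:
        try:
            with open(path, errors="replace") as f:
                all_times += oom_times(f)
        except OSError:
            continue  # log missing or unreadable on this node
    print(f"OOM events: {len(all_times)}, clustered: {clustered(all_times)}")
```

Run it on each Manager node as a user that can read these logs. Clustered OOM events together with a Corfu sequence at or above `INT32_MAX` match the pattern described here, but that alone does not confirm this issue.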

Environment

VMware NSX-T Data Center 3.0
VMware NSX-T Data Center 3.1

Cause

The root cause is the same as that described in KB 317760.

In some cases, the issue causes simultaneous OOM errors in several services, which then recover without any manual intervention.

Resolution

This issue is resolved in NSX-T 3.0.3.1 and 3.1.2.1.

NSX-T 3.0.3.2 does not include the fix.
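Because 3.0.3.2 is numerically higher than 3.0.3.1 yet lacks the fix, a simple "at or above the fixed version" comparison gives the wrong answer. A minimal Python sketch of that pitfall, classifying only the releases this article names; the function names are hypothetical, and any other version is reported as "unknown" rather than guessed.

```python
# Only the versions named in this article are classified.
FIXED = {"3.0.3.1", "3.1.2.1"}
NOT_FIXED = {"3.0.3.2"}


def parse(version):
    """Turn "3.0.3.1" into (3, 0, 3, 1) for numeric comparison."""
    return tuple(int(p) for p in version.split("."))


def fix_status(version):
    """Classify a version strictly from the article: fixed, not fixed, or unknown."""
    if version in FIXED:
        return "fixed"
    if version in NOT_FIXED:
        return "not fixed"
    return "unknown"


def naive_fixed(version):
    """Misleading heuristic: at or above a fixed build in the same 3.x train."""
    v = parse(version)
    return any(v[:2] == parse(f)[:2] and v >= parse(f) for f in FIXED)


if __name__ == "__main__":
    for v in ["3.0.3.1", "3.0.3.2", "3.1.2.1", "3.1.3.0"]:
        print(v, fix_status(v), naive_fixed(v))
```

For 3.0.3.2, `naive_fixed` returns True while `fix_status` returns "not fixed", which is why a version check should rely on the release notes rather than numeric ordering.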

Workaround:
To work around this issue, contact Broadcom Support and note this Article ID (367019) in the problem description.