When you encounter this issue, run 'get cluster status' from the admin CLI to check service status. In /var/log/cbm/tanuki.log, you should see the following log lines for the JVM in cbm that is in charge of compaction running out of memory:

tanuki.log.10:11457:STATUS | wrapper | 2025/04/19 17:30:23 | The JVM has run out of memory. Requesting thread dump.
tanuki.log.10:11459:STATUS | wrapper | 2025/04/19 17:30:23 | The JVM has run out of memory. Restart JVM (Ignoring, already restarting).
tanuki.log.10:13289:STATUS | wrapper | 2025/04/19 17:30:49 | The JVM has run out of memory. Requesting thread dump.
tanuki.log.10:13291:STATUS | wrapper | 2025/04/19 17:30:49 | The JVM has run out of memory. Restarting JVM.
tanuki.log.10:14986:STATUS | wrapper | 2025/04/19 17:30:53 | The JVM has run out of memory. Requesting thread dump.
tanuki.log.10:14988:STATUS | wrapper | 2025/04/19 17:30:53 | The JVM has run out of memory. Restart JVM (Ignoring, already restarting).
tanuki.log.10:16753:STATUS | wrapper | 2025/04/19 17:30:53 | The JVM has run out of memory. Requesting thread dump.
tanuki.log.10:16755:STATUS | wrapper | 2025/04/19 17:30:53 | The JVM has run out of memory. Restart JVM (Ignoring, already restarting).
tanuki.log.10:18543:STATUS | wrapper | 2025/04/19 17:31:25 | The JVM has run out of memory. Requesting thread dump.
tanuki.log.10:18545:STATUS | wrapper | 2025/04/19 17:31:25 | The JVM has run out of memory. Restarting JVM.
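One quick way to confirm the symptom is to count the wrapper's out-of-memory events across the rotated tanuki logs. This is a minimal sketch run against an inline sample mirroring the excerpt above so it is self-contained; on a live NSX Manager you would point the same grep at /var/log/cbm/tanuki.log* instead.

```shell
# Sketch: count "JVM has run out of memory" events reported by the Tanuki
# wrapper. $tmp stands in for /var/log/cbm/tanuki.log* on a real manager.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
STATUS | wrapper | 2025/04/19 17:30:23 | The JVM has run out of memory. Requesting thread dump.
STATUS | wrapper | 2025/04/19 17:30:49 | The JVM has run out of memory. Restarting JVM.
STATUS | wrapper | 2025/04/19 17:31:00 | Launching a JVM...
EOF
oom_count=$(grep -c "The JVM has run out of memory" "$tmp")
echo "OOM events: $oom_count"
rm -f "$tmp"
```

A large or steadily growing count across the rotated logs indicates the cbm JVM is in a crash loop rather than a one-off restart.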
In /var/log/syslog, you can see this as well with the log line below:

<Time Stamp> NSX 19643 SYSTEM [nsx@6876 comp="nsx-manager" errorCode="MP100" level="ERROR" subcomp="cbm"] Handler dispatch failed; nested exception is java.lang.OutOfMemoryError: Java heap space

Under /var/log/cbm/cbm.log, you will see reports of services being down, even if they report as up when you run 'get cluster status' as admin:
<Time Stamp> ERROR HeartbeatServiceServiceMonitorStatusUpdaterThread ServiceMonitor 92085 - [nsx@6876 comp="nsx-manager" errorCode="HBS153" level="ERROR" s2comp="service-monitor" subcomp="cbm"] One or more services are down: [Epoch:2]CLUSTER_MANAGER:UNKNOWN,SM:DOWN,MONITORING:DOWN,AR:DOWN,MESSAGING_MANAGER:DOWN,PROTON:DOWN,CONTROLLER:DOWN,IDPS_REPORTING:DOWN,SEARCH:DOWN,CM_INV:DOWN,HTTP:DOWN

Depending on how long the service has been crashing, you might also see CBM core dumps under /image/core.

Check /var/log/corfu/corfu-compactor-audit.log and /var/log/corfu/corfu-compactor-leader.log to see if compaction is still running. If compaction is not running gracefully, reboot all three NSX managers, and compaction will restart once the managers are back up.
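The HBS153 line packs all per-service states into one comma-separated list, which is awkward to read at a glance. A hedged sketch for extracting just the DOWN services, demonstrated on a sample line mirroring the excerpt above; on a live manager you would feed it the latest matching line, e.g. from grep HBS153 /var/log/cbm/cbm.log:

```shell
# Sketch: list the services reported DOWN in the HBS153 heartbeat line.
# $line stands in for the latest HBS153 entry from /var/log/cbm/cbm.log.
line='One or more services are down: [Epoch:2]CLUSTER_MANAGER:UNKNOWN,SM:DOWN,MONITORING:DOWN,AR:DOWN,MESSAGING_MANAGER:DOWN,PROTON:DOWN,CONTROLLER:DOWN,IDPS_REPORTING:DOWN,SEARCH:DOWN,CM_INV:DOWN,HTTP:DOWN'
# grep -o prints each SERVICE:DOWN token on its own line (UNKNOWN is excluded).
down_services=$(printf '%s\n' "$line" | grep -o '[A-Z_]*:DOWN')
down_count=$(printf '%s\n' "$down_services" | wc -l)
printf '%s\n' "$down_services"
echo "services down: $down_count"
```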
If the environment experiences prolonged spikes in network or storage latency, or spikes in CPU usage on the host during the compaction process, compaction can take longer to complete, causing the cbm service to run out of memory. Restarting the managers clears memory and kicks off a new compaction request.
Wait until the compaction process completes after the reboot.
You can observe this process by tailing either /var/log/corfu/corfu-compactor-leader.log or /var/log/corfu/corfu-compactor-audit.log.
Depending on the size of the environment, this can take a while.

/var/log/corfu/corfu-compactor-leader.log completion:
2025-04-18T17:29:29.771Z | INFO | Cmpt-9000-chkpter | compactor-leader | DynamicTriggerPolicy: Trigger as elapsedTime 902 > safeTrimPeriod 900
2025-04-18T17:29:29.898Z | INFO | Cmpt-9000-chkpter | compactor-leader | Trim completed, elapsed(0s), log address up to 2989733883.
2025-04-18T17:29:29.898Z | INFO | Cmpt-9000-chkpter | compactor-leader | =============Initiating Distributed Compaction============
2025-04-18T17:29:29.978Z | INFO | Cmpt-9000-chkpter | compactor-leader | Init compaction cycle is successful. Min token 2989778336
2025-04-19T17:49:04.782Z | INFO | CorfuServer-shutdown-4 | compactor-leader | Compactor Orchestrator service shutting down.
2025-04-19T17:52:11.981Z | INFO | initializationTaskThread | compactor-leader | Starting Compaction service...
2025-04-19T17:52:22.203Z | INFO | Cmpt-9000-chkpter | compactor-leader | getNewCorfuRuntime: Corfu Runtime connected successfully
2025-04-19T17:53:10.733Z | INFO | Cmpt-9000-chkpter | compactor-leader | invokeCheckpointing: hostName: (NSX Manager IPs), port: 9000
2025-04-19T17:53:10.757Z | INFO | Cmpt-9000-chkpter | compactor-leader | Triggered compactor jvm
2025-04-19T18:00:17.268Z | INFO | Cmpt-9000-chkpter | compactor-leader | Shutting down existing checkpointer jvm
2025-04-19T18:29:15.257Z | ERROR | Thread-6 | compactor-leader | Exception occurred while getting ErrorStream:
/var/log/corfu/corfu-compactor-audit.log completion:
2025-04-19T19:14:20.148Z | INFO | Cmpt-chkpter-9000 | org.corfudb.util.FileWatcher | Closed FileWatcher.
2025-04-19T19:14:20.148Z | INFO | FileWatcher-0 | org.corfudb.util.FileWatcher | FileWatcher failed to poll file /config/cluster-manager/corfu/private/keystore.jks, Exception: java.nio.file.ClosedWatchServiceException., isStopped: true
2025-04-19T19:14:20.148Z | INFO | FileWatcher-0 | org.corfudb.util.FileWatcher | Watch service is stopped. Skip reloading new watch service.
2025-04-19T19:14:20.150Z | WARN | netty-0 | o.c.r.c.NettyClientRouter | userEventTriggered: unhandled event SslCloseCompletionEvent(java.nio.channels.ClosedChannelException)
2025-04-19T19:14:20.151Z | WARN | netty-2 | o.c.r.c.NettyClientRouter | userEventTriggered: unhandled event SslCloseCompletionEvent(java.nio.channels.ClosedChannelException)
2025-04-19T19:14:20.151Z | WARN | netty-1 | o.c.r.c.NettyClientRouter | userEventTriggered: unhandled event SslCloseCompletionEvent(java.nio.channels.ClosedChannelException)
2025-04-19T19:14:20.160Z | INFO | Cmpt-chkpter-9000 | o.c.c.CompactorCheckpointer | Exiting CorfuStoreCompactor
2025-04-19T19:14:20.274Z INFO Runner - Finished running corfu compactor tool.
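Rather than watching the tail by eye, the completion marker shown in the audit-log excerpt above can be checked with a grep. This is a hedged sketch against a sample file so it is self-contained; on a live manager you would point the same grep at /var/log/corfu/corfu-compactor-audit.log.

```shell
# Sketch: check whether the compactor has finished a cycle. $log stands in
# for /var/log/corfu/corfu-compactor-audit.log on a real manager; the marker
# string is taken from the audit-log excerpt above.
log=$(mktemp)
cat > "$log" <<'EOF'
2025-04-19T19:14:20.160Z | INFO | Cmpt-chkpter-9000 | o.c.c.CompactorCheckpointer | Exiting CorfuStoreCompactor
2025-04-19T19:14:20.274Z INFO Runner - Finished running corfu compactor tool.
EOF
if grep -q "Finished running corfu compactor tool" "$log"; then
  compaction_status="complete"
else
  compaction_status="running"
fi
echo "compaction: $compaction_status"
rm -f "$log"
```

To wait for completion on a live system, the same grep can be polled in an until loop with a sleep between checks.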
If you believe you have encountered this issue, open a support case with Broadcom Support and refer to this KB article.
For more information, see Creating and managing Broadcom support cases.