The NSX-T GUI is inaccessible even though all services appear to be running. The web GUI may also display an error. When this issue occurs, you might also be unable to run 'get cluster status' from the admin CLI.
In /var/log/cbm/tanuki.log you should see log lines like the following, which show the cbm JVM responsible for compaction running out of memory.
tanuki.log.10:11457:STATUS | wrapper | 2025/04/19 17:30:23 | The JVM has run out of memory. Requesting thread dump.
tanuki.log.10:11459:STATUS | wrapper | 2025/04/19 17:30:23 | The JVM has run out of memory. Restart JVM (Ignoring, already restarting).
tanuki.log.10:13289:STATUS | wrapper | 2025/04/19 17:30:49 | The JVM has run out of memory. Requesting thread dump.
tanuki.log.10:13291:STATUS | wrapper | 2025/04/19 17:30:49 | The JVM has run out of memory. Restarting JVM.
tanuki.log.10:14986:STATUS | wrapper | 2025/04/19 17:30:53 | The JVM has run out of memory. Requesting thread dump.
tanuki.log.10:14988:STATUS | wrapper | 2025/04/19 17:30:53 | The JVM has run out of memory. Restart JVM (Ignoring, already restarting).
tanuki.log.10:16753:STATUS | wrapper | 2025/04/19 17:30:53 | The JVM has run out of memory. Requesting thread dump.
tanuki.log.10:16755:STATUS | wrapper | 2025/04/19 17:30:53 | The JVM has run out of memory. Restart JVM (Ignoring, already restarting).
tanuki.log.10:18543:STATUS | wrapper | 2025/04/19 17:31:25 | The JVM has run out of memory. Requesting thread dump.
tanuki.log.10:18545:STATUS | wrapper | 2025/04/19 17:31:25 | The JVM has run out of memory. Restarting JVM.
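To gauge how often the JVM is being restarted, the wrapper lines can be tallied from a copy of tanuki.log. The following is a minimal Python sketch, not a supported tool; the function name summarize_oom and the regular expression are illustrative assumptions based only on the line format shown in this article.

```python
import re
from collections import Counter

# Assumed line shape (taken from the excerpt in this article), with or
# without a grep-style "file:line:" prefix:
#   STATUS | wrapper | 2025/04/19 17:30:49 | The JVM has run out of memory. Restarting JVM.
OOM_RE = re.compile(
    r"\|\s*(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})\s*\|\s*"
    r"The JVM has run out of memory\.\s*(.*)$"
)


def summarize_oom(lines):
    """Return (total OOM lines, Counter of follow-up actions, restart timestamps)."""
    total = 0
    actions = Counter()
    restarts = []
    for line in lines:
        match = OOM_RE.search(line.rstrip())
        if not match:
            continue  # not an out-of-memory wrapper line
        total += 1
        timestamp, action = match.group(1), match.group(2).strip()
        actions[action] += 1
        # Only "Restarting JVM." marks a restart the wrapper actually started;
        # the "(Ignoring, already restarting)" lines are duplicates of one in flight.
        if action == "Restarting JVM.":
            restarts.append(timestamp)
    return total, actions, restarts
```

To use it, pass any iterable of lines, for example an open file handle on a copy of /var/log/cbm/tanuki.log taken from a support bundle. Many restart timestamps close together suggest the crash loop described here.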
The same out-of-memory condition is also logged in /var/log/syslog, as in the log line below.
2025-04-19T18:01:06.775Z NSX 19643 SYSTEM [nsx@6876 comp="nsx-manager" errorCode="MP100" level="ERROR" subcomp="cbm"] Handler dispatch failed; nested exception is java.lang.OutOfMemoryError: Java heap space
In /var/log/cbm/cbm.log you will see services reported as down, even if they are reported as up when you run 'get cluster status' as admin.
cbm.log:19933:2025-04-19T18:52:26.675Z ERROR HeartbeatServiceServiceMonitorStatusUpdaterThread ServiceMonitor 92085 - [nsx@6876 comp="nsx-manager" errorCode="HBS153" level="ERROR" s2comp="service-monitor" subcomp="cbm"] One or more services are down: [Epoch: 2]CLUSTER_MANAGER:UNKNOWN,SM:DOWN,MONITORING:DOWN,AR:DOWN,MESSAGING_MANAGER:DOWN,PROTON:DOWN,CONTROLLER:DOWN,IDPS_REPORTING:DOWN,SEARCH:DOWN,CM_INV:DOWN,HTTP:DOWN
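The service list in an HBS153 line is easier to read when split into one entry per service. Below is a small Python sketch that does this. It assumes the "NAME:STATE" comma-separated layout seen in the sample line above, and parse_service_states is a hypothetical helper name, not part of NSX.

```python
import re

# Assumption: the list follows "One or more services are down:" with an
# optional "[Epoch: N]" tag and contains no whitespace, as in the sample line.
STATUS_RE = re.compile(
    r"One or more services are down:\s*(?:\[Epoch:\s*\d+\])?\s*(\S+)"
)


def parse_service_states(line):
    """Map each service name in an HBS153 log line to its reported state."""
    match = STATUS_RE.search(line)
    if not match:
        return {}
    states = {}
    for item in match.group(1).split(","):
        name, sep, state = item.partition(":")
        if sep:  # skip fragments that are not NAME:STATE pairs
            states[name] = state
    return states


def services_not_up(line):
    """Return the sorted names of services whose state is anything other than UP."""
    return sorted(n for n, s in parse_service_states(line).items() if s != "UP")
```

Running services_not_up on each HBS153 line from cbm.log gives a quick view of which services the heartbeat monitor considers unhealthy at that point in time.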
In /image/core you might also see cbm core dumps depending on how long the service has been crashing.
Check corfu-compactor-audit.log and corfu-compactor-leader.log to see whether compaction is still running. If compaction is not running, gracefully reboot all three NSX Managers; compaction restarts once the managers are back up.
If the environment experiences prolonged spikes in network or storage latency, or in CPU usage on the host, during compaction, the process can take longer to complete, which can cause the cbm service to run out of memory. Restarting the managers clears memory and triggers a new compaction request.
Wait until the compaction process completes after the reboot. You can observe its progress by tailing either /var/log/corfu/corfu-compactor-leader.log or /var/log/corfu/corfu-compactor-audit.log. This can take a while, depending on the size of the environment.
/var/log/corfu/corfu-compactor-leader.log completion
2025-04-18T17:29:29.771Z | INFO | Cmpt-9000-chkpter | compactor-leader | DynamicTriggerPolicy: Trigger as elapsedTime 902 > safeTrimPeriod 900
2025-04-18T17:29:29.898Z | INFO | Cmpt-9000-chkpter | compactor-leader | Trim completed, elapsed(0s), log address up to 2989733883.
2025-04-18T17:29:29.898Z | INFO | Cmpt-9000-chkpter | compactor-leader | =============Initiating Distributed Compaction============
2025-04-18T17:29:29.978Z | INFO | Cmpt-9000-chkpter | compactor-leader | Init compaction cycle is successful. Min token 2989778336
2025-04-19T17:49:04.782Z | INFO | CorfuServer-shutdown-4 | compactor-leader | Compactor Orchestrator service shutting down.
2025-04-19T17:52:11.981Z | INFO | initializationTaskThread | compactor-leader | Starting Compaction service...
2025-04-19T17:52:22.203Z | INFO | Cmpt-9000-chkpter | compactor-leader | getNewCorfuRuntime: Corfu Runtime connected successfully
2025-04-19T17:53:10.733Z | INFO | Cmpt-9000-chkpter | compactor-leader | invokeCheckpointing: hostName: (NSX Manager IPs), port: 9000
2025-04-19T17:53:10.757Z | INFO | Cmpt-9000-chkpter | compactor-leader | Triggered compactor jvm
2025-04-19T18:00:17.268Z | INFO | Cmpt-9000-chkpter | compactor-leader | Shutting down existing checkpointer jvm
2025-04-19T18:29:15.257Z | ERROR | Thread-6 | compactor-leader | Exception occurred while getting ErrorStream:
/var/log/corfu/corfu-compactor-audit.log completion
2025-04-19T19:14:20.148Z | INFO | Cmpt-chkpter-9000 | org.corfudb.util.FileWatcher | Closed FileWatcher.
2025-04-19T19:14:20.148Z | INFO | FileWatcher-0 | org.corfudb.util.FileWatcher | FileWatcher failed to poll file /config/cluster-manager/corfu/private/keystore.jks, Exception: java.nio.file.ClosedWatchServiceException., isStopped: true
2025-04-19T19:14:20.148Z | INFO | FileWatcher-0 | org.corfudb.util.FileWatcher | Watch service is stopped. Skip reloading new watch service.
2025-04-19T19:14:20.150Z | WARN | netty-0 | o.c.r.c.NettyClientRouter | userEventTriggered: unhandled event SslCloseCompletionEvent(java.nio.channels.ClosedChannelException)
2025-04-19T19:14:20.151Z | WARN | netty-2 | o.c.r.c.NettyClientRouter | userEventTriggered: unhandled event SslCloseCompletionEvent(java.nio.channels.ClosedChannelException)
2025-04-19T19:14:20.151Z | WARN | netty-1 | o.c.r.c.NettyClientRouter | userEventTriggered: unhandled event SslCloseCompletionEvent(java.nio.channels.ClosedChannelException)
2025-04-19T19:14:20.160Z | INFO | Cmpt-chkpter-9000 | o.c.c.CompactorCheckpointer | Exiting CorfuStoreCompactor
2025-04-19T19:14:20.274Z INFO Runner - Finished running corfu compactor tool.
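Instead of watching the tail by eye, the audit log can be scanned for the final line of the excerpt above. The Python sketch below assumes that "Finished running corfu compactor tool." marks the end of a compaction run, which is inferred from this article's example rather than from a documented contract. The function name last_compaction_finish is hypothetical.

```python
# Assumption: this message is the last line written when a compaction run
# ends, as in the corfu-compactor-audit.log excerpt in this article.
COMPLETION_MARKER = "Finished running corfu compactor tool."


def last_compaction_finish(lines):
    """Return the timestamp of the most recent completion line, or None."""
    last = None
    for line in lines:
        if COMPLETION_MARKER in line:
            # The timestamp is the first whitespace-delimited field of the line.
            last = line.split()[0]
    return last
```

A return value later than the reboot time would suggest that a compaction cycle has finished since the managers came back up. A return value of None means no completion line was found in the lines provided.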
If you have any questions about this, please open a case with NSX support.
Collect Support Bundles for Troubleshooting VMware NSX
Uploading files to cases on the Broadcom Support Portal
Creating and managing Broadcom support cases