NSX-T Manager GUI is Unavailable, tanuki.log reports jvm running out of memory

Article ID: 395500

Updated On:

Products

VMware NSX-T Data Center

Issue/Introduction

The NSX-T GUI is inaccessible even though all services are running. The web GUI may also display an error. When this issue occurs, you might also be unable to run 'get cluster status' from the admin CLI.



In /var/log/cbm/tanuki.log, you should see log lines like the following, showing that the cbm JVM in charge of compaction is running out of memory.


tanuki.log.10:11457:STATUS | wrapper  | 2025/04/19 17:30:23 | The JVM has run out of memory.  Requesting thread dump.
tanuki.log.10:11459:STATUS | wrapper  | 2025/04/19 17:30:23 | The JVM has run out of memory.  Restart JVM (Ignoring, already restarting).
tanuki.log.10:13289:STATUS | wrapper  | 2025/04/19 17:30:49 | The JVM has run out of memory.  Requesting thread dump.
tanuki.log.10:13291:STATUS | wrapper  | 2025/04/19 17:30:49 | The JVM has run out of memory.  Restarting JVM.
tanuki.log.10:14986:STATUS | wrapper  | 2025/04/19 17:30:53 | The JVM has run out of memory.  Requesting thread dump.
tanuki.log.10:14988:STATUS | wrapper  | 2025/04/19 17:30:53 | The JVM has run out of memory.  Restart JVM (Ignoring, already restarting).
tanuki.log.10:16753:STATUS | wrapper  | 2025/04/19 17:30:53 | The JVM has run out of memory.  Requesting thread dump.
tanuki.log.10:16755:STATUS | wrapper  | 2025/04/19 17:30:53 | The JVM has run out of memory.  Restart JVM (Ignoring, already restarting).
tanuki.log.10:18543:STATUS | wrapper  | 2025/04/19 17:31:25 | The JVM has run out of memory.  Requesting thread dump.
tanuki.log.10:18545:STATUS | wrapper  | 2025/04/19 17:31:25 | The JVM has run out of memory.  Restarting JVM.
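
To gauge how often the JVM is hitting this condition, the wrapper lines can be tallied. The following is a minimal Python sketch, assuming the line layout shown in the excerpt above; the function name summarize_oom and the regular expression are illustrative and are not part of NSX.

```python
import re
from collections import Counter

# Matches the wrapper STATUS lines shown above. A leading "file:lineno:"
# grep prefix is tolerated because the pattern is searched, not anchored.
OOM_RE = re.compile(
    r"STATUS \| wrapper\s+\| (?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \| "
    r"The JVM has run out of memory\.\s+(?P<action>.+)$"
)

def summarize_oom(lines):
    """Return (count per wrapper action, first timestamp, last timestamp)."""
    actions = Counter()
    stamps = []
    for line in lines:
        m = OOM_RE.search(line.rstrip("\n"))
        if m:
            actions[m.group("action").strip()] += 1
            stamps.append(m.group("ts"))
    # YYYY/MM/DD HH:MM:SS strings sort chronologically as plain text.
    first = min(stamps) if stamps else None
    last = max(stamps) if stamps else None
    return actions, first, last

# Two lines copied from the excerpt above.
SAMPLE = [
    "tanuki.log.10:11457:STATUS | wrapper  | 2025/04/19 17:30:23 | The JVM has run out of memory.  Requesting thread dump.",
    "tanuki.log.10:13291:STATUS | wrapper  | 2025/04/19 17:30:49 | The JVM has run out of memory.  Restarting JVM.",
]

if __name__ == "__main__":
    print(summarize_oom(SAMPLE))
```

Passing an open file handle for /var/log/cbm/tanuki.log instead of SAMPLE works the same way, since the function only iterates over lines.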


The same condition also appears in /var/log/syslog, as in the log line below.


2025-04-19T18:01:06.775Z  NSX 19643 SYSTEM [nsx@6876 comp="nsx-manager" errorCode="MP100" level="ERROR" subcomp="cbm"] Handler dispatch failed; nested exception is java.lang.OutOfMemoryError: Java heap space
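
A syslog line like the one above can be picked out programmatically by reading its bracketed key="value" fields. This is a rough Python sketch under the assumption that the fields always follow the [nsx@6876 ...] layout shown; parse_nsx_fields and is_cbm_heap_oom are hypothetical helper names.

```python
import re

# key="value" pairs inside the [nsx@6876 ...] block.
FIELD_RE = re.compile(r'(\w+)="([^"]*)"')

def parse_nsx_fields(line):
    """Return the key/value fields of the [nsx@6876 ...] block, or {} if absent."""
    start = line.find("[nsx@6876")
    if start == -1:
        return {}
    end = line.find("]", start)
    if end == -1:
        return {}
    return dict(FIELD_RE.findall(line[start:end]))

def is_cbm_heap_oom(line):
    """True when the line is a cbm entry that mentions java.lang.OutOfMemoryError."""
    fields = parse_nsx_fields(line)
    return fields.get("subcomp") == "cbm" and "java.lang.OutOfMemoryError" in line

# The syslog line from the excerpt above.
SAMPLE = (
    '2025-04-19T18:01:06.775Z  NSX 19643 SYSTEM [nsx@6876 comp="nsx-manager" '
    'errorCode="MP100" level="ERROR" subcomp="cbm"] Handler dispatch failed; '
    'nested exception is java.lang.OutOfMemoryError: Java heap space'
)

if __name__ == "__main__":
    print(parse_nsx_fields(SAMPLE), is_cbm_heap_oom(SAMPLE))
```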


In /var/log/cbm/cbm.log, you will see services reported as down even if they report as up when you run 'get cluster status' as admin.


cbm.log:19933:2025-04-19T18:52:26.675Z ERROR HeartbeatServiceServiceMonitorStatusUpdaterThread ServiceMonitor 92085 - [nsx@6876 comp="nsx-manager" errorCode="HBS153" level="ERROR" s2comp="service-monitor" subcomp="cbm"] One or more services are down: [Epoch: 2]CLUSTER_MANAGER:UNKNOWN,SM:DOWN,MONITORING:DOWN,AR:DOWN,MESSAGING_MANAGER:DOWN,PROTON:DOWN,CONTROLLER:DOWN,IDPS_REPORTING:DOWN,SEARCH:DOWN,CM_INV:DOWN,HTTP:DOWN
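
The comma-separated service list in this message is easier to read as a mapping. The sketch below, which assumes the "One or more services are down: [Epoch: N]NAME:STATE,..." layout shown above, splits it into a dictionary; parse_service_states is an illustrative name and not an NSX API.

```python
import re

# Captures the epoch number and the NAME:STATE list that follows it.
SERVICES_RE = re.compile(
    r"One or more services are down: \[Epoch: (?P<epoch>\d+)\](?P<services>\S+)"
)

def parse_service_states(line):
    """Return (epoch, {service: state}) for an HBS153 line, or None if no match."""
    m = SERVICES_RE.search(line)
    if m is None:
        return None
    states = {}
    for pair in m.group("services").split(","):
        name, _, state = pair.partition(":")
        states[name] = state
    return int(m.group("epoch")), states

# Message portion of the cbm.log line from the excerpt above.
SAMPLE = (
    "One or more services are down: [Epoch: 2]CLUSTER_MANAGER:UNKNOWN,SM:DOWN,"
    "MONITORING:DOWN,AR:DOWN,MESSAGING_MANAGER:DOWN,PROTON:DOWN,CONTROLLER:DOWN,"
    "IDPS_REPORTING:DOWN,SEARCH:DOWN,CM_INV:DOWN,HTTP:DOWN"
)

if __name__ == "__main__":
    epoch, states = parse_service_states(SAMPLE)
    for name, state in states.items():
        print(f"epoch={epoch} {name}={state}")
```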

In /image/core, you might also see cbm core dumps, depending on how long the service has been crashing.

Check corfu-compactor-audit.log and corfu-compactor-leader.log to validate whether compaction is still running. If compaction is not running, gracefully reboot all three NSX Managers; compaction will restart once the managers are back up.

Cause

If the environment experiences prolonged spikes in network or storage latency, or spikes in CPU usage on the host, during the compaction process, compaction might take longer to complete, causing the cbm service to run out of memory. Restarting the managers clears memory and kicks off a new compaction request.

Resolution

Wait until the compaction process completes after the reboot. You can observe the process by tailing either /var/log/corfu/corfu-compactor-leader.log or /var/log/corfu/corfu-compactor-audit.log. This can take a while, depending on the size of the environment.

/var/log/corfu/corfu-compactor-leader.log completion

2025-04-18T17:29:29.771Z | INFO  |              Cmpt-9000-chkpter |               compactor-leader | DynamicTriggerPolicy: Trigger as elapsedTime 902 > safeTrimPeriod 900
2025-04-18T17:29:29.898Z | INFO  |              Cmpt-9000-chkpter |               compactor-leader | Trim completed, elapsed(0s), log address up to 2989733883.
2025-04-18T17:29:29.898Z | INFO  |              Cmpt-9000-chkpter |               compactor-leader | =============Initiating Distributed Compaction============
2025-04-18T17:29:29.978Z | INFO  |              Cmpt-9000-chkpter |               compactor-leader | Init compaction cycle is successful. Min token 2989778336
2025-04-19T17:49:04.782Z | INFO  |         CorfuServer-shutdown-4 |               compactor-leader | Compactor Orchestrator service shutting down.
2025-04-19T17:52:11.981Z | INFO  |       initializationTaskThread |               compactor-leader | Starting Compaction service...
2025-04-19T17:52:22.203Z | INFO  |              Cmpt-9000-chkpter |               compactor-leader | getNewCorfuRuntime: Corfu Runtime connected successfully
2025-04-19T17:53:10.733Z | INFO  |              Cmpt-9000-chkpter |               compactor-leader | invokeCheckpointing: hostName: (NSX Manager IPs), port: 9000
2025-04-19T17:53:10.757Z | INFO  |              Cmpt-9000-chkpter |               compactor-leader | Triggered compactor jvm
2025-04-19T18:00:17.268Z | INFO  |              Cmpt-9000-chkpter |               compactor-leader | Shutting down existing checkpointer jvm
2025-04-19T18:29:15.257Z | ERROR |                       Thread-6 |               compactor-leader | Exception occurred while getting ErrorStream:

/var/log/corfu/corfu-compactor-audit.log completion



2025-04-19T19:14:20.148Z | INFO  |              Cmpt-chkpter-9000 |   org.corfudb.util.FileWatcher | Closed FileWatcher.
2025-04-19T19:14:20.148Z | INFO  |                  FileWatcher-0 |   org.corfudb.util.FileWatcher | FileWatcher failed to poll file /config/cluster-manager/corfu/private/keystore.jks, Exception: java.nio.file.ClosedWatchServiceException., isStopped: true
2025-04-19T19:14:20.148Z | INFO  |                  FileWatcher-0 |   org.corfudb.util.FileWatcher | Watch service is stopped. Skip reloading new watch service.
2025-04-19T19:14:20.150Z | WARN  |                        netty-0 |      o.c.r.c.NettyClientRouter | userEventTriggered: unhandled event SslCloseCompletionEvent(java.nio.channels.ClosedChannelException)
2025-04-19T19:14:20.151Z | WARN  |                        netty-2 |      o.c.r.c.NettyClientRouter | userEventTriggered: unhandled event SslCloseCompletionEvent(java.nio.channels.ClosedChannelException)
2025-04-19T19:14:20.151Z | WARN  |                        netty-1 |      o.c.r.c.NettyClientRouter | userEventTriggered: unhandled event SslCloseCompletionEvent(java.nio.channels.ClosedChannelException)
2025-04-19T19:14:20.160Z | INFO  |              Cmpt-chkpter-9000 |    o.c.c.CompactorCheckpointer | Exiting CorfuStoreCompactor
2025-04-19T19:14:20.274Z  INFO Runner - Finished running corfu compactor tool.
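
Rather than watching the audit log by eye, the end of a compaction run can be detected with a short script. The Python sketch below assumes that the "Exiting CorfuStoreCompactor" and "Finished running corfu compactor tool." lines in the excerpt above mark the end of a run; that reading is an assumption based on this excerpt alone, and last_completion is a hypothetical helper name.

```python
# Strings taken from the audit log excerpt above; treating them as
# end-of-run markers is an assumption, not documented NSX behavior.
COMPLETION_MARKERS = (
    "Exiting CorfuStoreCompactor",
    "Finished running corfu compactor tool.",
)

def last_completion(lines):
    """Return the timestamp of the most recent marker line, or None if none is found."""
    found = None
    for line in lines:
        if any(marker in line for marker in COMPLETION_MARKERS):
            # The timestamp is the first space-delimited token on each line.
            found = line.split(" ", 1)[0]
    return found

# Three lines copied from the excerpt above.
SAMPLE = [
    "2025-04-19T19:14:20.148Z | INFO  |              Cmpt-chkpter-9000 |   org.corfudb.util.FileWatcher | Closed FileWatcher.",
    "2025-04-19T19:14:20.160Z | INFO  |              Cmpt-chkpter-9000 |    o.c.c.CompactorCheckpointer | Exiting CorfuStoreCompactor",
    "2025-04-19T19:14:20.274Z  INFO Runner - Finished running corfu compactor tool.",
]

if __name__ == "__main__":
    print(last_completion(SAMPLE))
```

To check a live system, pass an open file handle for /var/log/corfu/corfu-compactor-audit.log in place of SAMPLE; a result of None means no marker line was found in the file.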

 

 

Additional Information