Application on NSX manager node has crashed generating a core file named proton_oom.hprof

Article ID: 387886

Products

VMware NSX

Issue/Introduction

  • An alert is generated in the NSX UI stating "Application on NSX node <manager node> has crashed. The number of core files found is 1. Collect the Support Bundle including core dump files and contact VMware Support team."
  • A core file named proton_oom.hprof is present under /image/core on the manager node noted in the alert.
    Note: To confirm that the core file is present, run ls -la /image/core as the root user or get core-dump as the admin user.
  • The proton service is running on the manager node noted in the alert but has only been running since the alert was generated.
    Note: You can check the status of the proton service as the root user by running /etc/init.d/proton status:

    # /etc/init.d/proton status
    ● proton.service - proton: VMware NSX Proton API server
         Loaded: loaded (/etc/init.d/proton; enabled; vendor preset: enabled)
         Active: active (running) since Mon 2025-01-15 10:54:39 UTC; 1 days ago
           Docs: man:systemd-sysv-generator(8)
       Main PID: 5166 (wrapper)
          Tasks: 1092 (limit: 57708)
         Memory: 6.5G
            CPU: 1d 1h 10min 57.132s
         CGroup: /system.slice/proton.service
                 ├─5166 /usr/tanuki/bin/./wrapper /usr/tanuki/bin/../conf/proton-tomcat-wrapper.conf wrapper.syslog.ident=proton wrapper.pidfile=/var/run/proton/proton.pi…
                 └─5222 /usr/lib/jvm/openjdk-java11-runtime-amd64/bin/java -Djava.util.logging.config.file=/opt/vmware/proton-tomcat/conf/logging.properties -Djava.util.l…

  • You see messages similar to the following in the proton-tomcat-wrapper.log file on the manager node noted in the alert:

    INFO   | jvm 1    | 2025/01/08 21:12:38 | java.lang.OutOfMemoryError: Java heap space
    STATUS | wrapper  | 2025/01/08 21:12:38 | The JVM has run out of memory.  Requesting thread dump.
    STATUS | wrapper  | 2025/01/08 21:12:38 | Dumping JVM state.
    INFO   | jvm 1    | 2025/01/08 21:12:38 | Dumping heap to /image/core/proton_oom.hprof ...
    INFO   | jvm 1    | 2025/01/08 21:13:18 | Heap dump file created [9470923368 bytes in 39.838 secs]

  • You see messages similar to the following in the nsxapi.log on the manager node noted in the alert:

    2025-01-15T10:54:35.511Z  INFO IdfwCleaner AutoLogoutProcessor 2973510 FIREWALL [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Found active sessions 6472 need to be auto logged out, cut off time 1736909675497
    2025-01-15T10:54:35.511Z  INFO IdfwCleaner AutoLogoutProcessor 2973510 FIREWALL [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Auto logging out active sessions 1000
    2025-01-15T10:54:50.427Z  WARN IdfwCleaner ObjectsView 2973510 TXEnd[TX[1e66]] Aborted Exception 
    org.corfudb.runtime.exceptions.TransactionAbortedException: TX ABORT  | Snapshot Time = Token(epoch=31, sequence=506475162) | Failed Transaction ID = 7eb9929f-####-####-####-fa04de3a1e66 | Offending Address = 5064##### | Conflict Key = 36091FEF######## | Conflict Stream = nsx$IdentityIpset | Cause = CONFLICT | Time = 14929 ms
    at org.corfudb.runtime.view.StreamsView.append(StreamsView.java:180) ~[?:?]
    at org.corfudb.runtime.view.StreamsView.append(StreamsView.java:233) ~[?:?]
    at org.corfudb.runtime.view.StreamsView.append(StreamsView.java:244) ~[?:?]
    at org.corfudb.runtime.object.transactions.OptimisticTransactionalContext.getConflictSetAndCommit(OptimisticTransactionalContext.java:223) ~[?:?]
    at org.corfudb.runtime.object.transactions.WriteAfterWriteTransactionalContext.commitTransaction(WriteAfterWriteTransactionalContext.java:34) ~[?:?]
    at org.corfudb.runtime.view.ObjectsView.TXEnd(ObjectsView.java:162) ~[?:?]
    at org.corfudb.runtime.collections.TxnContext.commit(TxnContext.java:793) ~[?:?]
    at com.vmware.nsx.persistence.UfoTxn.commit(UfoTxn.java:937) ~[?:?]
    at com.vmware.nsx.management.container.dao.IdentifiableProxyObjectDao.commit_aroundBody0(IdentifiableProxyObjectDao.java:784) ~[?:?]
    at com.vmware.nsx.management.container.dao.IdentifiableProxyObjectDao$AjcClosure1.run(IdentifiableProxyObjectDao.java:1) ~[?:?]
    at org.aspectj.runtime.reflect.JoinPointImpl.proceed(JoinPointImpl.java:149) ~[?:?]
    at io.micrometer.core.aop.TimedAspect.processWithTimer(TimedAspect.java:119) ~[?:?]
    at io.micrometer.core.aop.TimedAspect.ajc$inlineAccessMethod$io_micrometer_core_aop_TimedAspect$io_micrometer_core_aop_TimedAspect$processWithTimer(TimedAspect.java:1) ~[?:?]
    at io.micrometer.core.aop.TimedAspect.timedMethod(TimedAspect.java:97) ~[?:?]
    at com.vmware.nsx.management.container.dao.IdentifiableProxyObjectDao.commit(IdentifiableProxyObjectDao.java:781) ~[?:?]
    at com.vmware.nsx.management.idfw.processor.AutoLogoutProcessor.autoLogout(AutoLogoutProcessor.java:59) ~[?:?]
    at com.vmware.nsx.management.idfw.daemon.IdfwDbDaemon.autoLogoutOldActiveUserSessions(IdfwDbDaemon.java:144) ~[?:?]
    at com.vmware.nsx.management.idfw.daemon.IdfwDbDaemon.run(IdfwDbDaemon.java:122) ~[?:?]
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) ~[?:?]
    at java.util.concurrent.FutureTask.runAndReset(Unknown Source) ~[?:?]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) ~[?:?]
    at com.vmware.nsx.util.concurrent.Executors$MeteredRunnable.run(Executors.java:353) ~[nsx-util.jar:?]
    at com.vmware.nsx.util.concurrent.Executors$MeteredRunnable.run(Executors.java:353) ~[nsx-util.jar:?]
    at java.lang.Thread.run(Unknown Source) ~[?:?]
    2025-01-15T10:54:50.427Z  WARN IdfwCleaner IdentifiableProxyObjectDao 2973510 - [nsx@6876 comp="nsx-manager" level="WARNING" subcomp="manager"] Received TransactionAbortedException from the Corfu client.
    2025-01-15T10:54:50.428Z  WARN IdfwCleaner IdentifiableProxyObjectDao 2973510 - [nsx@6876 comp="nsx-manager" level="WARNING" subcomp="manager"] com.vmware.nsx.management.container.exceptions.ConcurrentUpdateException: STREAM_ID = 71f63950-####-####-####-e8f57d148ee4 | CONFLICT_VALUE = java.lang.Error: Unable to find the corresponding key | CONFLICT_KEY_HASH = 38936784655######## | CONFLICT_KEY = uuid {
    left: 12350417914########
    right: 105989396920########
    }
    | MAP_NAME = 71f63950-####-####-####-e8f57d148ee4 | TRANSACTION_ID = 7eb9929f-####-####-####-fa04de3a1e66 | OFFENDING_ADDRESS = 50647####
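
The wrapper log symptoms listed above can be checked with a short script instead of by eye. The following is a minimal sketch, not a supported tool: the function name summarize_wrapper_log and the regular expressions are illustrative and are modeled only on the proton-tomcat-wrapper.log excerpt shown in this article, so the patterns may need adjusting if your log lines differ.

```python
import re

# Patterns modeled on the proton-tomcat-wrapper.log excerpt in this article.
# They are assumptions about the log format, not a documented schema.
OOM_RE = re.compile(r"java\.lang\.OutOfMemoryError: Java heap space")
DUMP_START_RE = re.compile(r"Dumping heap to (\S+)")
DUMP_DONE_RE = re.compile(r"Heap dump file created \[(\d+) bytes in ([\d.]+) secs\]")


def summarize_wrapper_log(lines):
    """Count heap-space OOM errors and record the last heap dump seen."""
    summary = {"oom_count": 0, "dump_path": None, "dump_bytes": None, "dump_secs": None}
    for line in lines:
        if OOM_RE.search(line):
            summary["oom_count"] += 1
        started = DUMP_START_RE.search(line)
        if started:
            summary["dump_path"] = started.group(1)
        finished = DUMP_DONE_RE.search(line)
        if finished:
            summary["dump_bytes"] = int(finished.group(1))
            summary["dump_secs"] = float(finished.group(2))
    return summary


# Sample lines copied from the excerpt in this article.
SAMPLE = [
    "INFO   | jvm 1    | 2025/01/08 21:12:38 | java.lang.OutOfMemoryError: Java heap space",
    "STATUS | wrapper  | 2025/01/08 21:12:38 | The JVM has run out of memory.  Requesting thread dump.",
    "STATUS | wrapper  | 2025/01/08 21:12:38 | Dumping JVM state.",
    "INFO   | jvm 1    | 2025/01/08 21:12:38 | Dumping heap to /image/core/proton_oom.hprof ...",
    "INFO   | jvm 1    | 2025/01/08 21:13:18 | Heap dump file created [9470923368 bytes in 39.838 secs]",
]

result = summarize_wrapper_log(SAMPLE)
```

To run it against a real log, pass an open file handle for proton-tomcat-wrapper.log to summarize_wrapper_log. A non-zero oom_count together with a dump_path of /image/core/proton_oom.hprof matches the symptoms described in this article.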

Environment

VMware NSX 4.x (versions earlier than 4.2.1)

Cause

The process that purges old Identity Firewall (IDFW) login/logout events consumes too much memory, which exhausts the Java heap and causes the proton service to crash.
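
To see how much work the IDFW purge process is attempting, the AutoLogoutProcessor messages in nsxapi.log can be extracted programmatically. The sketch below is illustrative only: the function name idfw_autologout_runs is hypothetical, the patterns are taken from the two INFO messages quoted in this article, and reading the first number as the pending session count and the second as the batch size is an assumption based on the message wording. No threshold is implied; the values only help correlate purge activity with the time of the crash.

```python
import re

# Patterns taken from the AutoLogoutProcessor messages quoted in this article.
FOUND_RE = re.compile(r"Found active sessions (\d+) need to be auto logged out")
BATCH_RE = re.compile(r"Auto logging out active sessions (\d+)")


def idfw_autologout_runs(lines):
    """Return a list of (pending_sessions, batch_size) pairs, one per purge run.

    Assumption: each "Found active sessions" message is followed by the
    matching "Auto logging out active sessions" message for the same run.
    """
    runs = []
    pending = None
    for line in lines:
        found = FOUND_RE.search(line)
        if found:
            pending = int(found.group(1))
            continue
        batch = BATCH_RE.search(line)
        if batch and pending is not None:
            runs.append((pending, int(batch.group(1))))
            pending = None
    return runs


# Sample lines shortened from the nsxapi.log excerpt in this article.
SAMPLE_NSXAPI = [
    "2025-01-15T10:54:35.511Z  INFO IdfwCleaner AutoLogoutProcessor 2973510 FIREWALL "
    "Found active sessions 6472 need to be auto logged out, cut off time 1736909675497",
    "2025-01-15T10:54:35.511Z  INFO IdfwCleaner AutoLogoutProcessor 2973510 FIREWALL "
    "Auto logging out active sessions 1000",
]

runs = idfw_autologout_runs(SAMPLE_NSXAPI)
```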

Resolution

This issue is resolved in VMware NSX 4.2.1.0.
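
When auditing several NSX Manager deployments, the affected range stated in this article (4.x earlier than 4.2.1.0) can be expressed as a simple version comparison. This is a minimal sketch that assumes the version is available as a dotted numeric string; the helper names parse_version and is_affected are hypothetical and are not part of any NSX API.

```python
# Fixed release stated in this article.
FIXED_IN = (4, 2, 1, 0)


def parse_version(text):
    """Convert a dotted numeric version such as "4.2.0.1" into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def is_affected(version):
    """Return True for 4.x versions earlier than 4.2.1.0."""
    parsed = parse_version(version)
    # Pad short versions with zeros so that "4.2.1" compares equal to "4.2.1.0".
    parsed = parsed + (0,) * (len(FIXED_IN) - len(parsed))
    return parsed[0] == 4 and parsed < FIXED_IN
```

For example, is_affected("4.1.2.3") evaluates to True, while is_affected("4.2.1.0") evaluates to False.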

Additional Information