Application on NSX manager node has crashed generating a core file named proton_oom.hprof

Article ID: 387886

Products

VMware NSX

Issue/Introduction

  • NSX Managers repeatedly report application crashes, and some VM migrations lose connectivity on some ESXi hosts.
  • An alert is generated in the NSX UI stating:
    "Application on NSX node <manager node> has crashed. The number of core files found is 1. Collect the Support Bundle including core dump files and contact VMware Support team."
  • A core file named proton_oom.hprof is present under /image/core on the manager node noted in the alert.
    • Note: To confirm that the core file is present, run ls -la /image/core as the root user or get core-dump as the admin user.
  • The proton service is running on the manager node noted in the alert but has only been running since the alert was generated.
    • Note: You can check the status of the proton service as the root user by running /etc/init.d/proton status:
      # /etc/init.d/proton status
      proton.service - proton: VMware NSX Proton API server
         Loaded: loaded (/etc/init.d/proton; enabled; vendor preset: enabled)
         Active: active (running) since Mon 2025-01-15 10:54:39 UTC; 1 days ago
           Docs: man:systemd-sysv-generator(8)
       Main PID: 5166 (wrapper)
          Tasks: 1092 (limit: 57708)
         Memory: 6.5G
            CPU: 1d 1h 10min 57.132s
         CGroup: /system.slice/proton.service
                 ├─5166 /usr/tanuki/bin/./wrapper /usr/tanuki/bin/../conf/proton-tomcat-wrapper.conf wrapper.syslog.ident=proton wrapper.pidfile=/var/run/proton/proton.pi…
                 └─5222 /usr/lib/jvm/openjdk-java11-runtime-amd64/bin/java -Djava.util.logging.config.file=/opt/vmware/proton-tomcat/conf/logging.properties -Djava.util.l…
  • You see messages similar to the following in the /var/log/proton/proton-tomcat-wrapper.log file on the manager node noted in the alert:
    INFO   | jvm 1    | 2025/01/08 21:12:38 | java.lang.OutOfMemoryError: Java heap space
    STATUS | wrapper  | 2025/01/08 21:12:38 | The JVM has run out of memory.  Requesting thread dump.
    STATUS | wrapper  | 2025/01/08 21:12:38 | Dumping JVM state.
    INFO   | jvm 1    | 2025/01/08 21:12:38 | Dumping heap to /image/core/proton_oom.hprof ...
    INFO   | jvm 1    | 2025/01/08 21:13:18 | Heap dump file created [9470923368 bytes in 39.838 secs]
  • You see messages similar to the following in the /var/log/proton/nsxapi.log on the manager node noted in the alert:
    2025-01-15T10:54:50.427Z  WARN IdfwCleaner ObjectsView 2973510 TXEnd[TX[1e66]] Aborted Exception 
    org.corfudb.runtime.exceptions.TransactionAbortedException: TX ABORT  | Snapshot Time = Token(epoch=31, sequence=506475162) | Failed Transaction ID = 7eb9929f-####-####-####-fa04de3a1e66 | Offending Address = 5064##### | Conflict Key = 36091FEF######## | Conflict Stream = nsx$IdentityIpset | Cause = CONFLICT | Time = 14929 ms

    2025-01-15T10:54:50.427Z  WARN IdfwCleaner IdentifiableProxyObjectDao 2973510 - [nsx@6876 comp="nsx-manager" level="WARNING" subcomp="manager"] Received TransactionAbortedException from the Corfu client.

    2025-01-15T10:54:50.428Z  WARN IdfwCleaner IdentifiableProxyObjectDao 2973510 - [nsx@6876 comp="nsx-manager" level="WARNING" subcomp="manager"] com.vmware.nsx.management.container.exceptions.ConcurrentUpdateException: STREAM_ID = 71f63950-####-####-####-e8f57d148ee4 | CONFLICT_VALUE = java.lang.Error: Unable to find the corresponding key | CONFLICT_KEY_HASH = 38936784655######## | CONFLICT_KEY = uuid {
    left: 12350417914########
    right: 105989396920########
    }
    | MAP_NAME = 71f63950-####-####-####-e8f57d148ee4 | TRANSACTION_ID = 7eb9929f-####-####-####-fa04de3a1e66 | OFFENDING_ADDRESS = 50647####
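
The two log checks above can be sketched as small shell helpers. This is a minimal sketch, not a VMware-provided tool: the function names find_oom_events and count_idfw_aborts are hypothetical, and the search patterns are taken only from the sample messages shown in this article, so other builds may log slightly different text.

```shell
#!/bin/sh
# Sketch only: both function names are hypothetical helpers, not NSX commands.
# The patterns are derived from the sample log lines in this article.

# Print wrapper-log lines that record the heap exhaustion or the heap dump.
# $1: path to proton-tomcat-wrapper.log
find_oom_events() {
    grep -E 'java\.lang\.OutOfMemoryError|Dumping heap to .*proton_oom\.hprof' "$1"
}

# Count IdfwCleaner warnings that mention a Corfu TransactionAbortedException.
# $1: path to nsxapi.log
count_idfw_aborts() {
    grep -c 'IdfwCleaner.*TransactionAbortedException' "$1"
}
```

As the root user on the manager node noted in the alert, the helpers would be called as find_oom_events /var/log/proton/proton-tomcat-wrapper.log and count_idfw_aborts /var/log/proton/nsxapi.log. Matching lines in both files are consistent with the symptoms described here, but they do not replace collecting the support bundle.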

Environment

VMware NSX 4.x versions earlier than 4.2.1
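
The affected range can be expressed as a simple version comparison. The following is a minimal sketch: is_affected is a hypothetical helper, not an NSX command, and it assumes that a sort supporting -V (version sort, as in GNU coreutils) is available and that the version string is a plain dotted number such as 4.2.0.1.

```shell
#!/bin/sh
# Sketch only: is_affected is a hypothetical helper, not an NSX command.
# Returns 0 (true) when the given version is in the 4.x line and sorts
# before 4.2.1; returns 1 otherwise.
is_affected() {
    version=$1
    fixed="4.2.1"
    case "$version" in
        4.*) ;;            # this article only covers the 4.x line
        *) return 1 ;;
    esac
    [ "$version" = "$fixed" ] && return 1
    # sort -V orders version strings; the earlier of the two is printed first.
    lowest=$(printf '%s\n%s\n' "$version" "$fixed" | sort -V | head -n 1)
    [ "$lowest" = "$version" ]
}
```

For example, is_affected 4.2.0.1 succeeds, while is_affected 4.2.1.0 fails because that release contains the fix.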

Cause

The process responsible for purging old IDFW login/logout events consumes too much memory, which exhausts the Java heap and causes the proton service to crash.

Resolution

This issue is resolved in VMware NSX 4.2.1.0, available at Broadcom downloads.

If you are having difficulty finding and downloading software, please review the Download Broadcom products and software KB.

Additional Information