Symptoms:
- The cluster status shows as up and stable when you run: get cluster status
- The Transport nodes show as connected in the Fabric screen.
- In the Overview screen for System -> Fabric -> Nodes -> Edge or Host Transport nodes, the Controller Connectivity shows as UNKNOWN.
- Tunnels to these Transport nodes also show as DOWN.
- DFW rule publishing may fail due to this issue.
- Other CLI commands, such as get nodes and get services, may fail.
- You have NSX Intelligence installed.
- The NSX-T Manager proton-tomcat-wrapper.log contains entries like the following:
Exception in thread "ForkJoinPool.commonPool-worker-4" java.lang.OutOfMemoryError: unable to create new native thread
The JVM has run out of memory. Requesting thread dump.
Dumping JVM state.
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:717)
at java.util.concurrent.ForkJoinPool.createWorker(ForkJoinPool.java:1486)
at java.util.concurrent.ForkJoinPool.tryAddWorker(ForkJoinPool.java:1517)
at java.util.concurrent.ForkJoinPool.deregisterWorker(ForkJoinPool.java:1609)
at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:167)
Exception in thread "ForkJoinPool.commonPool-worker-11" java.lang.OutOfMemoryError: unable to create new native thread
The JVM has run out of memory. Requesting thread dump.
- The NSX-T Manager nsxapi log contains a large number of events like the following, for example 2 within 3 seconds:
INFO intelligence-alarm-start-stop EventSource 8004 MONITORING [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Starting EventSource
- A thread dump, written to proton-tomcat-wrapper.log, shows a very large number of threads with EventReportProcessor.java in their stack, like the following:
INFO | jvm 1 | 2021/03/17 12:55:21 | "pool-9971-thread-1" #83259 prio=5 os_prio=0 tid=0x0000725d04fb2800 nid=0x514 waiting on condition [0x0000725b6177d000]
INFO | jvm 1 | 2021/03/17 12:55:21 | java.lang.Thread.State: WAITING (parking)
INFO | jvm 1 | 2021/03/17 12:55:21 | at sun.misc.Unsafe.park(Native Method)
INFO | jvm 1 | 2021/03/17 12:55:21 | - parking to wait for <0x0000725d3bf01140> (a java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask)
INFO | jvm 1 | 2021/03/17 12:55:21 | at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
INFO | jvm 1 | 2021/03/17 12:55:21 | at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:429)
INFO | jvm 1 | 2021/03/17 12:55:21 | at java.util.concurrent.FutureTask.get(FutureTask.java:191)
INFO | jvm 1 | 2021/03/17 12:55:21 | at com.vmware.nsx.monitoring.clientlibrary.core.EventReportProcessor$1.run(EventReportProcessor.java:94)
INFO | jvm 1 | 2021/03/17 12:55:21 | at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
INFO | jvm 1 | 2021/03/17 12:55:21 | at java.util.concurrent.FutureTask.run(FutureTask.java:266)
INFO | jvm 1 | 2021/03/17 12:55:21 | at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
INFO | jvm 1 | 2021/03/17 12:55:21 | at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
INFO | jvm 1 | 2021/03/17 12:55:21 | at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
INFO | jvm 1 | 2021/03/17 12:55:21 | at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
INFO | jvm 1 | 2021/03/17 12:55:21 | at java.lang.Thread.run(Thread.java:748)
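
To put a number on "a very large number of threads", the thread dump in proton-tomcat-wrapper.log can be summarized with a short script. The following is a minimal Python sketch, not a VMware-provided tool. The function name count_threads_with_frame and both regular expressions are assumptions derived only from the line format in the excerpt above (an "INFO | jvm 1 | date time |" prefix followed by standard JVM thread dump text), so adjust them if your log lines look different.

```python
import re

# Prefix added to every JVM output line in the excerpt above,
# e.g. 'INFO | jvm 1 | 2021/03/17 12:55:21 | '. Assumed from that excerpt.
WRAPPER_PREFIX = re.compile(
    r'^INFO\s*\|\s*jvm \d+\s*\|\s*[\d/]+ [\d:]+\s*\|\s?'
)

# A thread entry in a JVM thread dump starts with the quoted thread name.
THREAD_HEADER = re.compile(r'^"(?P<name>[^"]+)"\s')


def count_threads_with_frame(lines, frame_substring="EventReportProcessor"):
    """Return (total_threads, matching_threads) for a thread dump.

    A thread counts as matching when at least one line of its entry
    contains frame_substring. Each thread is counted at most once.
    """
    total = 0
    matching = 0
    in_thread = False       # True once the first thread header was seen
    already_matched = False  # True if the current thread was counted
    for raw in lines:
        line = WRAPPER_PREFIX.sub("", raw.rstrip("\n")).lstrip()
        if THREAD_HEADER.match(line):
            total += 1
            in_thread = True
            already_matched = False
        elif in_thread and not already_matched and frame_substring in line:
            matching += 1
            already_matched = True
    return total, matching


# Example usage (the file name is a placeholder for your copy of the log):
#
#   with open("proton-tomcat-wrapper.log", errors="replace") as fh:
#       total, matching = count_threads_with_frame(fh)
#   print(total, matching)
```

If the log contains several thread dumps, the counts are summed over all of them, so it can help to cut the file down to a single dump first. A matching count close to the total suggests that most threads in the dump come from EventReportProcessor, which is consistent with the symptom described above.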