- NSX for vSphere with a large-scale DFW (Distributed Firewall) deployment
- NSX Manager CPU very high or pegged at 100%
- High rate of churn in the environment resulting in a large number of System Events.
- top -H sorted by CPU shows 8 Java threads consuming most of the CPU:
PID USER PR NI VIRT RES %CPU %MEM TIME+ S COMMAND
6611 root 20 0 15.592g 0.010t 95.5 44.8 429:13.14 R /usr/java/jre/bin/java -Djava.util.logging.config.file=/usr/nsx-webserver/conf/logging.properties -server +
6615 root 20 0 15.592g 0.010t 90.9 44.8 428:51.45 R /usr/java/jre/bin/java -Djava.util.logging.config.file=/usr/nsx-webserver/conf/logging.properties -server +
6616 root 20 0 15.592g 0.010t 90.9 44.8 428:45.98 R /usr/java/jre/bin/java -Djava.util.logging.config.file=/usr/nsx-webserver/conf/logging.properties -server +
6617 root 20 0 15.592g 0.010t 90.9 44.8 428:56.11 R /usr/java/jre/bin/java -Djava.util.logging.config.file=/usr/nsx-webserver/conf/logging.properties -server +
6618 root 20 0 15.592g 0.010t 90.9 44.8 428:31.83 R /usr/java/jre/bin/java -Djava.util.logging.config.file=/usr/nsx-webserver/conf/logging.properties -server +
6612 root 20 0 15.592g 0.010t 86.4 44.8 429:04.79 R /usr/java/jre/bin/java -Djava.util.logging.config.file=/usr/nsx-webserver/conf/logging.properties -server +
6613 root 20 0 15.592g 0.010t 81.8 44.8 429:05.98 R /usr/java/jre/bin/java -Djava.util.logging.config.file=/usr/nsx-webserver/conf/logging.properties -server +
6614 root 20 0 15.592g 0.010t 81.8 44.8 428:52.42 R /usr/java/jre/bin/java -Djava.util.logging.config.file=/usr/nsx-webserver/conf/logging.properties -server +
- Convert each thread ID (the PID column of top -H) to hexadecimal and check /var/log/nsx-wrapper.log to confirm that these are garbage collector threads.
For example, thread ID 6611 is 19D3 in hex, which appears below as "nid=0x19d3":
INFO | jvm 1 | <Date> 09:48:16 | "GC task thread#0 (ParallelGC)" os_prio=0 tid=0x00007f4760023000 nid=0x19d3 runnable
INFO | jvm 1 | <Date> 09:48:16 | "GC task thread#1 (ParallelGC)" os_prio=0 tid=0x00007f4760024800 nid=0x19d4 runnable
INFO | jvm 1 | <Date> 09:48:16 | "GC task thread#2 (ParallelGC)" os_prio=0 tid=0x00007f4760026800 nid=0x19d5 runnable
INFO | jvm 1 | <Date> 09:48:16 | "GC task thread#3 (ParallelGC)" os_prio=0 tid=0x00007f4760028000 nid=0x19d6 runnable
INFO | jvm 1 | <Date> 09:48:16 | "GC task thread#4 (ParallelGC)" os_prio=0 tid=0x00007f476002a000 nid=0x19d7 runnable
INFO | jvm 1 | <Date> 09:48:16 | "GC task thread#5 (ParallelGC)" os_prio=0 tid=0x00007f476002b800 nid=0x19d8 runnable
INFO | jvm 1 | <Date> 09:48:16 | "GC task thread#6 (ParallelGC)" os_prio=0 tid=0x00007f476002d800 nid=0x19d9 runnable
INFO | jvm 1 | <Date> 09:48:16 | "GC task thread#7 (ParallelGC)" os_prio=0 tid=0x00007f476002f000 nid=0x19da runnable
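The hex conversion and lookup described above can be scripted rather than done by hand. The following is a minimal sketch, not an NSX tool: the helper names `to_nid` and `match_threads` are hypothetical, and it assumes the thread dump lines have already been extracted from /var/log/nsx-wrapper.log in the format shown above.

```python
import re

# Matches the quoted thread name and the native thread ID (nid) on a thread dump line.
NID_RE = re.compile(r'"(?P<name>[^"]+)".*\bnid=0x(?P<nid>[0-9a-fA-F]+)')

def to_nid(thread_id: int) -> str:
    """Format a decimal thread ID from top -H the way the thread dump prints it."""
    return f"0x{thread_id:x}"

def match_threads(thread_ids, dump_lines):
    """Map each decimal thread ID to the thread name found in the dump, or None."""
    names = {}
    for line in dump_lines:
        m = NID_RE.search(line)
        if m:
            names[int(m.group("nid"), 16)] = m.group("name")
    return {tid: names.get(tid) for tid in thread_ids}

# Two sample lines copied from the log excerpt above.
dump = [
    'INFO | jvm 1 | <Date> 09:48:16 | "GC task thread#0 (ParallelGC)" os_prio=0 tid=0x00007f4760023000 nid=0x19d3 runnable',
    'INFO | jvm 1 | <Date> 09:48:16 | "GC task thread#7 (ParallelGC)" os_prio=0 tid=0x00007f476002f000 nid=0x19da runnable',
]

result = match_threads([6611, 6618], dump)
print(to_nid(6611))
for tid, name in result.items():
    print(tid, to_nid(tid), name)
```

If every busy thread ID maps to a "GC task thread" name, the CPU is being spent in garbage collection rather than in application threads.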
- The system event purge runs every 8 hours and shows a large number of events deleted:
<Date> 16:00:00.174 GMT-00:00 INFO TaskFrameworkExecutor-17 SystemEventDaoImpl:349 - - [nsxv@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Starting to purge system events with retained count 100000...
<Date> 16:01:08.727 GMT-00:00 INFO TaskFrameworkExecutor-17 SystemEventDaoImpl:354 - - [nsxv@6876 comp="nsx-manager" level="INFO" subcomp="manager"] # of system events deleted: 1131789
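To gauge how heavy each purge is, the two log lines above can be paired to extract the retained count, the number of deleted events, and the purge duration. This is a rough sketch under stated assumptions: `summarize_purge` is a hypothetical helper, the message text is taken verbatim from the lines above, and both lines are assumed to fall on the same day (the date itself is elided in the sample, so only the time of day is parsed).

```python
import re
from datetime import datetime

TIME = r"(\d{2}:\d{2}:\d{2}\.\d{3})"
START_RE = re.compile(TIME + r" .*Starting to purge system events with retained count (\d+)")
DONE_RE = re.compile(TIME + r" .*# of system events deleted: (\d+)")

def summarize_purge(lines):
    """Return retained count, deleted count, and duration in seconds for one purge run."""
    summary = {"retained": None, "deleted": None, "duration_s": None}
    start = None
    for line in lines:
        m = START_RE.search(line)
        if m:
            start = datetime.strptime(m.group(1), "%H:%M:%S.%f")
            summary["retained"] = int(m.group(2))
            continue
        m = DONE_RE.search(line)
        if m and start is not None:
            end = datetime.strptime(m.group(1), "%H:%M:%S.%f")
            summary["deleted"] = int(m.group(2))
            # Assumes the purge does not cross midnight.
            summary["duration_s"] = (end - start).total_seconds()
    return summary

# Sample lines copied from the log excerpt above (structured-data field shortened).
log = [
    '<Date> 16:00:00.174 GMT-00:00 INFO TaskFrameworkExecutor-17 SystemEventDaoImpl:349 - - [nsxv@6876 comp="nsx-manager"] Starting to purge system events with retained count 100000...',
    '<Date> 16:01:08.727 GMT-00:00 INFO TaskFrameworkExecutor-17 SystemEventDaoImpl:354 - - [nsxv@6876 comp="nsx-manager"] # of system events deleted: 1131789',
]

summary = summarize_purge(log)
print(summary)
```

In the sample above, more than 1.1 million events were deleted in a single run against a retained count of 100000, which is consistent with the high event churn listed in the symptoms.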