High CPU on NSX-v Manager

Article ID: 324247

Products

VMware NSX Data Center for vSphere

Issue/Introduction

  • NSX for vSphere with a large-scale DFW deployment
  • NSX Manager CPU very high or pegged at 100%
  • High rate of churn in the environment resulting in a large number of System Events.
  • top -H sorted by CPU shows 8 Java threads consuming most of the CPU:
 PID USER      PR  NI    VIRT    RES %CPU %MEM     TIME+ S COMMAND
 6611 root      20   0 15.592g 0.010t 95.5 44.8 429:13.14 R /usr/java/jre/bin/java -Djava.util.logging.config.file=/usr/nsx-webserver/conf/logging.properties -server +
 6615 root      20   0 15.592g 0.010t 90.9 44.8 428:51.45 R /usr/java/jre/bin/java -Djava.util.logging.config.file=/usr/nsx-webserver/conf/logging.properties -server +
 6616 root      20   0 15.592g 0.010t 90.9 44.8 428:45.98 R /usr/java/jre/bin/java -Djava.util.logging.config.file=/usr/nsx-webserver/conf/logging.properties -server +
 6617 root      20   0 15.592g 0.010t 90.9 44.8 428:56.11 R /usr/java/jre/bin/java -Djava.util.logging.config.file=/usr/nsx-webserver/conf/logging.properties -server +
 6618 root      20   0 15.592g 0.010t 90.9 44.8 428:31.83 R /usr/java/jre/bin/java -Djava.util.logging.config.file=/usr/nsx-webserver/conf/logging.properties -server +
 6612 root      20   0 15.592g 0.010t 86.4 44.8 429:04.79 R /usr/java/jre/bin/java -Djava.util.logging.config.file=/usr/nsx-webserver/conf/logging.properties -server +
 6613 root      20   0 15.592g 0.010t 81.8 44.8 429:05.98 R /usr/java/jre/bin/java -Djava.util.logging.config.file=/usr/nsx-webserver/conf/logging.properties -server +
 6614 root      20   0 15.592g 0.010t 81.8 44.8 428:52.42 R /usr/java/jre/bin/java -Djava.util.logging.config.file=/usr/nsx-webserver/conf/logging.properties -server +
  • Convert each PID to hexadecimal and look it up in /var/log/nsx-wrapper.log to confirm that the threads are Garbage Collector (GC) threads.
    For example, PID 6611 is 19D3 in hex, which appears below as "nid=0x19d3":
INFO   | jvm 1    | <Date> 09:48:16 | "GC task thread#0 (ParallelGC)" os_prio=0 tid=0x00007f4760023000 nid=0x19d3 runnable
INFO   | jvm 1    | <Date> 09:48:16 | "GC task thread#1 (ParallelGC)" os_prio=0 tid=0x00007f4760024800 nid=0x19d4 runnable
INFO   | jvm 1    | <Date> 09:48:16 | "GC task thread#2 (ParallelGC)" os_prio=0 tid=0x00007f4760026800 nid=0x19d5 runnable
INFO   | jvm 1    | <Date> 09:48:16 | "GC task thread#3 (ParallelGC)" os_prio=0 tid=0x00007f4760028000 nid=0x19d6 runnable
INFO   | jvm 1    | <Date> 09:48:16 | "GC task thread#4 (ParallelGC)" os_prio=0 tid=0x00007f476002a000 nid=0x19d7 runnable
INFO   | jvm 1    | <Date> 09:48:16 | "GC task thread#5 (ParallelGC)" os_prio=0 tid=0x00007f476002b800 nid=0x19d8 runnable
INFO   | jvm 1    | <Date> 09:48:16 | "GC task thread#6 (ParallelGC)" os_prio=0 tid=0x00007f476002d800 nid=0x19d9 runnable
INFO   | jvm 1    | <Date> 09:48:16 | "GC task thread#7 (ParallelGC)" os_prio=0 tid=0x00007f476002f000 nid=0x19da runnable

 

  • The system event purge runs every 8 hours and shows a large number of events deleted:
<Date> 16:00:00.174 GMT-00:00  INFO TaskFrameworkExecutor-17 SystemEventDaoImpl:349 - - [nsxv@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Starting to purge system events with retained count 100000...
<Date> 16:01:08.727 GMT-00:00  INFO TaskFrameworkExecutor-17 SystemEventDaoImpl:354 - - [nsxv@6876 comp="nsx-manager" level="INFO" subcomp="manager"] # of system events deleted: 1131789
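
The PID-to-hex check described in the symptoms above can be sketched in a few lines of Python. This is a minimal illustration, not a supported tool: the function names tid_to_nid and gc_threads are hypothetical, and the sample text is copied from the nsx-wrapper.log excerpt above.

```python
import re

def tid_to_nid(tid):
    """Format a decimal thread ID from `top -H` as it appears in the thread dump."""
    return f"nid={tid:#x}"  # 6611 -> "nid=0x19d3"

def gc_threads(tids, dump_text):
    """Map each thread ID to the quoted thread name on its dump line, if present."""
    names = {}
    for tid in tids:
        # The trailing \b keeps a shorter nid from matching inside a longer one.
        pattern = re.compile(r'"([^"]+)".*\b' + re.escape(tid_to_nid(tid)) + r'\b')
        for line in dump_text.splitlines():
            match = pattern.search(line)
            if match:
                names[tid] = match.group(1)
                break
    return names

# Two lines taken from the log excerpt in this article.
SAMPLE = (
    'INFO   | jvm 1    | <Date> 09:48:16 | "GC task thread#0 (ParallelGC)" '
    'os_prio=0 tid=0x00007f4760023000 nid=0x19d3 runnable\n'
    'INFO   | jvm 1    | <Date> 09:48:16 | "GC task thread#1 (ParallelGC)" '
    'os_prio=0 tid=0x00007f4760024800 nid=0x19d4 runnable'
)

print(gc_threads([6611, 6612], SAMPLE))
```

If every busy thread ID maps to a "GC task thread" name, the CPU load is coming from garbage collection rather than from application threads.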



Environment

VMware NSX Data Center for vSphere 6.4.x

Cause

In large-scale DFW environments with a high rate of churn, frequent updates from vCenter trigger a large volume of DFW publishing events.
By default, the System Events pool is single-threaded.
This single thread cannot process the high volume of system events fast enough, so the heap fills up.
The Garbage Collector threads run to free memory, but because the churn rate is so high, they run continuously and consume high CPU on the Manager.
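
To put the churn in numbers, the purge log line from the Issue section can be turned into an average event rate. The sketch below rests on two assumptions: that the number of events deleted by one purge roughly equals the number generated during one 8-hour purge interval, and that a constant-rate model is good enough for a rough estimate. The function names and the 30 events/s processing rate are hypothetical and are not measured NSX Manager values.

```python
import re

PURGE_INTERVAL_S = 8 * 3600  # the purge runs every 8 hours, per the Issue section

def events_deleted(log_line):
    """Extract the deleted-event count from a SystemEventDaoImpl purge line."""
    match = re.search(r"# of system events deleted: (\d+)", log_line)
    return int(match.group(1)) if match else None

def average_event_rate(deleted, interval_s=PURGE_INTERVAL_S):
    """Average events per second, assuming the deleted count approximates
    the events generated during one purge interval."""
    return deleted / interval_s

def backlog_after(seconds, arrival_rate, service_rate):
    """Events left unprocessed after `seconds` by a single consumer.
    Constant-rate model for illustration only."""
    return max(0.0, (arrival_rate - service_rate) * seconds)

LINE = ('<Date> 16:01:08.727 GMT-00:00  INFO TaskFrameworkExecutor-17 '
        'SystemEventDaoImpl:354 - - [nsxv@6876 comp="nsx-manager" level="INFO" '
        'subcomp="manager"] # of system events deleted: 1131789')

rate = average_event_rate(events_deleted(LINE))
print(f"average rate: {rate:.1f} events/s")

# Hypothetical service rate: a single thread handling 30 events/s falls behind.
print(f"backlog after 1 hour: {backlog_after(3600, rate, 30.0):.0f} events")
```

Whenever the arrival rate exceeds what the single thread can process, the backlog grows linearly with time, which matches the heap pressure and continuous GC activity described above.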

Resolution

This issue is resolved in VMware NSX for vSphere 6.4.9, available at Broadcom downloads.

If you are having difficulty finding and downloading software, please review the Download Broadcom products and software KB.