When you are looking at the statistics for tuning, you want to observe things like:
If overall system performance is at an acceptable level, it’s not as critical to tune garbage collection. If you do decide to tune the JVM, it is a matter of trying out different settings to see what works best. Keep in mind what Donald Knuth stated about optimization: "We should forget about small efficiencies. Premature optimization is the root of all evil". You are generally either tuning for performance or memory utilization. As you tweak each setting it’s best to rerun the test a couple of times to observe the results. There will be slight variances from run to run due to other processes running on the machine during the test.
Even if you aren’t having specific memory issues, it’s worth making notes about how the application performs and behaves, for later reference: problems could appear later, and a ‘baseline’ performance profile can be very useful in determining what has changed. Learn what the memory profile looks like under different loads, so that you know how to tune memory as the load increases.
Products like GemFire, which hold data and execute behavior on the server, can have significantly different garbage collection profiles than typical Java applications, in which objects are generally shorter lived. The default choices made by the JVM will likely not be appropriate for a GemFire application.
The best possible approach in tuning your application is to first understand the profile of how the application runs, from both a performance and a memory perspective. You need to understand the expected performance versus the actual performance. You will also need to understand how resources, like memory, are used. It is best to start with estimates of memory usage so that systems are sized close to what they will need. Finally, consider what else is running on the machine at the same time, and how it may be affecting the processing time and memory available to your application.
GemFire applications generally deal with large amounts of memory. To start estimating how much memory your application will need, see the GemFire Best Practices and Capacity Planning Guide.
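As a purely hypothetical illustration of such an estimate (the entry count, value size, and overhead figures below are made-up numbers, not guidance from the capacity planning guide):

10,000,000 entries x 1 KB average serialized value   ~= 10 GB of value data
10,000,000 entries x ~100 bytes key/entry overhead   ~=  1 GB of overhead
estimated data footprint per copy of the data        ~= 11 GB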
Once you have figured out how many objects will be held in memory, and how large they are, consider how your application behaves. Does it support lots of queries? Are the result sets large? If so, this could put a greater load on the Young space than on the Old space. Keep an eye on these statistics while you are observing your application.
On the other hand, does your application update data frequently? If it does, this will impact the Old Space, since an object will likely have been promoted to the Old space before it is collected. When an Old Space object is ‘updated’, you now have an object in Old Space that needs to be collected and a new object that will begin its journey through the multiple spaces and will likely eventually reach Old Space. For GemFire, these objects are usually the entire ‘value’ in the key-value pair. Value objects are not updated in place (Key objects should never be updated, only added and/or removed). Keep an eye on Old Gen statistics as you observe your application. Understand how full this area gets and how often GC happens.
Another point to understand is the difference in the size of objects in your JVM. The homogeneity of object sizes being stored is a primary factor in how much memory headroom is necessary to support any given use case. If there is a mixture of large and small object sizes in the same process, memory fragmentation will occur much faster, causing longer application pauses for a Full GC with compaction. In extreme cases, you will want to consider separating the data management nodes into “server group” sub-clusters to break apart the heterogeneously sized data types, using the Resource Manager, discussed later. Managing objects of like sizes together will reduce fragmentation of memory: if a new object needs to be allocated, it has a good chance of finding a memory segment or ‘hole’ that is the right size.
As described previously, there is no set of magic heap parameters that works for all GemFire solutions. Before testing, make sure that you have statistics gathering enabled in GemFire, as the statistics can be very helpful in determining root cause. Also make sure to enable GC logging. See the article How To Collect Basic Information For Gemfire Issues for how to enable both.
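As a minimal sketch, statistics can be enabled in gemfire.properties and GC logging via JVM arguments; the file names below are placeholders, and the GC logging flags shown assume a HotSpot JVM prior to Java 9:

# gemfire.properties
statistic-sampling-enabled=true
statistic-archive-file=myStatArchive.gfs

# JVM arguments for GC logging
-Xloggc:gc.log -XX:+PrintGCDetails -XX:+PrintGCDateStamps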
The following basic rules for configuring the JVM ensure that you have a predictable heap that doesn't spend resources optimizing the different memory spaces as memory usage changes:
Note that the two parameters -Xmns and -Xmnx can be replaced with the single parameter -Xmn. If either -Xmns or -Xmnx is set, -Xmn will not be used.
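For example, a fixed-size heap with a fixed-size young generation might look like the following; the 32g and 4g sizes are placeholders only, and whether your JVM accepts -Xmns/-Xmnx or only -Xmn depends on the JVM vendor:

-Xms32g -Xmx32g -Xmns4g -Xmnx4g
or, equivalently:
-Xms32g -Xmx32g -Xmn4g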
Be sure to disable explicit garbage collection. Calling System.gc() forces a garbage collection that pauses all application threads. Most developers never make this call themselves, but disabling it is a good safety measure in case third-party code calls it.
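Assuming a HotSpot JVM, one common way to do this is:

-XX:+DisableExplicitGC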
As described above, set the generation sizes explicitly and avoid using NewRatio to size the young generation. Set NewSize large enough that objects are not promoted to the Old Gen unnecessarily, but small enough that a ParNew collection (on very big heaps) doesn't take too long.
A good New size setting results in no more than one ParNew garbage collection per second and no fewer than one collection every 10-15 minutes. More than one ParNew garbage collection per second should be avoided, as this can cause abnormally high CPU utilization. In the case of large heaps (over 8GB), the traditional practice of devoting 25% of the heap to the New size may be too aggressive and unnecessarily wasteful of available heap.
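As a hypothetical illustration, on a 32 GB heap the 25% rule would give an 8 GB young generation; starting nearer 4 GB (for example -Xmn4g) and then adjusting based on the observed ParNew frequency in the GC logs is often a more reasonable starting point. The sizes here are illustrative only.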
Tenured space, also referred to as old generation space, is the remainder of the total heap (-Xmx) after the young generation space (-Xmnx) is set aside. The occupancy fraction is the level of tenured-space usage at which the CMS garbage collector is triggered. We want the occupancy fraction to trigger CMS frequently enough to prevent a "concurrent mode failure", which can trigger a full GC when memory is promoted from the young generation to the tenured generation and there isn't sufficient contiguous space in the old generation for the promotion. CMS collections should happen around once every 15 minutes to once an hour. Be aware that avoiding CMS collections can cause issues with resources, such as file descriptors, building up, because they are not fully released until a GC collection occurs. Likewise, not doing CMS GC can result in a full stop-the-world GC if the tenured space is too fragmented for promotion from the young generation. The less garbage the application generates, the closer the occupancy fraction can be set to the eviction threshold. Note that the number of CPUs plays a role here. For instance, if CMS GC is constantly running because tenured space usage exceeds the occupancy fraction and you have only 2 CPUs, GC can consume such a high percentage of CPU resources that the whole system is negatively impacted.
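With the HotSpot CMS collector, this behavior is controlled with flags such as the following; the 70% value is a placeholder to be tuned against your eviction threshold and GC logs:

-XX:+UseConcMarkSweepGC -XX:+UseParNewGC
-XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly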
Finding the total heap size for the GemFire members is a balancing act. More members means more communication, but on the other hand bigger JVMs mean longer GC pauses. Also keep in mind that a heap that is too big can result in never hitting the CMS occupancy fraction, so tenured generation GC is never done. This usually results in a full stop-the-world GC, which is generally not wanted in GemFire solutions. A good guideline for total memory is double what you estimate you need, because serialized and deserialized versions of the same object can exist at the same time.
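Continuing the hypothetical estimate from earlier, roughly 11 GB of data would suggest planning for something in the neighborhood of 22 GB of data capacity in tenured space, plus young generation and headroom, before deciding how to divide that capacity across members.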
This setting (-Xss) determines the size of the stack for each thread in the JVM. The specific value that you should use will vary depending on the requirements of the GemFire solution; however, in most cases the default value used by the JVM is too large. On 64-bit Linux it defaults to 1024 KB (1 MB).
For a typical GemFire grid, this value can be lowered, saving memory and increasing the number of threads that can be run on a system. The easiest way to determine a value for your system is to start out with a very low value, for example 128k. Then run tests and look for a StackOverflowError in the logs. If you see the error, gradually increase the value and test again. When the errors disappear, you have found the minimal value which works for your deployment. This is typically 192k or 256k.
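For example, a typical end result of this process is a setting such as the following (256k is the example value from above, not a universal recommendation):

-Xss256k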
The CMS occupancy fraction should be set below the GemFire eviction threshold so that GC recovers old generation memory before the more expensive GemFire eviction operations begin.
Set critical-threshold in GemFire high enough that you do not reserve too much memory. For very large heaps, such as 100GB, setting it to 95% still reserves roughly 5GB, which is likely excessive; for heaps of that size it can normally be set to 98%. For much smaller heaps, it may need to be lower, such as 90%. Once the critical threshold is hit, GemFire is essentially dead in the water until a GC collection occurs.
Set the eviction threshold about 10 percentage points lower than the critical threshold. Memory is not actually freed by eviction until garbage collection occurs, so you should have sufficient overhead to prevent a burst of activity from driving utilization through the eviction threshold all the way to critical.
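For example, these thresholds can be set through the resource manager in cache.xml; the 90/80 values are placeholders following the guidance above:

<resource-manager critical-heap-percentage="90" eviction-heap-percentage="80"/>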
As a recommendation, start with the following configuration:
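As a sketch only, assuming a HotSpot JVM and purely illustrative sizes that must be adjusted to your own heap and testing, such a starting configuration might look like:

JVM:      -Xms32g -Xmx32g -Xmn4g -Xss256k
          -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
          -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly
          -XX:+DisableExplicitGC
GemFire:  critical-heap-percentage=90  eviction-heap-percentage=80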
These are only meant as a healthy general starting point. To get the optimal configuration you must run through several iterations of testing, analyzing statistics and GC logs, tuning configuration, and re-testing.
It is not advisable to try to avoid tenured heap garbage collection entirely. Connections get released, but the file descriptor backing a connection doesn't get cleaned up until a GC occurs. Depending on how long-lived the connection is, the object holding the file descriptor will be either in the young generation (Eden), collected by a ParNew GC, or in the old generation, requiring a CMS collection. File descriptors are a limited resource, but the limit can be raised with little cost for most GemFire applications.
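For example, on Linux the per-process limit can typically be raised in /etc/security/limits.conf or, for the current shell, with the ulimit command (the 65536 value is only illustrative):

ulimit -n 65536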
The above information applies to all supported versions of GemFire. For a fully detailed explanation of the different garbage collectors and how they work, please refer to the Oracle Garbage Collection Tuning Guide.