How to Troubleshoot GemFire: New Log messages to accurately measure Tenured Heap Consumption
search cancel

How to Troubleshoot GemFire: New Log messages to accurately measure Tenured Heap Consumption

book

Article ID: 294472

calendar_today

Updated On:

Products

VMware Tanzu Gemfire

Issue/Introduction

GemFire has added new logging to help customers more accurately measure tenured heap consumption. Often customers are opening tickets due to other monitoring tools, even our Pulse tool, showing that heap consumption is higher than expected and potentially reaching levels of concern.

The purpose of this article is to assist customers to better understand true consumption. Hopefully, in most situations it will help customers to determine that they are not at risk of any issues, and that it is purely a temporary situation until a garbage collection.

Customers often open Support tickets concerned that they are seeing very high heap consumption and not understanding why it might be so high. This is often due to the fact that monitors of heap consumption are really completely inaccurate in providing you the true long lived consumption. 

Basically, these readings often include lots of garbage that exists in tenured heap and will continue to exist in tenured heap until some garbage collection occurs to bring down consumption of that space.

Symptom

The basic symptom is that customers are attempting to monitor heap consumption on a per-member basis or across the entire distributed system and the monitoring shows heap consumption higher than expected.

It may even be the case that some alerting mechanisms have fired, driving tickets for analysis and recommendations. 

The recommendation upon any tools driving concerns would be to examine our GemFire logs to see the true tenured consumption over time. These should be examined across all members to assess whether consumption has increased over time or even the frequency of the log messages.

Environment

Product Version: 9.10
OS: Linux

Resolution

The new info level log messages are as follows:
[info 2021/03/16 11:33:56.975 EDT xxx <Notification Handler1> tid=0x34] A tenured heap garbage collection has occurred.  New tenured heap consumption: 41404928
[info 2021/03/16 11:34:39.587 EDT xxx <Notification Handler1> tid=0x34] A tenured heap garbage collection has occurred.  New tenured heap consumption: 33339736
[info 2021/03/16 11:35:20.106 EDT xxx <Notification Handler1> tid=0x34] A tenured heap garbage collection has occurred.  New tenured heap consumption: 28388320
[info 2021/03/16 12:04:08.874 EDT xxx <Notification Handler1> tid=0x34] A tenured heap garbage collection has occurred.  New tenured heap consumption: 19711648
[info 2021/03/16 12:04:51.297 EDT xxx <Notification Handler1> tid=0x34] A tenured heap garbage collection has occurred.  New tenured heap consumption: 16823952
[info 2021/03/16 12:35:34.843 EDT xxx<Notification Handler1> tid=0x34] A tenured heap garbage collection has occurred.  New tenured heap consumption: 16129328
=================

You should evaluate/filter these messages over time across all members. In doing so, you can determine the following per member:
 
  1. You can determine whether the heap is growing over time and then map that to your known entry count growth. If it is growing over time with no entry count growth, then this needs to be understood and could warrant a new ticket.
  2. You can determine the frequency of the collections. If very frequent or much too frequent, it is a sign that your tenured collections are happening too frequently and you likely need to increase heap or for example your CMSInitiatingOccupancyFraction if using CMS.

Important note for G1GC

In early versions of G1GC, G1 does not send out notifications for mixed collections. It would only send out such notifications for FULL GC's which we try to avoid in GemFire. GemFire essentially registers with the JVM to get notified when a tenured collection occurs.    

With early versions of G1, mixed collections would not drive the notifications. That said, starting in Java 8 build 212, G1GC did start sending notifications for mixed collections and so our new log messages will appear for CMS on any supported version of GemFire and for G1 post build openJDK 212.

As for other versions of Java and GC algorithms, GemFire will log the messages if and only if we receive the notifications from the JVM corresponding to the tenured collections. This requires the MemoryPoolMXBean isUsageThresholdSupported() and isCollectionUsageThresholdSupported() API's to be true for the given JVM/GC algorithm combination.

In any case, we will either log the message or it will either be a no-op or you will see a log message, similar to the one below, indicating an exception occurred:
An Exception occurred while attempting to print out tenured heap consumption


If you do not see these new messages helping you, consider getting to a version of the JDK that will send the notifications for the GC algorithm you have chosen to use.


Checklist:
The recommendation upon any tools driving concerns would be to examine our GemFire logs to see the true tenured consumption over time. These should be examined across all members to assess whether consumption has increased over time or even the frequency of the log messages.