Reduce the frequency of misleading alert messages in log stating "Above heap critical threshold"


Article ID: 294337



Products

VMware Tanzu GemFire

Issue/Introduction

Why do the following log messages appear when VMware Tanzu GemFire heap usage is actually normal?
[error 2020/03/18 20:00:58.487 EDT DCS-DCS-CLUSTER-10.104.35.95-dmnode-002 <Notification Handler> tid=0x4e] Member: 10.104.35.95(DCS-DCS-CLUSTER-10.104.35.95-dmnode-002:17214)<v3>:10131 above heap critical threshold
[info 2020/03/18 20:00:58.487 EDT DCS-DCS-CLUSTER-10.104.35.95-dmnode-002 <Notification Handler> tid=0x4e] Member: 10.104.35.95(DCS-DCS-CLUSTER-10.104.35.95-dmnode-002:17214)<v3>:10131 above heap eviction threshold


Environment

Product Version: 9.6

Resolution

There is a known issue in which incorrect heap consumption is reported. Research indicates that this occurs only when CMS collections are very frequent because heap usage stays above CMSInitiatingOccupancyFraction even after garbage collections. To reduce or completely eliminate these alerts, set the parameter below to a low value such as 2 or 3. The effect depends on your version of VMware Tanzu GemFire. In the latest versions, beyond 9.8, setting this parameter to 2 should completely eliminate all false positive alerts caused by the bad heap reading. Prior to version 9.8, setting this parameter to 3, for example, means that the false "above heap critical threshold" alert still appears, but only every third time the bad reading occurs.
gemfire.memoryEventThreshold
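The difference between the two version behaviors described above can be pictured with a small counter. The following is only an illustrative sketch, not GemFire's implementation: the class ToleranceFilter, its fields, and the two counting modes are hypothetical names chosen for this example, and the assumption that newer versions count consecutive readings while older versions count all readings is inferred from the description in this article.

```java
/** Illustrative only: a hypothetical model of the tolerance setting, not GemFire code. */
final class ToleranceFilter {
    private final int tolerance;         // stand-in for the gemfire.memoryEventThreshold value
    private final boolean resetOnNormal; // true: count consecutive readings; false: count all
    private int count;

    ToleranceFilter(int tolerance, boolean resetOnNormal) {
        this.tolerance = tolerance;
        this.resetOnNormal = resetOnNormal;
    }

    /** Returns true when this heap reading should raise an alert. */
    boolean onReading(boolean aboveThreshold) {
        if (!aboveThreshold) {
            if (resetOnNormal) {
                count = 0; // a normal reading cancels any pending bad readings
            }
            return false;
        }
        count++;
        if (count >= tolerance) {
            count = 0; // alert raised; start counting again
            return true;
        }
        return false;
    }
}
```

Under these assumptions, a filter created with a tolerance of 2 and resetOnNormal set to true never alerts on isolated bad readings, while a filter with a tolerance of 3 and resetOnNormal set to false still alerts on every third bad reading.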
The false reading occurs only when CMS collections are very frequent. In other words, heap consumption is above the configured CMSInitiatingOccupancyFraction and does not drop below it when a collection occurs. Increasing the heap size, or raising CMSInitiatingOccupancyFraction above your steady-state consumption, will eliminate the false alerts.
-XX:CMSInitiatingOccupancyFraction=<N>
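To choose a value for N, it helps to know how full the heap pools are right after a collection. The sketch below is one possible way to read that with the standard java.lang.management API. The class name PostGcOccupancy and the helper percent are hypothetical, and the code reports on the JVM it runs in, so it would need to run inside the GemFire member (or be adapted to a remote JMX connection) to be meaningful.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

/** Hypothetical helper: prints heap pool occupancy after the most recent collection. */
final class PostGcOccupancy {

    /** Percentage of max that is used, or -1 when max is undefined. */
    static double percent(long used, long max) {
        return max <= 0 ? -1.0 : 100.0 * used / max;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip non-heap pools such as metaspace
            }
            // Usage measured after the last collection of this pool; null if unsupported.
            MemoryUsage afterGc = pool.getCollectionUsage();
            if (afterGc == null) {
                continue;
            }
            System.out.printf("%s: %.1f%% used after last collection%n",
                    pool.getName(), percent(afterGc.getUsed(), afterGc.getMax()));
        }
    }
}
```

If the old generation pool regularly reports a post-collection percentage at or above the configured fraction, that matches the condition described above, and a larger heap or a higher fraction is worth considering.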
Note: This happens in all versions of VMware Tanzu GemFire, but can be eliminated in the latest versions by setting gemfire.memoryEventThreshold to 2. With that setting in place, if you still hit these alerts, you can trust that the consumption is real and not driven by false heap readings.