GemFire: Thread monitoring can cause JVM Pauses

Article ID: 294452

Updated On:

Products

VMware Tanzu GemFire

Issue/Introduction

By default, GemFire runs with the property thread-monitor-enabled=true, which can be overridden in the gemfire.properties file. We have discovered that this monitoring, in which GemFire logs the full stack trace of the "problem" thread, can cause the JVM to pause, because gathering the stack trace requires the VM to come to a safepoint.

Environment

Product Version: 9.10
OS: ALL

Resolution

This issue is being resolved in future releases, in which GemFire will no longer attempt to gather the full stack trace. In addition, GemFire will produce far fewer false "thread is stuck" messages, because the default value of thread-monitor-time-limit-ms will increase from 30000 (30 seconds) to 120000 (2 minutes).

GemFire will also stop using the phrase "thread is stuck", because it can be very misleading. Other wording, such as "thread may be stuck", will be used instead.

In the short term, to avoid any chance of GemFire distributed system issues caused by thread monitoring, we recommend disabling thread monitoring by setting thread-monitor-enabled=false.
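
As a minimal sketch of this workaround, the fragment below shows the single line involved. It assumes each member reads its configuration from a gemfire.properties file at startup; the comment line is illustrative only.

```properties
# Short-term workaround: turn off GemFire thread monitoring on this member
thread-monitor-enabled=false
```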

With thread monitoring disabled, it is important to remember that if you ever encounter performance issues, blocking behavior, or anything else that gives you cause for concern, such as thread count growth, you should gather thread dumps across the entire cluster.

When you gather thread dumps, take 3 thread dumps per member, roughly 1 minute apart, so that we can differentiate between threads that are not progressing and threads that are experiencing congestion.
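
The sampling idea above can be illustrated with a short Java sketch. It is not part of GemFire: the class ThreadDumpSampler and its methods are hypothetical names, and it dumps only the JVM it runs in, using the standard java.lang.management API. In practice, dumps of a running member are usually taken from outside the process with JDK tools such as jstack or jcmd. The sketch takes several dumps a fixed interval apart and lists the threads whose stack is identical in every dump. These are only candidates for "not progressing", since idle threads also show an unchanged stack.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, for illustration only; not a GemFire API.
class ThreadDumpSampler {

    // Takes `count` dumps of the current JVM, `intervalMillis` apart.
    // Each dump maps a thread id to the thread name plus its stack trace text.
    static List<Map<Long, String>> sample(int count, long intervalMillis)
            throws InterruptedException {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<Map<Long, String>> dumps = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            Map<Long, String> dump = new HashMap<>();
            // false, false: skip lock details to keep each dump cheap
            for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
                StringBuilder sb = new StringBuilder(info.getThreadName()).append('\n');
                for (StackTraceElement frame : info.getStackTrace()) {
                    sb.append("  at ").append(frame).append('\n');
                }
                dump.put(info.getThreadId(), sb.toString());
            }
            dumps.add(dump);
            if (i < count - 1) {
                Thread.sleep(intervalMillis);
            }
        }
        return dumps;
    }

    // Returns ids of threads that appear with an identical stack in every dump.
    static Set<Long> unchanged(List<Map<Long, String>> dumps) {
        if (dumps.isEmpty()) {
            return new HashSet<>();
        }
        Map<Long, String> first = dumps.get(0);
        Set<Long> ids = new HashSet<>(first.keySet());
        for (Map<Long, String> later : dumps.subList(1, dumps.size())) {
            ids.removeIf(id -> !first.get(id).equals(later.get(id)));
        }
        return ids;
    }

    public static void main(String[] args) throws InterruptedException {
        // 3 dumps, roughly 1 minute apart, as recommended above
        List<Map<Long, String>> dumps = sample(3, 60_000L);
        for (Long id : unchanged(dumps)) {
            System.out.println("Unchanged across all dumps, thread id " + id + ":");
            System.out.println(dumps.get(0).get(id));
        }
    }
}
```

A thread whose stack changes between dumps is making progress, even if slowly, which points to congestion. A thread with the same stack in all three dumps warrants a closer look.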