The Linux futex_wait call has been broken for about a year (since kernel 3.14, around January 2014) and has only recently been fixed (in kernel 3.18, around October 2014). More importantly, the breakage appears to have been backported into major distributions (for example, RHEL 6.6 and its cousins, released in October 2014), and the fix has only recently been backported (RHEL 6.6.z and cousins have the fix).
The futex_wait bug can freeze a JVM, which also makes it impossible to gather thread dumps.
Resolution: Upgrade to a Linux distribution whose kernel is 3.18 or later, or one that carries the backported fix (e.g. RHEL 6.6 should be upgraded to 6.6.z).
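As a rough aid for auditing hosts, the sketch below checks whether a kernel release string falls in the upstream range named above (3.14 up to, but not including, 3.18). The class and method names are hypothetical. It relies on the standard `os.version` system property, which on Linux reports the kernel release string. Note the limitation: distribution kernels that received the breakage as a backport (such as RHEL 6.6) keep their own version numbers, so this check cannot detect them; it only covers upstream-numbered kernels.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of GemFire or the JDK.
class KernelVersionCheck {
    // Captures the leading "major.minor" of a kernel release string.
    private static final Pattern MAJOR_MINOR = Pattern.compile("^(\\d+)\\.(\\d+)");

    // Returns true when the version is in the upstream range 3.14 <= v < 3.18.
    // Strings that do not start with "major.minor" are reported as not affected.
    static boolean inAffectedUpstreamRange(String osVersion) {
        Matcher m = MAJOR_MINOR.matcher(osVersion);
        if (!m.find()) {
            return false;
        }
        int major = Integer.parseInt(m.group(1));
        int minor = Integer.parseInt(m.group(2));
        return major == 3 && minor >= 14 && minor < 18;
    }

    public static void main(String[] args) {
        String osVersion = System.getProperty("os.version");
        if (inAffectedUpstreamRange(osVersion)) {
            System.out.println("Kernel " + osVersion + " is in the upstream range 3.14 to 3.17.");
        } else {
            System.out.println("Kernel " + osVersion
                + " is outside the upstream range; distribution backports still need a manual check.");
        }
    }
}
```

A negative result here is not proof of safety on a distribution kernel; the vendor's errata remain the authoritative source for whether the fix is present.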
In JDK 1.7.0 update 76, there is a bug in the log.util package, described by Oracle in the release notes. This bug causes log rolling issues in GemFire. Upgrading to JDK 1.7 update 79 or 80 is recommended, though JDK versions prior to update 76 are also known to work.
During development of new features, the GemFire engineering team noticed problematic behavior in JDK 1.8.0_45. Frequently, a JVM would become unresponsive for up to a minute and be removed from the distributed system. Statistics records showed a large spike in GC duration following a period in which no statistics were recorded at all. While this was happening, the host system's load averages were generally not high enough to indicate a problem. During tests, an abnormally high number of SEGVs and other JVM crashes were also observed. The Oracle release notes for later JDK 1.8 builds state that SEGV problems have been fixed.
Upgrading the JDK to a 1.8 build later than update 45 is recommended.
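The two JDK builds called out above can be detected at startup from the standard `java.version` system property. The following is a minimal sketch under stated assumptions: the class and method names are hypothetical, the version string is assumed to use the legacy `1.N.0_UU` form used by JDK 7 and 8, and only the two builds named in this document (1.7.0 update 76 and 1.8.0 update 45) are flagged.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of GemFire or the JDK.
class JdkBuildCheck {
    // Matches legacy version strings such as "1.7.0_76" or "1.8.0_45-b14".
    private static final Pattern LEGACY = Pattern.compile("^1\\.(\\d+)\\.0_(\\d+)");

    // Returns true only for the two builds this document warns about.
    // Any string that is not in the legacy form is reported as not flagged.
    static boolean isFlaggedBuild(String javaVersion) {
        Matcher m = LEGACY.matcher(javaVersion);
        if (!m.find()) {
            return false;
        }
        int major = Integer.parseInt(m.group(1));
        int update = Integer.parseInt(m.group(2));
        return (major == 7 && update == 76) || (major == 8 && update == 45);
    }

    public static void main(String[] args) {
        String javaVersion = System.getProperty("java.version");
        if (isFlaggedBuild(javaVersion)) {
            System.out.println("JDK " + javaVersion + " is a build with known issues; consider upgrading.");
        } else {
            System.out.println("JDK " + javaVersion + " is not one of the flagged builds.");
        }
    }
}
```

The check is deliberately narrow: it does not judge builds this document says nothing about, so an unflagged result only means the build is not one of the two listed here.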
Do not use the G1 garbage collector with any JDK prior to 8, as an important fragmentation bug was resolved in JDK 8:
Oracle bug #6976350: "G1: deal with fragmentation while copying objects during GC" (fixed in JDK 8)
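One way to catch this configuration at runtime is to inspect the JVM's garbage collector MXBeans, sketched below. The class and method names are hypothetical. The sketch assumes a HotSpot JVM, where the G1 collector beans are named "G1 Young Generation" and "G1 Old Generation"; other JVMs may use different names.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of GemFire or the JDK.
class G1Check {
    // Parses "java.specification.version" ("1.7", "1.8", "9", "11", ...)
    // and returns true when the feature release is lower than 8.
    static boolean isPreJdk8(String specVersion) {
        String[] parts = specVersion.split("\\.");
        int feature = (parts[0].equals("1") && parts.length > 1)
            ? Integer.parseInt(parts[1])
            : Integer.parseInt(parts[0]);
        return feature < 8;
    }

    // Returns true when any collector name starts with "G1"
    // (assumption: HotSpot naming convention).
    static boolean usesG1(List<String> collectorNames) {
        for (String name : collectorNames) {
            if (name.startsWith("G1")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<String>();
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(bean.getName());
        }
        String spec = System.getProperty("java.specification.version");
        if (isPreJdk8(spec) && usesG1(names)) {
            System.out.println("Warning: G1 is active on a JDK prior to 8 (" + spec + ").");
        } else {
            System.out.println("Collectors in use: " + names);
        }
    }
}
```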