Troubleshooting Guide: Investigating Sudden GemFire Node Departure with Azul Zing JVM

Article ID: 434663

Updated On:

Products

VMware Tanzu GemFire, Pivotal GemFire, VMware Tanzu Data Suite, VMware Tanzu Data Intelligence

Issue/Introduction

When a node unexpectedly drops from a GemFire cluster, standard server logs are often insufficient for identifying the root cause because the process may have terminated too abruptly to record the event. This is especially true when using the Azul Zing JVM, where specialized diagnostic logs are required to understand low-level process failures.

Cause

1. The Limitation of GemFire Server Logs:

In a sudden process crash, GemFire’s internal logging mechanism, which typically records member departures or network issues, stops immediately. Surviving nodes will report that the member has departed, but the log on the failing node will often show normal operational activity up until the exact moment of the crash, followed by silence.
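To pin down the moment the failing node went silent, it can help to extract the timestamp of the last entry in its server log and use that as the reference point when reading the JVM-level files. The following is a minimal sketch, not a GemFire utility. It assumes log entries begin with a header of the form `[info 2024/01/15 10:23:45.123 UTC ...]`; the regular expression and the function name `last_log_timestamp` are illustrative assumptions and may need adjusting to match the actual log layout.

```python
import re
from datetime import datetime

# Assumed entry header: "[<level> YYYY/MM/DD HH:MM:SS.mmm <zone> ...".
# Adjust this pattern if the server log uses a different layout.
HEADER = re.compile(r"^\[\w+ (\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{3})")


def last_log_timestamp(lines):
    """Return the timestamp of the last entry header found, or None."""
    last = None
    for line in lines:
        match = HEADER.match(line)
        if match:
            # Continuation lines (stack traces, wrapped text) do not match
            # and are skipped, so only real entry headers are considered.
            last = datetime.strptime(match.group(1), "%Y/%m/%d %H:%M:%S.%f")
    return last
```

Passing an open file handle for the failing node's server log to `last_log_timestamp` gives an approximate crash time, which can then be compared with the departure time reported by the surviving members.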

2. Check vmoutput.log and hs_err:

For clusters running on Azul Zing, the most critical evidence resides in the JVM-level output files rather than the application logs.
  • vmoutput.log: This file captures internal JVM events, including deoptimization tasks, compiler activity, and fatal errors that occur before the process exits. Search for "jvm_exception_handler" or "made not entrant" entries near the timestamp of the node departure. These entries indicate internal JVM state changes that may have preceded a crash.
  • hs_err_pid<PID>.log: If the JVM suffers a fatal crash (e.g., a segmentation fault), it generates this "HotSpot Error" file, named after the process ID of the crashed JVM.
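The two checks above can be sketched as a small script. This is only an illustration: the marker strings come from this article, while the function names and the assumption that `vmoutput.log` and any `hs_err_pid<PID>.log` files sit in the same directory are hypothetical and should be adapted to the actual environment.

```python
import sys
from pathlib import Path

# Marker strings taken from this article; extend the tuple as needed.
MARKERS = ("jvm_exception_handler", "made not entrant")


def scan_lines(lines, markers=MARKERS):
    """Return (line_number, text) pairs for lines containing any marker."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(marker in line for marker in markers):
            hits.append((number, line.rstrip("\n")))
    return hits


def scan_file(path, markers=MARKERS):
    """Scan a vmoutput.log-style file, replacing undecodable bytes."""
    with open(path, "r", errors="replace") as handle:
        return scan_lines(handle, markers)


def find_hs_err_files(directory):
    """List hs_err_pid*.log files directly inside the given directory."""
    return sorted(Path(directory).glob("hs_err_pid*.log"))


if __name__ == "__main__":
    # Hypothetical usage: python scan_jvm_logs.py /path/to/log/directory
    work_dir = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    vmoutput = work_dir / "vmoutput.log"
    if vmoutput.exists():
        for number, text in scan_file(vmoutput):
            print(f"{vmoutput}:{number}: {text}")
    for crash_file in find_hs_err_files(work_dir):
        print(f"crash file: {crash_file}")
```

The script only narrows down which lines and files to read. The matches still need to be compared manually against the departure time reported by the surviving members.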

 

Resolution

When the evidence described above is present, the node departure is not a VMware Tanzu GemFire-level failure but a low-level process termination within the Azul Zing JVM.

Because the GemFire server logs show no errors prior to the departure, the investigation must shift to the JVM diagnostic files:

  • Root Cause: A segmentation fault occurred during internal JVM optimization tasks.

  • Next Steps: This issue requires Azul Support, who will need to analyze the specific deoptimization or compilation events to identify why the JVM process crashed.