How to collect basic information for VMware GemFire issues

Article ID: 294112


Products

VMware Tanzu GemFire, Pivotal GemFire, VMware Tanzu Greenplum / GemFire, VMware Tanzu Data Suite, VMware Tanzu Data Intelligence

Issue/Introduction

This document contains guidelines for collecting information, such as logs, statistics files, thread dumps, and heap dumps for Tanzu GemFire related issues. Besides providing the artifacts, it is important to provide a timeline or overview of the issue with details on impact and actions taken.

Environment

 

Cause

You can use the gfsh "export logs" command to collect artifacts, such as logs and statistics, and export them from a GemFire cluster, as described in the "export logs" section of the documentation.

It is recommended to use the GemFire Management Console (GMC) to identify the correct incident time period by searching the GemFire logs. Once the period is identified, GMC can be used to export the relevant logs and statistics using filtering.
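The gfsh approach above can be sketched as a single scripted invocation. This is a minimal sketch, assuming gfsh is on the PATH and can reach a running locator; the locator address, time window, and output directory are illustrative placeholders, and exact option names can vary by version (run "help export logs" inside gfsh to confirm):

```shell
# Connect to the cluster and export logs and statistics for a time window.
# locator-host[10334], the times, and /tmp/gemfire-artifacts are placeholders.
gfsh -e "connect --locator=locator-host[10334]" \
     -e "export logs --dir=/tmp/gemfire-artifacts --start-time=2026/02/05/14/00 --end-time=2026/02/05/15/00"
```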

What to collect when

  • For all GemFire issues, support will need logs and statistics from all cluster members (locators and servers) covering the time period when the issue occurred.
  • For issues where members are hung (any time an unscheduled "restart" becomes necessary), support will also need thread dumps from, at a minimum, the members that appear unresponsive. It is important that more than one thread dump is taken on each host.
  • For GC tuning issues, support will also require GC logs.
  • For out-of-memory issues and memory leaks, a heap dump will be required (if this is not feasible, a heap histogram is better than nothing).

Resolution

1. Logs:

  1. Locator and Server

    Copy logs from the location configured by the log-file property in the gemfire.properties file or given as a parameter in your startup script. For example (from the gemfire.properties file):

    log-level=config
    log-file=log/cacheserver1.log
    

    Make sure to provide complete logs, including those that cover the header and startup information. This information is valuable when investigating an issue.

  2. Java Client logs

    Copy any client logs from the location defined in the Java client code or gemfire.properties file. For example (in code):

    ClientCache cache = new ClientCacheFactory()
    .set("name", "CqClient")
    .set("cache-xml-file", "xml/CqClient.xml")
    .set("log-level", "config")
    .set("log-file", "cqclient.log")
    .create();
    
  3. Native Client logs

    Copy logs from the path defined in the log-file property of the gfcpp.properties file or defined in native code. For example, in gfcpp.properties:

    log-level=config
    log-file=log/nativeclient1.log
    
  4. Pulse logs

    For more details, see How to Configure GemFire Pulse Logging in an Embedded Mode. Note that Pulse is deprecated in GemFire 10.1 and will be removed in a future release; use the GemFire Management Console instead.

  5. Security logs

    Copy the logs from the path defined by the security-log-file property of the gemfire.properties or gfsecurity.properties file. For example, gemfire.properties:

    security-log-file=log/locatorsecurity.log
    
  6. GFSH logs

    By default, gfsh session logging is disabled. To enable it, set the Java system property -Dgfsh.log-level=<desired_log_level>, where <desired_log_level> is one of the following values: severe, warning, info, config, fine, finer, finest.
    For example, in Linux:

    $ export JAVA_ARGS=-Dgfsh.log-level=config
    

    Then, start gfsh.

    Copy any logs from the directory in which the gfsh command was run. For example, if the gfsh command was run from /home/user1/GemWorkdir1, the gfsh log would be in a file similar to the following:

    /home/user1/GemWorkdir1/gfsh-2013-12-31_17-36-25.log
    
  7. GC Logs

    GC logging is enabled with startup parameters added to the JVM. The following parameters should be added when enabling GC logging for CMS GC on JDK 8 and earlier (on JDK 9 and later, including G1GC or ZGC, use unified logging instead, for example -Xlog:gc*:file=<path>:time,uptime:filecount=100,filesize=1m):

    -XX:+PrintGC (or the alias: -verbose:gc)
    -XX:+PrintGCDetails
    -XX:+PrintGCTimeStamps
    -XX:+PrintAdaptiveSizePolicy
    -XX:+PrintTenuringDistribution
    -Xloggc:<file path and name>
    -XX:+UseGCLogFileRotation
    -XX:NumberOfGCLogFiles=100
    -XX:GCLogFileSize=1m
    

    Use -Xloggc to specify the path and file name of the GC log file; the default is standard out. If restarting the cluster, make sure to collect the GC logs before restarting, as the JVM can overwrite needed logs when it starts.

    The overhead of GC logging is usually rather small, so it is generally recommended to have it enabled. However, you do not need to decide this at JVM startup: the JVM has a category of flags called "manageable" whose values can be changed at run time. The GC logging flags that start with "PrintGC" belong to the "manageable" category, so it is possible to activate or deactivate GC logging on a running JVM.

    Manageable flags can be set with the use of a JMX client calling the setVMOption operation of the HotSpotDiagnostic MXBean or using the jinfo tool shipped with the JDK.
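For example, assuming a JDK 8 JVM and the jinfo tool on the PATH, GC logging can be toggled on a running process like this (the <pid> is a placeholder for the GemFire process ID):

```shell
# Turn GC logging on for a running HotSpot JVM (JDK 8; <pid> is a placeholder)
jinfo -flag +PrintGC <pid>
jinfo -flag +PrintGCDetails <pid>
# ...and turn it off again:
jinfo -flag -PrintGCDetails <pid>
jinfo -flag -PrintGC <pid>
```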
     

2. Statistics files:

  1. Locator Statistics files, CacheServer Statistics files

    Copy any stats files from the location defined by the statistic-archive-file property of gemfire.properties. For example, gemfire.properties:

    statistic-sampling-enabled=true
    statistic-archive-file=myStatisticsArchiveFile.gfs
    enable-time-statistics=false
    

    The statistic-sample-rate can be changed from the default sample rate of 1000 milliseconds, but this should not be needed, as the impact is very small.

    Note: Time statistics should only be enabled in dev and QA environments and not in production as this setting has a relatively large impact on VMware GemFire performance.

    To set up rolling of statistics files, use the following parameters:

    archive-disk-space-limit=1000
    archive-file-size-limit=100
    

    This makes the .gfs files roll when they reach 100 MB and limits total archive disk usage to 1000 MB (the last 10 files).

  2. Java Client Statistics files

    Copy any client-side stats files from the path defined in code or in the gemfire.properties file. For example (in code):

    ClientCache cache = new ClientCacheFactory()
    .set("name", "CqClient")
    .set("cache-xml-file", "xml/CqClient.xml")
    .set("log-level", "config")
    .set("log-file", "cqclient.log")
    .set("statistic-archive-file", "myClientStats.gfs")
    .set("statistic-sampling-enabled", "true")
    .create();
    
  3. Native Client Statistics files

    Copy any stats files from the location defined in the statistic-archive-file property of the gfcpp.properties file or defined in native code. For example, gfcpp.properties:

    statistic-sampling-enabled=true
    statistic-archive-file=myClientStats.gfs
    

     

3. Thread dumps

For some issues, such as hangs or performance problems, thread dumps from the servers or clients are essential for analysis. It is very important that multiple thread dumps are taken periodically (e.g., every few seconds) over a period of time.

Thread dumps can be taken using the following procedure:

  • Step 1. Find the relevant VMware GemFire process ID, e.g.:
    $ jps -l
    7904 sun.tools.jps.Jps
    5388 sample.JClient
    
  • Step 2. Generate the thread dump(s).

    On Solaris, Linux, and other Unix platforms, sending a SIGQUIT signal to the VMware GemFire Java process generates a thread dump, e.g.:

    kill -QUIT <pid>
    

    On Windows, you can press CTRL+Break in the command shell where the VMware GemFire Java process was started.

    Alternatively, these tools can be used to generate the thread dump:

    1. jstack command: jstack <pid>
    2. Java VisualVM (jvisualvm)
    3. jconsole
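The "multiple dumps over time" advice above can be scripted. The following is a minimal sketch assuming jstack is on the PATH; the function name and the DUMP_CMD override are illustrative, not part of GemFire:

```shell
# take_thread_dumps <pid> [count] [interval-seconds]
# Writes one threaddump_<pid>_<n>.txt file per iteration.
take_thread_dumps() {
  pid=$1
  count=${2:-5}
  interval=${3:-5}
  i=1
  while [ "$i" -le "$count" ]; do
    # DUMP_CMD defaults to jstack; it can be overridden (e.g. for a dry run).
    ${DUMP_CMD:-jstack} "$pid" > "threaddump_${pid}_${i}.txt"
    if [ "$i" -lt "$count" ]; then sleep "$interval"; fi
    i=$((i + 1))
  done
}

# usage: take_thread_dumps 5388 5 5
```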

4. Heap dump

For investigating issues, such as an Out-of-Memory issue or Memory Leak, a heap dump will help track down underlying issues.

Generate a heap dump of the GemFire process using the following steps:

  • Step 1. Identify the specific GemFire process ID using a command like jps (as in the procedure for getting thread dumps).
  • Step 2. Generate the heap dump.
    1. Using jmap command
      [JDK_INSTALLATION]/bin/jmap -dump:live,format=b,file=heap.dump.out <pid>
      
    2. Using Java VisualVM (jvisualvm):
      • run [JDK_INSTALLATION]/bin/jvisualvm
      • select the target process, choose [Application] menu --> [Heap Dump], then right-click the generated heap dump and choose [Save As] to save it to local disk.
    3. Getting a Heap Dump Automatically on an "Out Of Memory" error:

      Add the following JVM parameters to the Java process before it starts:

      -XX:+HeapDumpOnOutOfMemoryError
      -XX:HeapDumpPath=<path to dump directory>
      

      For example, on Windows:

      JAVA_OPTS=%JAVA_OPTS% "-XX:+HeapDumpOnOutOfMemoryError" "-XX:HeapDumpPath=C:\TEMP"
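On recent JDKs, jcmd offers an alternative to jmap for producing a heap dump. A sketch, where the <pid> and output path are placeholders (note the target file must not already exist):

```shell
# Dump the heap of a running JVM to a file using jcmd (JDK 8+)
jcmd <pid> GC.heap_dump /tmp/gemfire-heap.hprof
```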
      

5. Configuration Details

Configuration files for the locators and servers (for example, gemfire.properties and cache.xml).

6. Scripts

The startup scripts or commands used for locators and servers.

 

Deeper/extensive artifacts (OS level commands):

  • dmesg command output - This helps debug GemFire issues by revealing OS-level problems (like OOM kills, network resets, or disk errors) that cause GemFire members to hang, restart, or unexpectedly leave the cluster. For example:

    sudo dmesg --since "2026-02-05 14:20:00" --until "2026-02-05 14:40:00"

  • journalctl command output - This helps debug GemFire issues by showing persistent kernel events (like OOM kills, network drops, or disk I/O errors) during the exact time a GemFire member became slow, unresponsive, or left the cluster. For example:

    sudo journalctl -k --since "2026-02-05 14:20:00" --until "2026-02-05 14:40:00"

  • Command to get details about memory
    • free -h – Shows total, used, free, and available memory to detect overall memory pressure affecting GemFire.
    • vmstat 1 – Displays real-time memory, swap, and CPU activity to spot swapping or stalls.
    • cat /proc/meminfo – Provides detailed kernel memory statistics useful for off-heap and native memory analysis.
    • numactl --hardware – Shows NUMA topology to detect uneven memory access impacting GemFire latency.
    • dmesg -T | grep -i oom – Identifies if the OS OOM killer terminated the GemFire JVM.
    • sysctl vm.swappiness – Indicates how aggressively the OS swaps, which can stall GemFire members.
  • Commands about CPU
    • top – Shows real-time CPU usage to identify overloaded GemFire processes.
    • htop – Visual CPU and thread-level view to spot uneven CPU consumption.
    • mpstat -P ALL 1 – Displays per-CPU utilization to detect imbalances and CPU steal on VMs.
    • vmstat 1 – Reveals CPU run-queue and wait times that cause GemFire heartbeat delays.
    • dmesg -T | grep -i "soft lockup" – Detects kernel CPU stalls that can falsely trigger member failure detection.
  • Commands about Disk/File systems:
    • iostat -xz 1 – Shows disk latency and utilization to diagnose slow persistence or log writes.
    • iotop – Displays processes causing high disk I/O, including GemFire disk stores.
    • df -h – Checks disk space to prevent GemFire failures due to full filesystems.
    • mount – Reveals filesystem types and mount options impacting disk performance.
    • dmesg -T | grep -i "I/O error" – Detects kernel-level disk errors causing data or recovery issues.
  • Commands about networking:
    • ip addr – Shows network interface configuration to validate GemFire bind addresses.
    • ip link – Displays link state changes that cause member flapping.
    • ss -s – Summarizes socket statistics to detect connection pressure or retries.
    • netstat -i – Shows packet errors and drops affecting GemFire messaging.
    • ethtool eth0 – Reveals NIC speed, duplex, and errors impacting cluster communication.
    • sar -n DEV 1 – Monitors live packet rates and drops on network interfaces.
    • dmesg -T | grep -i net – Identifies kernel-reported network or driver issues.
  • About Time and Clock:
    • timedatectl – Shows system time and sync status to prevent clock-skew-related failures.
    • chronyc tracking – Confirms NTP accuracy and drift affecting distributed coordination.
    • dmesg -T | grep -i clock – Detects unstable clocks that can confuse GemFire timing logic.
  • About Process
    • ulimit -a – Shows resource limits that can block thread or socket creation.
    • cat /proc/<pid>/limits – Displays runtime limits applied to the GemFire JVM.
    • lsof | wc -l – Counts open file descriptors to detect exhaustion risks.
  • About VM Hypervisor
    • mpstat | grep steal – Shows CPU steal time indicating VM contention impacting GemFire.
    • dmesg -T | grep -i balloon – Detects memory ballooning that silently reduces JVM memory.
    • dmesg -T | grep -i hypervisor – Confirms hypervisor events affecting performance.
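The OS-level commands above can be bundled into a single snapshot script. This is a minimal sketch; the function name and output directory are arbitrary, and commands that are missing on a given platform are simply skipped:

```shell
# collect_os_info [output-dir] -- snapshot common OS-level diagnostics
collect_os_info() {
  out=${1:-gemfire-os-info}
  mkdir -p "$out"
  # One output file per command; failures (e.g. missing tools) are ignored.
  for cmd in "uname -a" "free -h" "vmstat" "df -h" "mount" \
             "ip addr" "ss -s" "timedatectl" "ulimit -a"; do
    name=$(echo "$cmd" | tr ' /-' '___')
    sh -c "$cmd" > "$out/$name.txt" 2>&1 || true
  done
  dmesg -T > "$out/dmesg.txt" 2>&1 || true
}

# usage: collect_os_info /tmp/os-info-before-restart
```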





Additional Information