How to collect basic information for VMware Tanzu™ GemFire® for Kubernetes issues

Article ID: 294019

Products

VMware Tanzu GemFire

Issue/Introduction

This document contains guidelines for collecting basic information such as logs, statistics files, thread dumps, and heap dumps for VMware Tanzu™ GemFire® for Kubernetes related issues. Besides providing the artifacts, it is important to provide a timeline or overview of the issue with details on impact and actions taken.

Resolution

What to Collect and When

For all VMware GemFire issues, support will need logs and statistics from all cluster members covering the time period when the issue occurred.
For issues where members are hung, support will also need thread dumps from, at a minimum, the members that appear unresponsive. It is important that more than one thread dump is taken on each host.
For tuning issues, support will also require GC logs.
For out-of-memory issues and memory leaks, a heap dump will be required (if this is not feasible, a heap histogram is better than nothing).
In certain situations, logs from gfsh or the web-based monitoring tool, Pulse, might also be needed.

1. Locator and cache server logs and statistics:

Use the gfsh export logs command to export logs and statistics:

kubectl exec LOCATOR-POD-NAME -n NAMESPACE-NAME  -- gfsh -e "connect" -e "export logs --dir=/data/logsAndStats" -e "quit"

kubectl -n NAMESPACE-NAME cp LOCATOR-POD-NAME:logsAndStats/exportedLogs_xxxxx.zip exportedLogs_xxxxx.zip

For example:

Step 1, run gfsh export logs command from the locator pod:

$ kubectl exec gemfire1-locator-0 -n gemfire-cluster -- gfsh -e "connect" -e "export logs --dir=/data/logsAndStats" -e "quit"

(1) Executing - connect
Connecting to Locator at [host=localhost, port=10334] ..
Connecting to Manager at [host=gemfire1-locator-0.gemfire1-locator.gemfire-cluster.svc.cluster.local, port=1099] ..
Successfully connected to: [host=gemfire1-locator-0.gemfire1-locator.gemfire-cluster.svc.cluster.local, port=1099]
You are connected to a cluster of version: 1.13.1

(2) Executing - export logs --dir=/data/logsAndStats
Logs exported to the connected member's file system: /data/logsAndStats/exportedLogs_1614154353148.zip

(3) Executing - quit

Step 2, copy the exported zip file from the locator pod to the local machine:

$ kubectl -n gemfire-cluster cp gemfire1-locator-0:logsAndStats/exportedLogs_1614154353148.zip exportedLogs_1614154353148.zip
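The two steps above can be chained so that the zip file name does not have to be copied by hand. The following is a minimal sketch, not an official tool: the function names extract_export_path and export_and_copy_logs are hypothetical, and the sketch assumes the gfsh output contains the "Logs exported to the connected member's file system:" line shown in the example.

```shell
# Hypothetical helpers; they are not part of GemFire or kubectl.

# Read gfsh "export logs" output on stdin and print the exported zip path.
# Assumes the output line ends with "file system: /path/to/file.zip".
extract_export_path() {
  sed -n 's|.*file system: \(/.*\.zip\)$|\1|p'
}

# Run "export logs" on the locator pod, then copy the resulting zip locally.
export_and_copy_logs() {
  local namespace="$1"
  local pod="$2"
  local remote file
  remote="$(kubectl exec "$pod" -n "$namespace" -- gfsh -e "connect" \
    -e "export logs --dir=/data/logsAndStats" -e "quit" | extract_export_path)"
  if [ -z "$remote" ]; then
    echo "could not find the exported zip path in the gfsh output" >&2
    return 1
  fi
  file="$(basename "$remote")"
  # Use the same relative pod path as the manual copy step.
  kubectl -n "$namespace" cp "${pod}:logsAndStats/${file}" "$file"
}
```

For example, export_and_copy_logs gemfire-cluster gemfire1-locator-0 would leave the exported zip file in the current local directory.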

Copy the logs and statistics files from each locator and cache server pod separately (use this method when the gfsh export logs command is not available):

kubectl -n NAMESPACE-NAME cp LOCATOR-POD-NAME:logsAndStats/ LOCAL_DIRECTORY_XX

kubectl -n NAMESPACE-NAME cp CACHESERVER-POD-NAME:logsAndStats/ LOCAL_DIRECTORY_YY

For example:

$ kubectl -n gemfire-cluster get pods
NAME                 READY   STATUS    RESTARTS   AGE
gemfire1-locator-0   1/1     Running   10         78d
gemfire1-server-0    1/1     Running   8          58d

$ kubectl -n gemfire-cluster cp gemfire1-locator-0:logsAndStats/ /Users/userA/Downloads/locatorlogs

$ kubectl -n gemfire-cluster cp gemfire1-server-0:logsAndStats/ /Users/userA/Downloads/server1logs
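When a cluster has many members, the per-pod copy can be wrapped in a loop. This is a sketch under assumptions: copy_logs_and_stats and the KUBECTL variable are hypothetical names introduced here, and the pod names are passed in explicitly (for example, taken from the kubectl get pods output above).

```shell
# Hypothetical helper; not part of GemFire or kubectl.
# KUBECTL can be overridden, e.g. KUBECTL="echo kubectl", to print the
# commands instead of running them.
KUBECTL="${KUBECTL:-kubectl}"

# Copy logsAndStats/ from each named pod into DEST/<pod-name>.
copy_logs_and_stats() {
  local namespace="$1"
  local dest="$2"
  shift 2
  local pod
  mkdir -p "$dest"
  for pod in "$@"; do
    # Same relative pod path as the manual commands (logsAndStats/).
    ${KUBECTL} -n "$namespace" cp "${pod}:logsAndStats/" "${dest}/${pod}"
  done
}
```

For example: copy_logs_and_stats gemfire-cluster /Users/userA/Downloads/gemfire-logs gemfire1-locator-0 gemfire1-server-0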

2. Pulse log:

You can copy the pulse.log from the locator pod directly:

kubectl -n NAMESPACE-NAME cp LOCATOR-POD-NAME:pulse.log LOCAL_DIRECTORY_XX/pulse.log

For example:

$ kubectl -n gemfire-cluster cp gemfire1-locator-0:pulse.log pulse.log

3. Client logs:

Use the kubectl logs command to write the client log to a file:

kubectl -n NAMESPACE-NAME logs CLIENT-POD-NAME >CLIENT_LOG_FILE.log

For example:

$ kubectl -n gemfire-apps logs bike-demo-7c7b9d9fc4-rm75m >client.log
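If the client pod has restarted, the log of the previous container instance may also be relevant. The sketch below is one possible way to capture both; collect_client_logs and KUBECTL are hypothetical names, while --timestamps and --previous are standard kubectl logs options.

```shell
# Hypothetical helper; not part of GemFire or kubectl.
# KUBECTL can be overridden, e.g. KUBECTL="echo kubectl", for a dry run.
KUBECTL="${KUBECTL:-kubectl}"

# Save the current client log and, if one exists, the previous container's log.
collect_client_logs() {
  local namespace="$1"
  local pod="$2"
  local outdir="${3:-.}"
  mkdir -p "$outdir"
  ${KUBECTL} -n "$namespace" logs "$pod" --timestamps > "${outdir}/${pod}.log"
  # --previous fails when there is no earlier container instance;
  # in that case, remove the empty output file and carry on.
  ${KUBECTL} -n "$namespace" logs "$pod" --timestamps --previous \
    > "${outdir}/${pod}_previous.log" 2>/dev/null \
    || rm -f "${outdir}/${pod}_previous.log"
}
```

For example: collect_client_logs gemfire-apps bike-demo-7c7b9d9fc4-rm75m ./client-logs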

4. GFSH logs:

If you use the local gfsh library, you can set the Java system property -Dgfsh.log-level=<desired_log_level>, where desired_log_level is one of the following values: severe, warning, info, config, fine, finer, finest.

$ export JAVA_ARGS=-Dgfsh.log-level=info

Start gfsh and run the gfsh commands. The log file, gfsh-0_0.log, is written to the directory where gfsh was started.

If you use the gfsh library in the locator pod, set the same Java system property, -Dgfsh.log-level=<desired_log_level>, inside the pod before starting gfsh:

$ export JAVA_ARGS=-Dgfsh.log-level=info

Start gfsh, run the gfsh commands, and then copy the gfsh log from the locator pod to the local machine:

kubectl exec -it LOCATOR-POD-NAME -n NAMESPACE-NAME -- bash -c  "export JAVA_ARGS=-Dgfsh.log-level=<desired_log_level> ;  gfsh"

kubectl -n NAMESPACE-NAME cp LOCATOR-POD-NAME:gfsh-0_0.log LOCAL_DIRECTORY_XX/gfsh-0_0.log

For example:

$ kubectl exec -it gemfire1-locator-0 -n gemfire-cluster -- bash -c  "export JAVA_ARGS=-Dgfsh.log-level=info ;  gfsh"

$ kubectl -n gemfire-cluster cp gemfire1-locator-0:gfsh-0_0.log gfsh-0_0.log

5. GC Logs:

GC logging is enabled by default in VMware Tanzu™ GemFire® for Kubernetes through the following JVM startup parameters:

-Xlog:gc+age*=trace,safepoint:file=/data/logsAndStats/CLUSTER-NAME-[LOCATOR or CACHESERVER]-XX-gc.txt:time,uptime:filecount=10,filesize=1M
-verbose:gc

The GC log file (CLUSTER-NAME-[LOCATOR or CACHESERVER]-XX-gc.txt) is written to the same directory as the logs and statistics, /data/logsAndStats, so it is included when you copy the logs and statistics files from the locator and cache server pods separately:

kubectl -n NAMESPACE-NAME cp LOCATOR-POD-NAME:logsAndStats/ LOCAL_DIRECTORY_XX

kubectl -n NAMESPACE-NAME cp CACHESERVER-POD-NAME:logsAndStats/ LOCAL_DIRECTORY_YY
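Before copying, it can be useful to confirm that GC log files are present in the pod. The following sketch lists /data/logsAndStats and keeps only names containing -gc.txt. The helper names and the KUBECTL variable are hypothetical, and the example file names in the usage note are assumptions based on the naming pattern above.

```shell
# Hypothetical helpers; not part of GemFire or kubectl.
KUBECTL="${KUBECTL:-kubectl}"

# Keep only file names that contain "-gc.txt" (read from stdin).
filter_gc_logs() {
  grep -e '-gc\.txt'
}

# List the GC log files found in /data/logsAndStats on the given pod.
list_gc_logs() {
  local namespace="$1"
  local pod="$2"
  ${KUBECTL} -n "$namespace" exec "$pod" -- ls /data/logsAndStats | filter_gc_logs
}
```

For example, list_gc_logs gemfire-cluster gemfire1-server-0 would print names such as gemfire1-server-0-gc.txt if the file follows the pattern described above; an empty result suggests that GC logging is not writing to that directory.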

6. Thread dumps:

For some issues, such as hung systems or performance problems, thread dumps from the server or client are essential to the analysis. It is very important to take multiple thread dumps at regular intervals (e.g., every 10 seconds) over a period of time.

Thread dumps can be taken in the following ways:

Use the gfsh command to collect the whole cluster's thread dump:

kubectl -n NAMESPACE-NAME exec LOCATOR-POD-NAME -- gfsh -e "connect" -e "export stack-traces --file=allmembers_stacktrace_XX.txt" -e "quit"

kubectl -n NAMESPACE-NAME cp LOCATOR-POD-NAME:allmembers_stacktrace_XX.txt allmembers_stacktrace_XX.txt

For example:

$ kubectl -n gemfire-cluster exec gemfire1-locator-0 -- gfsh -e "connect" -e "export stack-traces --file=allmembers_stacktrace_1.txt" -e "quit"

$ kubectl -n gemfire-cluster cp gemfire1-locator-0:allmembers_stacktrace_1.txt allmembers_stacktrace_1.txt

You can also log in to a locator or cache server pod to collect the thread dump for that member separately:

Step 1, log in to the locator or cache server pod. For example:

$ kubectl -n gemfire-cluster exec -it gemfire1-server-0 -- bash

Step 2, generate the thread dump for the Locator/Cacheserver process (process id is always 1).

$ jstack 1 > threaddump_1.txt

Step 3, exit the pod and then use the kubectl cp command to copy the thread dump file to the local directory. For example:

$ kubectl -n gemfire-cluster cp gemfire1-server-0:threaddump_1.txt threaddump_1.txt
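Because several thread dumps are needed at regular intervals, the jstack step can be run in a loop from the local machine, with the output redirected to local files so that no separate copy step is required. This is a minimal sketch: take_thread_dumps and KUBECTL are hypothetical names, and it relies on the GemFire process having process ID 1, as stated above.

```shell
# Hypothetical helper; not part of GemFire or kubectl.
# KUBECTL can be overridden, e.g. KUBECTL="echo kubectl", for a dry run.
KUBECTL="${KUBECTL:-kubectl}"

# Take COUNT thread dumps of process 1 in POD, INTERVAL seconds apart,
# and write them to OUTDIR/<pod>_threaddump_<n>.txt on the local machine.
take_thread_dumps() {
  local namespace="$1"
  local pod="$2"
  local count="${3:-3}"
  local interval="${4:-10}"
  local outdir="${5:-.}"
  local i=1
  mkdir -p "$outdir"
  while [ "$i" -le "$count" ]; do
    ${KUBECTL} -n "$namespace" exec "$pod" -- jstack 1 \
      > "${outdir}/${pod}_threaddump_${i}.txt"
    # Wait between dumps, but not after the last one.
    if [ "$i" -lt "$count" ]; then
      sleep "$interval"
    fi
    i=$((i + 1))
  done
}
```

For example, take_thread_dumps gemfire-cluster gemfire1-server-0 5 10 ./dumps would take five dumps, ten seconds apart.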

7. Heap dump:

For issues such as out-of-memory errors or memory leaks, a heap dump helps track down the root cause. You can log in to the locator and cache server pods to generate a heap dump of the GemFire process using the following commands:

Step 1, log in to the locator or cache server pod. For example:

$ kubectl -n gemfire-cluster exec -it gemfire1-server-0 -- bash

Step 2, generate the heap dump for the Locator/Cacheserver process (process id is always 1).

$ jmap -dump:live,format=b,file=heap.dump.out 1

Step 3, exit the pod and then use the kubectl cp command to copy the heap dump file to the local directory. For example:

$ kubectl -n gemfire-cluster cp gemfire1-server-0:heap.dump.out server0_heap.dump.out
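When a full heap dump is not feasible, a heap histogram can be collected instead, as noted in "What to Collect and When". The sketch below uses the standard jmap -histo:live option and again assumes the GemFire process has process ID 1; heap_histogram and KUBECTL are hypothetical names. Note that the live option typically forces a full garbage collection before counting objects.

```shell
# Hypothetical helper; not part of GemFire or kubectl.
# KUBECTL can be overridden, e.g. KUBECTL="echo kubectl", for a dry run.
KUBECTL="${KUBECTL:-kubectl}"

# Print a histogram of live heap objects for process 1 in the given pod.
heap_histogram() {
  local namespace="$1"
  local pod="$2"
  ${KUBECTL} -n "$namespace" exec "$pod" -- jmap -histo:live 1
}
```

For example: heap_histogram gemfire-cluster gemfire1-server-0 > server0_heap_histo.txt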
