What to Collect and When
1. Locator and cache server's logs & statistics:
kubectl exec LOCATOR-POD-NAME -n NAMESPACE-NAME -- gfsh -e "connect" -e "export logs --dir=/data/logsAndStats" -e "quit"
kubectl -n NAMESPACE-NAME cp LOCATOR-POD-NAME:logsAndStats/exportedLogs_xxxxx.zip exportedLogs_xxxxx.zip
For example:
Step 1, run the gfsh export logs command from the locator pod:
$ kubectl exec gemfire1-locator-0 -n gemfire-cluster -- gfsh -e "connect" -e "export logs --dir=/data/logsAndStats" -e "quit"
(1) Executing - connect
Connecting to Locator at [host=localhost, port=10334] ..
Connecting to Manager at [host=gemfire1-locator-0.gemfire1-locator.gemfire-cluster.svc.cluster.local, port=1099] ..
Successfully connected to: [host=gemfire1-locator-0.gemfire1-locator.gemfire-cluster.svc.cluster.local, port=1099]
You are connected to a cluster of version: 1.13.1
(2) Executing - export logs --dir=/data/logsAndStats
Logs exported to the connected member's file system: /data/logsAndStats/exportedLogs_1614154353148.zip
(3) Executing - quit
Step 2, copy the exported zip file from the locator pod to the local machine:
$ kubectl -n gemfire-cluster cp gemfire1-locator-0:logsAndStats/exportedLogs_1614154353148.zip exportedLogs_1614154353148.zip
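The two steps above can be combined so that the archive name does not have to be copied by hand. The following is a minimal sketch, not a supported tool: export_and_fetch_logs is a hypothetical helper name, and it assumes that gfsh prints the archive path in the same form as the sample output above.

```shell
#!/usr/bin/env bash
# Sketch: export logs through gfsh, then copy the resulting archive locally.
# export_and_fetch_logs is a hypothetical helper, not part of gfsh or kubectl.
# Usage: export_and_fetch_logs NAMESPACE-NAME LOCATOR-POD-NAME
export_and_fetch_logs() {
  local namespace="$1" pod="$2" output zip_path zip_name

  # Step 1: run the export and keep the gfsh output so the archive name can be read from it.
  output="$(kubectl exec "$pod" -n "$namespace" -- \
    gfsh -e "connect" -e "export logs --dir=/data/logsAndStats" -e "quit")" || return 1

  # Assumption: the output contains the archive path as in the sample output above.
  zip_path="$(printf '%s\n' "$output" \
    | grep -o '/data/logsAndStats/exportedLogs_[0-9]*\.zip' | head -n 1)"
  if [ -z "$zip_path" ]; then
    echo "could not find an exported archive in the gfsh output" >&2
    return 1
  fi
  zip_name="$(basename "$zip_path")"

  # Step 2: copy the archive, using the same relative source path as in Step 2 above.
  kubectl -n "$namespace" cp "$pod:logsAndStats/$zip_name" "$zip_name" >&2 || return 1

  # Print the local file name so callers can pick it up.
  echo "$zip_name"
}
```

For example, export_and_fetch_logs gemfire-cluster gemfire1-locator-0 would leave the archive in the current directory and print its name.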
Alternatively, copy the logsAndStats directory from each locator and cache server pod directly:
kubectl -n NAMESPACE-NAME cp LOCATOR-POD-NAME:logsAndStats/ LOCAL_DIRECTORY_XX
kubectl -n NAMESPACE-NAME cp CACHESERVER-POD-NAME:logsAndStats/ LOCAL_DIRECTORY_YY
For example:
$ kubectl -n gemfire-cluster get pods
NAME                 READY   STATUS    RESTARTS   AGE
gemfire1-locator-0   1/1     Running   10         78d
gemfire1-server-0    1/1     Running   8          58d
$ kubectl -n gemfire-cluster cp gemfire1-locator-0:logsAndStats/ /Users/userA/Downloads/locatorlogs
$ kubectl -n gemfire-cluster cp gemfire1-server-0:logsAndStats/ /Users/userA/Downloads/server1logs
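When a cluster has many members, the per-pod copy can be put in a loop. The sketch below is illustrative only: copy_all_logs is a hypothetical helper name, and it assumes that every pod in the namespace is a GemFire locator or cache server with a logsAndStats directory. Pods without that directory are reported and skipped.

```shell
#!/usr/bin/env bash
# Sketch: copy the logsAndStats directory from every pod in a namespace.
# copy_all_logs is a hypothetical helper, not part of kubectl.
# Usage: copy_all_logs NAMESPACE-NAME LOCAL-DIRECTORY
copy_all_logs() {
  local namespace="$1" dest="$2" pods pod

  mkdir -p "$dest" || return 1

  # List pod names only, separated by spaces.
  pods="$(kubectl -n "$namespace" get pods \
    -o jsonpath='{.items[*].metadata.name}')" || return 1

  # Word splitting on $pods is intentional: one iteration per pod name.
  for pod in $pods; do
    # Each pod gets its own local subdirectory named after the pod.
    kubectl -n "$namespace" cp "$pod:logsAndStats/" "$dest/$pod" \
      || echo "copy failed for $pod" >&2
  done
}
```

For example, copy_all_logs gemfire-cluster /Users/userA/Downloads/gemfirelogs would create one subdirectory per pod under the target directory.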
2. Pulse log:
You can copy the pulse.log from the locator pod directly:
kubectl -n NAMESPACE-NAME cp LOCATOR-POD-NAME:pulse.log LOCAL_DIRECTORY_XX/pulse.log
For example:
$ kubectl -n gemfire-cluster cp gemfire1-locator-0:pulse.log pulse.log
3. Client logs:
Use the kubectl logs command to write the client log to a file:
kubectl -n NAMESPACE-NAME logs CLIENT-POD-NAME >CLIENT_LOG_FILE.log
For example:
$ kubectl -n gemfire-apps logs bike-demo-7c7b9d9fc4-rm75m >client.log
4. GFSH logs:
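When gfsh runs outside the locator pod, set the gfsh log level before starting gfsh:
$ export JAVA_ARGS=-Dgfsh.log-level=info
Start gfsh and run the gfsh commands. The gfsh-0_0.log file is written to the directory where gfsh runs.
When gfsh runs in the locator pod, set the gfsh log level in the pod before starting gfsh:
$ export JAVA_ARGS=-Dgfsh.log-level=info
Start gfsh and run the gfsh commands. Then, copy the gfsh log from the locator pod.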
kubectl exec -it LOCATOR-POD-NAME -n NAMESPACE-NAME -- bash -c "export JAVA_ARGS=-Dgfsh.log-level=<desired_log_level> ; gfsh"
kubectl -n NAMESPACE-NAME cp LOCATOR-POD-NAME:gfsh-0_0.log LOCAL_DIRECTORY_XX/gfsh-0_0.log
For example:
$ kubectl exec -it gemfire1-locator-0 -n gemfire-cluster -- bash -c "export JAVA_ARGS=-Dgfsh.log-level=info ; gfsh"
$ kubectl -n gemfire-cluster cp gemfire1-locator-0:gfsh-0_0.log gfsh-0_0.log
5. GC Logs:
In VMware Tanzu™ GemFire® for Kubernetes, GC logging is enabled by default through the following JVM startup parameters:
-Xlog:gc+age*=trace,safepoint:file=/data/logsAndStats/CLUSTER-NAME-[LOCATOR or CACHESERVER]-XX-gc.txt:time,uptime:filecount=10,filesize=1M -verbose:gc
The GC log file (CLUSTER-NAME-[LOCATOR or CACHESERVER]-XX-gc.txt) is written to /data/logsAndStats in each locator and cache server pod, the same directory that holds the logs and statistics. Copying that directory from each pod therefore also collects the GC log:
kubectl -n NAMESPACE-NAME cp LOCATOR-POD-NAME:logsAndStats/ LOCAL_DIRECTORY_XX
kubectl -n NAMESPACE-NAME cp CACHESERVER-POD-NAME:logsAndStats/ LOCAL_DIRECTORY_YY
6. Thread dumps:
For some issues, such as hung systems or performance problems, thread dumps from the server or client are essential to the analysis. Take multiple thread dumps at regular intervals (for example, every 10 seconds) over a period of time rather than a single dump.
Thread dumps can be taken in the following ways:
kubectl -n NAMESPACE-NAME exec LOCATOR-POD-NAME -- gfsh -e "connect" -e "export stack-traces --file=allmembers_stacktrace_XX.txt" -e "quit"
kubectl -n NAMESPACE-NAME cp LOCATOR-POD-NAME:allmembers_stacktrace_XX.txt allmembers_stacktrace_XX.txt
For example:
$ kubectl -n gemfire-cluster exec gemfire1-locator-0 -- gfsh -e "connect" -e "export stack-traces --file=allmembers_stacktrace_1.txt" -e "quit"
$ kubectl -n gemfire-cluster cp gemfire1-locator-0:allmembers_stacktrace_1.txt allmembers_stacktrace_1.txt
Alternatively, take a thread dump of a single member with jstack.
Step 1, log in to the locator or cache server pod. For example:
$ kubectl -n gemfire-cluster exec -it gemfire1-server-0 -- bash
Step 2, generate the thread dump for the Locator/Cacheserver process (process id is always 1).
$ jstack 1 > threaddump_1.txt
Step 3, exit the pod and then use the kubectl cp command to copy the thread dump files to the local directory. For example:
$ kubectl -n gemfire-cluster cp gemfire1-server-0:threaddump_1.txt threaddump_1.txt
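Because several dumps are needed at regular intervals, the jstack steps can be scripted. The following is a minimal sketch rather than a supported tool: collect_thread_dumps is a hypothetical helper name, and the defaults of 6 dumps taken 10 seconds apart are assumptions chosen for illustration. Unlike the manual steps, it redirects the jstack output to a local file, so no separate kubectl cp step is needed.

```shell
#!/usr/bin/env bash
# Sketch: take several thread dumps from one pod at a fixed interval.
# collect_thread_dumps is a hypothetical helper, not part of kubectl or the JDK.
# Usage: collect_thread_dumps NAMESPACE-NAME POD-NAME [COUNT] [INTERVAL-SECONDS]
collect_thread_dumps() {
  local namespace="$1" pod="$2" count="${3:-6}" interval="${4:-10}" i=1

  while [ "$i" -le "$count" ]; do
    # jstack runs inside the pod against process id 1; the redirect happens
    # on the local machine, so each dump lands in the current directory.
    kubectl -n "$namespace" exec "$pod" -- jstack 1 > "${pod}_threaddump_${i}.txt" \
      || echo "thread dump $i failed for $pod" >&2

    # Pause between dumps, but not after the last one.
    if [ "$i" -lt "$count" ]; then
      sleep "$interval"
    fi
    i=$((i + 1))
  done
}
```

For example, collect_thread_dumps gemfire-cluster gemfire1-server-0 6 10 would write gemfire1-server-0_threaddump_1.txt through gemfire1-server-0_threaddump_6.txt over roughly 50 seconds.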
7. Heap dump:
When investigating issues such as out-of-memory errors or memory leaks, a heap dump helps track down the root cause. Log in to the locator or cache server pod and generate a heap dump of the GemFire process as follows:
Step 1, log in to the locator or cache server pod. For example:
$ kubectl -n gemfire-cluster exec -it gemfire1-server-0 -- bash
Step 2, generate the heap dump for the Locator/Cacheserver process (process id is always 1).
$ jmap -dump:live,format=b,file=heap.dump.out 1
Step 3, exit the pod and then use the kubectl cp command to copy the heap dump file to the local directory. For example:
$ kubectl -n gemfire-cluster cp gemfire1-server-0:heap.dump.out server0_heap.dump.out
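The three heap dump steps can also be run without an interactive shell. This is a sketch under stated assumptions: fetch_heap_dump is a hypothetical helper name, and it assumes that a non-interactive kubectl exec starts in the same working directory as the interactive bash session used above, so that the relative path heap.dump.out resolves to the same file for both jmap and kubectl cp.

```shell
#!/usr/bin/env bash
# Sketch: generate a heap dump in a pod and copy it to the local machine.
# fetch_heap_dump is a hypothetical helper, not part of kubectl or the JDK.
# Usage: fetch_heap_dump NAMESPACE-NAME POD-NAME
fetch_heap_dump() {
  local namespace="$1" pod="$2"
  local local_file="${pod}_heap.dump.out"

  # Same jmap invocation as Step 2 above, against process id 1.
  kubectl -n "$namespace" exec "$pod" -- \
    jmap -dump:live,format=b,file=heap.dump.out 1 >&2 || return 1

  # Same copy as Step 3 above; the local name is prefixed with the pod name
  # so dumps from several members do not overwrite each other.
  kubectl -n "$namespace" cp "$pod:heap.dump.out" "$local_file" >&2 || return 1
}
```

For example, fetch_heap_dump gemfire-cluster gemfire1-server-0 would write gemfire1-server-0_heap.dump.out to the current directory.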