Given a case where a user-defined pljava function appears hung or is showing high CPU utilization, this article describes how to debug the postgres-spawned Java Virtual Machine (JVM) process.
Sample environment for testing
In this example, we will use the following code to generate the outputs. So if you would like to follow along on your own, here is how to get your jar file built (this is the source link):
-- sleeps for specified number of seconds CREATE FUNCTION gotosleep(integer) RETURNS integer AS 'io.pivotal.support.plsleep.GoToSleep(java.lang.Integer)' LANGUAGE java; -- spins cpu for specified number of seconds CREATE FUNCTION gospin(integer) RETURNS integer AS 'io.pivotal.support.plsleep.GoSpin(java.lang.Integer)' LANGUAGE java; -- sleeps for specificed number of seconds and then throws an exception CREATE FUNCTION gotosleepfail(integer) RETURNS integer AS 'io.pivotal.support.plsleep.GoToSleepFail(java.lang.Integer)' LANGUAGE java;
Start up a pljava process
gpadmin=# select gotosleep(300);
gpadmin 6622 3702 1 17:16 ? 00:00:00 postgres: port 5432, gpadmin gpadmin 192.168.64.100(8767) con10 192.168.64.100(8767) cmd1 SELECT
pstack command is generally not helpful
pstack will dump all the native threads and their stacks but it will not reveal any relevant information related to the java classes being invoked. The best we can learn from this output is how many threads are running for this pid. Also, the existence of libjvm.so in the stack trace confirms that this is our pljava process:
[root@mdw ~]# pstack 6622 Thread 10 (Thread 0x7f31595e6700 (LWP 6623)): #0 0x00007f31550ed283 in poll () from /lib64/libc.so.6 #1 0x0000000000bbf874 in rxThreadFunc () #2 0x00007f3155b4faa1 in start_thread () from /lib64/libpthread.so.0 #3 0x00007f31550f6aad in clone () from /lib64/libc.so.6 Thread 9 (Thread 0x7f313a42b700 (LWP 6628)): #0 0x00007f3155b53a5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f313b7762f8 in os::PlatformEvent::park(long) () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so #2 0x00007f313b738cce in Monitor::IWait(Thread*, long) () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so #3 0x00007f313b7391ae in Monitor::wait(bool, long, bool) () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so #4 0x00007f313b8fc539 in VMThread::loop() () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so #5 0x00007f313b8fc840 in VMThread::run() () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so #6 0x00007f313b777538 in java_start(Thread*) () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so #7 0x00007f3155b4faa1 in start_thread () from /lib64/libpthread.so.0 #8 0x00007f31550f6aad in clone () from /lib64/libc.so.6 Thread 8 (Thread 0x7f313a32a700 (LWP 6629)): #0 0x00007f3155b5368c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f313b7710b3 in os::PlatformEvent::park() () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so #2 0x00007f313b76591d in ObjectMonitor::wait(long, bool, Thread*) () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so #3 0x00007f313b5d70f8 in JVM_MonitorWait () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so #4 0x00007f3131012cb8 in ?? () #5 0x0000000001a87db8 in ?? () #6 0x00007f3100000000 in ?? () #7 0x0000000001a87000 in ?? () #8 0x00007f313a329538 in ?? () #9 0x00000000fae02750 in ?? () #10 0x00007f313a3295a8 in ?? () #11 0x00000000faea3e90 in ?? () #12 0x0000000000000000 in ?? () Thread 7 (Thread 0x7f313a229700 (LWP 6630)): #0 0x00007f3155b5368c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f313b7710b3 in os::PlatformEvent::park() () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so #2 0x00007f313b76591d in ObjectMonitor::wait(long, bool, Thread*) () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so #3 0x00007f313b5d70f8 in JVM_MonitorWait () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so #4 0x00007f3131012cb8 in ?? () #5 0x0000000001a8bed8 in ?? () #6 0x00007f3100000000 in ?? () #7 0x0000000001a8b000 in ?? () #8 0x00007f313a228740 in ?? () #9 0x00007f313a228650 in ?? () #10 0x0000000000000000 in ?? () Thread 6 (Thread 0x7f313a128700 (LWP 6631)): #0 0x00007f3155b56b2d in accept () from /lib64/libpthread.so.0 #1 0x00007f313b25fbc1 in LinuxAttachListener::dequeue() () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so #2 0x00007f313b25fd5b in AttachListener::dequeue() () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so #3 0x00007f313b25e78d in attach_listener_thread_entry(JavaThread*, Thread*) () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so #4 0x00007f313b8af5ff in JavaThread::thread_main_inner() () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so #5 0x00007f313b8af705 in JavaThread::run() () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so #6 0x00007f313b777538 in java_start(Thread*) () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so #7 0x00007f3155b4faa1 in start_thread () from /lib64/libpthread.so.0 #8 0x00007f31550f6aad in clone () from /lib64/libc.so.6 Thread 5 (Thread 0x7f313a027700 (LWP 6632)): #0 0x00007f3155b5368c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f313b7710b3 in os::PlatformEvent::park() () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so #2 0x00007f313b738a1f in Monitor::IWait(Thread*, long) () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so #3 0x00007f313b739226 in Monitor::wait(bool, long, bool) () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so #4 0x00007f313b3aa488 in CompileQueue::get() () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so #5 0x00007f313b3ae57a in CompileBroker::compiler_thread_loop() () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so #6 0x00007f313b8af5ff in JavaThread::thread_main_inner() () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so #7 0x00007f313b8af705 in JavaThread::run() () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so #8 0x00007f313b777538 in java_start(Thread*) () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so #9 0x00007f3155b4faa1 in start_thread () from /lib64/libpthread.so.0 #10 0x00007f31550f6aad in clone () from /lib64/libc.so.6 Thread 4 (Thread 0x7f3139f26700 (LWP 6633)): #0 0x00007f3155b5368c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f313b7710b3 in os::PlatformEvent::park() () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so #2 0x00007f313b738a1f in Monitor::IWait(Thread*, long) () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so #3 0x00007f313b739226 in Monitor::wait(bool, long, bool) () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so #4 0x00007f313b3aa488 in CompileQueue::get() () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so #5 0x00007f313b3ae57a in CompileBroker::compiler_thread_loop() () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so #6 0x00007f313b8af5ff in JavaThread::thread_main_inner() () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so #7 0x00007f313b8af705 in JavaThread::run() () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so #8 0x00007f313b777538 in java_start(Thread*) () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so #9 0x00007f3155b4faa1 in start_thread () from /lib64/libpthread.so.0 #10 0x00007f31550f6aad in clone () from /lib64/libc.so.6 Thread 3 (Thread 0x7f3139e25700 (LWP 6634)): #0 0x00007f3155b5368c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f313b7710b3 in os::PlatformEvent::park() () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so #2 0x00007f313b738a1f in Monitor::IWait(Thread*, long) () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so #3 0x00007f313b7391ae in Monitor::wait(bool, long, bool) () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so #4 0x00007f313b806888 in ServiceThread::service_thread_entry(JavaThread*, Thread*) () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so #5 0x00007f313b8af5ff in JavaThread::thread_main_inner() () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so #6 0x00007f313b8af705 in JavaThread::run() () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so #7 0x00007f313b777538 in java_start(Thread*) () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so #8 0x00007f3155b4faa1 in start_thread () from /lib64/libpthread.so.0 #9 0x00007f31550f6aad in clone () from /lib64/libc.so.6 Thread 2 (Thread 0x7f3139d24700 (LWP 6635)): #0 0x00007f3155b53a5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f313b7762f8 in os::PlatformEvent::park(long) () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so #2 0x00007f313b738cce in Monitor::IWait(Thread*, long) () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so #3 0x00007f313b7391ae in Monitor::wait(bool, long, bool) () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so #4 0x00007f313b8ab831 in WatcherThread::sleep() const () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so #5 0x00007f313b8abb3e in WatcherThread::run() () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so #6 0x00007f313b777538 in java_start(Thread*) () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so #7 0x00007f3155b4faa1 in start_thread () from /lib64/libpthread.so.0 #8 0x00007f31550f6aad in clone () from /lib64/libc.so.6 Thread 1 (Thread 0x7f315969a720 (LWP 6622)): #0 0x00007f3155b53a5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f313b7762f8 in os::PlatformEvent::park(long) () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so #2 0x00007f313b777044 in os::sleep(Thread*, long, bool) () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so #3 0x00007f313b5df612 in JVM_Sleep () from /usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so #4 0x00007f3131012cb8 in ?? () #5 0x0000000001a2f5a8 in ?? () #6 0x00007ffddc4936c0 in ?? () #7 0x00007f3131010280 in ?? () #8 0x00007ffddc4936b8 in ?? () #9 0x0000000000000000 in ?? ()
Running jstack against the GPDB process
Running jstack against the Greenplum process will work because Greenplum is simply launching a JVM using the JNI standard library. So treating this process as a JVM is acceptable because it is one.
[root@mdw ~]# /usr/java/jdk1.7.0_51/bin/jstack -F 6622 Attaching to process ID 6622, please wait... Debugger attached successfully. Server compiler detected. JVM version is 24.51-b03 Deadlock Detection: No deadlocks found. Thread 6631: (state = BLOCKED) Thread 6630: (state = BLOCKED) - java.lang.Object.wait(long) @bci=0 (Interpreted frame) - java.lang.ref.ReferenceQueue.remove(long) @bci=44, line=135 (Interpreted frame) - java.lang.ref.ReferenceQueue.remove() @bci=2, line=151 (Interpreted frame) - java.lang.ref.Finalizer$FinalizerThread.run() @bci=16, line=189 (Interpreted frame) Thread 6629: (state = BLOCKED) - java.lang.Object.wait(long) @bci=0 (Interpreted frame) - java.lang.Object.wait() @bci=2, line=503 (Interpreted frame) - java.lang.ref.Reference$ReferenceHandler.run() @bci=46, line=133 (Interpreted frame) Thread 6622: (state = BLOCKED) - java.lang.Thread.sleep(long) @bci=0 (Interpreted frame) - io.pivotal.support.plsleep.GoToSleep(java.lang.Integer) @bci=9, line=15 (Interpreted frame)
The above thread dump tells us this process is executing line 15 in function io.pivotal.support.plsleep.GoToSleep. Line 15 in the source code is Thread.sleep(dur *1000). Given that this is a simple example, it is fairly obvious that the only thing this function does is sleep, but in normal cases, it may not be obvious what this thread is doing and this thread dump can provide the desired insight.
14 static int GoToSleep(Integer dur) throws java.lang.InterruptedException { 15 Thread.sleep(dur * 1000); 16 return 1; 17 }
Running jstack on GPDB JVM core file
In this case, we just generated the core file using "gcore 6622", but a unexpected core file can be generated for lots of reasons in production. We can open this file using the following jstack command and see the same information as we would when running against a process. We know process 6622 is running function io.pivotal.support.plsleep.GoToSleep at line 15.
/usr/java/jdk1.7.0_51/bin/jstack -m /usr/local/greenplum-db/bin/postgres core.6622 Attaching to core core.6622 from executable /usr/local/greenplum-db/bin/postgres, please wait... Debugger attached successfully. Server compiler detected. JVM version is 24.51-b03 Deadlock Detection: No deadlocks found. ----------------- 6622 ----------------- 0x00007f3155b53a5e __pthread_cond_timedwait + 0x13e 0x00007f313b777044 _ZN2os5sleepEP6Threadlb + 0x254 0x00007f313b5df612 JVM_Sleep + 0x342 0x00007f3131012cb8 * java.lang.Thread.sleep(long) bci:0 (Interpreted frame) 0x00007f3131006058 * io.pivotal.support.plsleep.GoToSleep(java.lang.Integer) bci:9 line:15 (Interpreted frame) 0x00007f31310004e7 0x00007f313b55a465 _ZN9JavaCalls11call_helperEP9JavaValueP12methodHandleP17JavaCallArgumentsP6Thread + 0x365 0x00007f313b558ec8 _ZN9JavaCalls4callEP9JavaValue12methodHandleP17JavaCallArgumentsP6Thread + 0x28 0x00007f313b5920b9 _ZL17jni_invoke_staticP7JNIEnv_P9JavaValueP8_jobject11JNICallTypeP10_jmethodIDP18JNI_ArgumentPusherP6Thread + 0x219 0x00007f313b59a36a jni_CallStaticIntMethodA + 0xda 0x00007f313bde0c5e JNI_callStaticIntMethodA + 0x5e 0x00007f313bdeb822 _int_invoke + 0x12 0x00007f313bddbea2 Function_invoke + 0x282 0x00007f313bdd87d0 internalCallHandler + 0xd0 0x0000000000762362 ExecMakeFunctionResult + 0x702 0x0000000000762b4d ExecEvalFunc + 0xbd 0x000000000075fa42 ExecProject + 0x242 0x000000000078b8f4 ExecResult + 0x1c4 0x000000000075957f ExecProcNode + 0x48f 0x000000000078fdf8 ExecSetParamPlan + 0x178 0x000000000075e9be ExecEvalParam + 0xce 0x000000000075fa42 ExecProject + 0x242 0x000000000078b8f4 ExecResult + 0x1c4 0x000000000075957f ExecProcNode + 0x48f 0x000000000074ea2c ExecutePlan + 0x47c 0x000000000074f238 ExecutorRun + 0x3a8 0x00000000009a06f1 PortalRun + 0x6f1 0x000000000099628c exec_simple_query + 0x62c 0x000000000099a1c6 PostgresMain + 0x28f6 0x00000000008f7eae ServerLoop + 0xa4e 0x00000000008fac30 PostmasterMain + 0xff0 0x00000000007fccaf main + 0x44f
Dump JVM heap memory stats
This is useful in cases where the JVM is stuck in garbage collection and we suspect the JVM heap memory is consistently filling up, causing performance issues. For example, the following output tells us the max heap size is 958MB:
/usr/java/jdk1.7.0_51/bin/jmap -heap 6622 Attaching to process ID 6622, please wait... Debugger attached successfully. Server compiler detected. JVM version is 24.51-b03 using thread-local object allocation. Mark Sweep Compact GC Heap Configuration: MinHeapFreeRatio = 40 MaxHeapFreeRatio = 70 MaxHeapSize = 1004535808 (958.0MB) NewSize = 1310720 (1.25MB) MaxNewSize = 17592186044415 MB OldSize = 5439488 (5.1875MB) NewRatio = 2 SurvivorRatio = 8 PermSize = 21757952 (20.75MB) MaxPermSize = 85983232 (82.0MB) G1HeapRegionSize = 0 (0.0MB) Heap Usage: New Generation (Eden + 1 Survivor Space): capacity = 18874368 (18.0MB) used = 2041944 (1.9473495483398438MB) free = 16832424 (16.052650451660156MB) 10.818608601888021% used Eden Space: capacity = 16842752 (16.0625MB) used = 2041944 (1.9473495483398438MB) free = 14800808 (14.115150451660156MB) 12.12357695464494% used From Space: capacity = 2031616 (1.9375MB) used = 0 (0.0MB) free = 2031616 (1.9375MB) 0.0% used To Space: capacity = 2031616 (1.9375MB) used = 0 (0.0MB) free = 2031616 (1.9375MB) 0.0% used tenured generation: capacity = 41877504 (39.9375MB) used = 0 (0.0MB) free = 41877504 (39.9375MB) 0.0% used Perm Generation: capacity = 21757952 (20.75MB) used = 4586048 (4.37359619140625MB) free = 17171904 (16.37640380859375MB) 21.077572006777107% used
Converting GPDB JVM core file to hprof
A developer may want to convert the JVM core file into a Java hprof dump so that they can get more information from the JVM heap that will help them debug various issues. VisualVM is a tool that is commonly used to read these files and requires a GUI so remote debugging of hrpof files are never very useful for support but may be useful for the customer engineering team that authored the Java class in question.
/usr/java/jdk1.7.0_51/bin/jmap -dump:format=b,file=dump.hprof /usr/local/greenplum-db/bin/postgres core.6622
pljava Extension