Analyze_session is a sub-tool packaged with gpmt . Given a GPDB session ID, this utility will scan all segments and look for busy processes associated with that session. Analyze session will then proceed to start tracing the identified processes by running pstack, lsof, gcore/packcore, and strace. All the collected data will be put into a single tarfile and should be given to remote support.
Requirements and Caveats
Usage: gpmt analyze_session [ -session SESSION_ID ] [ -master-dir DIRECTORY ] [ -segment-dir DIRECTORY ] Description: Given GPDB session id analyze_session will look for non-idle gpdb processes and collect diagnostic information. The information collected will be used remotely for root cause analysis. The most common use case for this tool is when a GPDB query process appears hung. OPTIONS: -session GPDB session id which is referenced in pg_stat_activity -master-dir Working directory for master process -segment-dir Working directory for segment processes -free-space default=10 free space threshold which will abort data collection if reached -a Answer yes to all prompts EXAMPLES: Collect Process information for a given GPDB session id gpmt analyze_session -session 12345 Disable prompts gpmt analyze_session -session 12345 -a Change master directory gpmt analyze_session -session 12345 -master-dir /data Change segment directory gpmt analyze_session -session 12345 -segment-dir /data1
Simulating a use case
create table Hung_analyzer_UnitTest_12345 as select generate_series(1,999999); select count(*) from Hung_analyzer_UnitTest_12345 a,Hung_analyzer_UnitTest_12345 b, Hung_analyzer_UnitTest_12345 c; drop table hung_analyzer_unittest_12345;
gpadmin=# select sess_id, current_query from pg_stat_activity ; sess_id | current_query ---------+--------------------------------------------------------------------------------------------------------------------- 7 | select count(*) from Hung_analyzer_UnitTest_12345 a,Hung_analyzer_UnitTest_12345 b, Hung_analyzer_UnitTest_12345 c; 21 | select sess_id, current_query from pg_stat_activity ; (2 rows)
[gpadmin@mdw demo]$ gpmt analyze_session -session 7
The following will be collected:
mdw: 3098 postgres: port 5432, gpadmin gpadmin 192.168.64.100(29054) con7 192.168.64.100(29054) cmd28 SELECT sdw1.gpmt.local: 2881 postgres: port 40000, gpadmin gpadmin 192.168.64.100(50187) con7 seg0 cmd29 slice3 MPPEXEC SELECT sdw1.gpmt.local: 2883 postgres: port 40001, gpadmin gpadmin 192.168.64.100(19324) con7 seg1 cmd29 slice3 MPPEXEC SELECT sdw1.gpmt.local: 6877 postgres: port 40000, gpadmin gpadmin 192.168.64.100(51067) con7 seg0 cmd29 slice1 MPPEXEC SELECT sdw1.gpmt.local: 6879 postgres: port 40001, gpadmin gpadmin 192.168.64.100(20204) con7 seg1 cmd29 slice1 MPPEXEC SELECT sdw2.gpmt.local: 2881 postgres: port 40000, gpadmin gpadmin 192.168.64.100(33582) con7 seg2 cmd29 slice3 MPPEXEC SELECT sdw2.gpmt.local: 2883 postgres: port 40001, gpadmin gpadmin 192.168.64.100(44039) con7 seg3 cmd29 slice3 MPPEXEC SELECT sdw2.gpmt.local: 6793 postgres: port 40000, gpadmin gpadmin 192.168.64.100(34462) con7 seg2 cmd29 slice1 MPPEXEC SELECT sdw2.gpmt.local: 6795 postgres: port 40001, gpadmin gpadmin 192.168.64.100(44919) con7 seg3 cmd29 slice1 MPPEXEC SELECT Proceed with collection? Yy|Nn (default=Y): y Proceed with collection? Yy|Nn (default=Y): y Starting to collect segment process information Estimated Duration is 180 seconds |==================> |[%36]
Options
Isolate Specific segment hosts
gpmt has a global option "-hostfile" that can be used to isolate which segment servers analyse_session should search. This can help in situations where there is a specific set of servers currently suspected or in cases where we don't need data from all nodes and a collection on a single node is more than adequate.
Process in IO wait or defunct
If analyze_session determines the process is stuck in IO wait or is defunct, then the collection will immediately stop and processes information will be presented to the user. Processes stuck in either of these states immediately suggest the root cause for the hung session.