Script for detecting orphan process and locks in Greenplum database
search cancel

Script for detecting orphan process and locks in Greenplum database

book

Article ID: 295285

calendar_today

Updated On:

Products

VMware Tanzu Greenplum VMware Tanzu Greenplum / Gemfire VMware Tanzu Data Suite VMware Tanzu Data Suite

Issue/Introduction

This article pertains to all versions of Greenplum and covers how to detect orphan processes in the database cluster.

An orphan process is a process (related to a gang) that is running on the segments and there is no corresponding process related to the same gang on the master or other segments.

This orphan process may happen when there is query abort, process crash, server/segment crash, etc.

Resolution

Note: The queries or commands below might produce a misleading message. When the query is run, this query might be in the dispatch phase and not all segments have started the transaction. Therefore, it's always a good idea to run the query again after two minutes to ensure the process list remains the same or changed.

Detect all the orphan process through the Unix utility

Note:

  • The script below might not help if the orphan process is on the master server itself. In other words, the query is still running on the master but no connection on segments.
  • Ignore the process that is IDLE.
  • The script below uses the connections on master as a reference. In case the connection exists between the master and a few segments but not the rest, the script below might not report it.
#!/bin/sh

# Get the current list of connections from the DB
CONLIST=$(psql -R '|' -d template1 -Atc "SELECT DISTINCT 'con'||sess_id||' ' FROM pg_stat_activity")

HOSTLIST=$(psql -R ' ' -d template1 -Atc "SELECT DISTINCT hostname from gp_segment_configuration;")

for H in $HOSTLIST ; do
   echo "Checking host ${H}..."
   ssh $H ps auxww | egrep postgres | egrep 'con[0-9]' | egrep -v 'xxxx|'"${CONLIST}"
   echo 
done


Check from the database

Note: This works if the process is holding any lock on the table and its detected by pg_locks.

From the report below, total segments should be equal to total sessions. If this is not the case, then the session is orphan or missing one or more segments.

Greenplum 5.x:

select
   'con'|| a.mppsessionid AS "Session ID",
   b.total_seg as "Total Segments",
   count(a.*) AS "Total Sessions"
from
  (select distinct mppsessionid,gp_segment_id 
   from pg_locks
   where mppsessionid not in (select sess_id from pg_stat_activity where procpid!=pg_backend_pid() OR current_query!='<IDLE>' OR waiting='t')
   and mppsessionid != 0
  ) a,
  (select count(*) as total_seg
   from gp_segment_configuration 
   where role='p'
  ) b
group by 1,2
having count(a.*) < b.total_seg
order by 3;

Greenplum 6.x:

select
   'con'|| a.mppsessionid AS "Session ID",
   b.total_seg as "Total Segments",
   count(a.*) AS "Total Sessions"
from
  (select distinct mppsessionid,gp_segment_id 
   from pg_locks
   where mppsessionid not in (select sess_id from pg_stat_activity where pid!=pg_backend_pid() OR query!='<IDLE>' OR waiting='t')
   and mppsessionid != 0
  ) a,
  (select count(*) as total_seg
   from gp_segment_configuration 
   where role='p'
  ) b
group by 1,2
having count(a.*) < b.total_seg
order by 3;

Greenplum 7.x

SELECT
   'con'|| a.mppsessionid AS "Session ID",
   b.total_seg AS "Total Segments",
   count(a.*) AS "Total Sessions"
FROM
   (SELECT DISTINCT mppsessionid, pg_locks."gp_segment_id" FROM pg_locks) a,
   (SELECT count(*) AS total_seg FROM gp_segment_configuration WHERE role='p') b
GROUP BY 1,2
HAVING count(a.*) < b.total_seg
ORDER BY 3;

Check if the orphan process is blocking any session/query

After running the query below, note the Blockers SessionID and match it with the content <sessionID> in the query above. If it's the same then this orphan process is blocking other running queries.

SELECT
  w.relation::regclass AS "Table",
  w.mode        AS "Waiters Mode",
  w.pid         AS "Waiters PID",
  w.mppsessionid    AS "Waiters SessionID",
  b.mode        AS "Blockers Mode",
  b.pid         AS "Blockers PID",
  b.mppsessionid    AS "Blockers SessionID",
  (select 'Hostname: ' || c.hostname ||' Content: '|| c.content || ' Port: ' || port from gp_segment_configuration c where c.content=b.gp_segment_id and role='p') AS "Blocking Segment"
FROM pg_catalog.pg_locks AS w, pg_catalog.pg_locks AS b
Where ((w."database" = b."database" AND w.relation = b.relation)
OR w.transactionid = b.transactionid)
AND w.granted='f'
AND b.granted='t'
AND w.mppsessionid <> b.mppsessionid
AND w.mppsessionid in (SELECT l.mppsessionid FROM pg_locks l WHERE l.granted = true AND relation in ( select relation from pg_locks where granted='f'))
AND w.gp_segment_id = b.gp_segment_id
ORDER BY 1;

For more information on the database locks, refer to the scripts discussed How to check for orphan connections in Tanzu Greenplum.

For few more scripts to check the orphan process, refer to Locks - How to detect blocking and waiting queries on Tanzu Greenplum

 

Additional Information

  • If there is a backup or restore (especially gpdump/gprestore) running, then take off the parallel feature. Each segment connects independently to its segments and issue COPY FROM/TO, so they are not orphan processes.
  • If the orphan process that is detected on master is IDLE and there is no corresponding process on any segment for that con<sessionID>, it is perfectly fine. The segments IDLE process is terminated by the Pivotal Greenplum based on the parameter gp_vmem_idle_resource_timeout.