Steps to check when the gprecoverseg utility is terminated on the master host.
Scenario:
Consider a scenario where you are running an incremental recovery (gprecoverseg) or a full recovery (gprecoverseg -F) and you lose the connection to your master host. If you are not running the utility in nohup mode (a sketch is shown below), the gprecoverseg process may be terminated and you will not see it running on the master the next time you log in (verify with: ps -ef | grep gprecoverseg).
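If you want to run the recovery in nohup mode, a minimal sketch is shown below. The log file path is only an example; the -a option suppresses the confirmation prompt, and -F can be added for a full recovery.

```
# Start an incremental recovery in the background so a dropped SSH session does not terminate it.
# -a answers the confirmation prompt automatically; add -F for a full recovery.
nohup gprecoverseg -a > /home/gpadmin/gpAdminLogs/gprecoverseg_nohup.out 2>&1 &

# Follow the progress from the log file.
tail -f /home/gpadmin/gpAdminLogs/gprecoverseg_nohup.out
```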
Confirm that gprecoverseg is not running on the master host. A terminated run can leave behind orphaned sessions (pg_basebackup or pg_rewind) on the segments, which should be checked for before starting another run of gprecoverseg.
[gpadmin@gpdbnew-m gpAdminLogs]$ ps -ef | grep gprecoverseg
gpadmin 25014 24966 0 16:57 pts/1 00:00:00 grep --color=auto gprecoverseg
In this case, gpssh to all the nodes and confirm the following:
[gpadmin@gpdbnew-m ~]$ gpssh -f hostfile
=> ps -ef | grep pg_basebackup
[gpdbnew-1] gpadmin 16297 16263 0 17:00 pts/1 00:00:00 grep --color=auto pg_basebackup
[gpdbnew-2] gpadmin 11844 11819 0 17:00 pts/0 00:00:00 grep --color=auto pg_basebackup
[gpdbnew-m] gpadmin 25263 25229 0 17:00 pts/5 00:00:00 grep --color=auto pg_basebackup
=> ps -ef | grep pg_rewind
[gpdbnew-1] gpadmin 16306 16263 0 17:00 pts/1 00:00:00 grep --color=auto pg_rewind
[gpdbnew-2] gpadmin 11846 11819 0 17:00 pts/0 00:00:00 grep --color=auto pg_rewind
[gpdbnew-m] gpadmin 25272 25229 0 17:00 pts/5 00:00:00 grep --color=auto pg_rewind
Check whether there are any pg_basebackup or pg_rewind processes running, specifically on the nodes where the segments to be recovered are located. The nodes participating in the recovery can be identified by querying the "gp_segment_configuration" table, as shown below.
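For example, a query along the following lines (run from the master; the columns shown are from the Greenplum 6 gp_segment_configuration catalog) lists the segments currently marked down and the hosts they live on, which are the nodes to check for orphaned sessions:

```
# List the segments currently marked down ('d'); their hostnames are the nodes
# to inspect for orphaned pg_basebackup / pg_rewind sessions.
psql -d postgres -c "SELECT dbid, content, role, preferred_role, status, hostname, port, datadir
                     FROM gp_segment_configuration
                     WHERE status = 'd';"
```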
**Note**
If no orphaned pg_basebackup or pg_rewind sessions are running on those nodes (as in the example shown above), it is safe to rerun the gprecoverseg utility.
If there are orphaned pg_basebackup or pg_rewind sessions running on these nodes, you will see something like the following:
[gpadmin@gpdbnew-1 ~]$ ps -ef | grep pg_basebackup
gpadmin 15425 15424 0 16:51 ? 00:00:00 bash -c . /usr/local/6.19.4/greenplum-db-6.19.4/greenplum_path.sh; /usr/local/6.19.4/greenplum-db-6.19.4/bin/lib/gpconfigurenewsegment -c "/data/primary/gp_6.19.4_202204041858361:30007:false:false:3:1:gpdbnew-2:35007:/home/gpadmin/gpAdminLogs/pg_basebackup.20220422_165117.dbid3.out" -l /home/gpadmin/gpAdminLogs -n -b 64 --validation-only --force-overwrite
gpadmin 15439 15425 0 16:51 ? 00:00:00 python /usr/local/6.19.4/greenplum-db-6.19.4/bin/lib/gpconfigurenewsegment -c /data/primary/gp_6.19.4_202204041858361:30007:false:false:3:1:gpdbnew-2:35007:/home/gpadmin/gpAdminLogs/pg_basebackup.20220422_165117.dbid3.out -l /home/gpadmin/gpAdminLogs -n -b 64 --validation-only --force-overwrite
The above is an example of an orphaned pg_basebackup session that was started by the gprecoverseg utility during a full recovery.
Steps to take when there are orphaned 'pg_basebackup' or 'pg_rewind' sessions on the segment nodes:
Terminate the above sessions by issuing a 'kill' on their PIDs. Make sure that these sessions correspond to the segments for which the full or incremental recovery was triggered and whose gprecoverseg run was terminated on the master.
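As an illustration, using the PIDs from the example ps output shown earlier (the wrapper and its child process on gpdbnew-1), the commands would look roughly like this; always substitute the PIDs reported on your own segment host:

```
# On the segment host (gpdbnew-1 in the earlier example): terminate the orphaned
# gpconfigurenewsegment wrapper and its child process.
kill 15425 15439

# Only if the processes remain after a short wait, escalate:
# kill -9 15425 15439
```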
Once the 'pg_basebackup' or 'pg_rewind' sessions have been terminated, run "ps -ef | grep pg_basebackup" or "ps -ef | grep pg_rewind" again to confirm that they are gone from the segment hosts.
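One way to confirm this across all hosts in a single pass is to repeat the gpssh check, for example:

```
# Re-check every host; only the grep processes themselves should remain.
gpssh -f hostfile 'ps -ef | egrep "pg_basebackup|pg_rewind" | grep -v egrep'
```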
If they are gone, ssh to the master and run a full recovery for these segments (gprecoverseg -F). Note that even if the terminated run was an incremental recovery and pg_rewind sessions had to be killed manually on the segments, you still need to run a full recovery for these segments.
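A sketch of this final step, run from the master, is shown below. The log file path is only an example; running under nohup again protects the new attempt from another dropped connection, and gpstate -e can be used afterwards to check the status of the segments.

```
# Rerun the recovery as a full recovery, again under nohup so another lost
# connection cannot terminate it.
nohup gprecoverseg -a -F > /home/gpadmin/gpAdminLogs/gprecoverseg_full.out 2>&1 &

# After it completes, review segment status.
gpstate -e
```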