Moving a segment to a new or spare host using gprecoverseg is failing in Greenplum 6.x

Article ID: 296625


Products

VMware Tanzu Greenplum

Issue/Introduction

Moving a segment to a new or spare host using gprecoverseg fails with a pg_basebackup error during the "Configuring new segments" step. The steps below reproduce the issue.
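For reference, the initial segment layout shown below corresponds to the gp_segment_configuration catalog table and can be retrieved on the master with a query such as the following:

[gpadmin@gpdb2-m ~]$ psql -d postgres -c "SELECT * FROM gp_segment_configuration;"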
 dbid | content | role | preferred_role | mode | status | port  | hostname | address |                 datadir                 
------+---------+------+----------------+------+--------+-------+----------+---------+-----------------------------------------
    1 |      -1 | p    | p              | n    | u      |  3001 | gpdb2-m  | gpdb2-m | /data/master/gp_6.11.1_20201013214835-1
    4 |       2 | p    | p              | s    | u      | 30002 | gpdb2-2  | gpdb2-2 | /data/primary/gp_6.11.1_202010132148352
    8 |       2 | m    | m              | s    | u      | 35002 | gpdb2-1  | gpdb2-1 | /data/mirror/gp_6.11.1_202010132148352
    3 |       1 | p    | p              | s    | u      | 30003 | gpdb2-1  | gpdb2-1 | /data/primary/gp_6.11.1_202010132148351
    7 |       1 | m    | m              | s    | u      | 35003 | gpdb2-2  | gpdb2-2 | /data/mirror/gp_6.11.1_202010132148351
    5 |       3 | p    | p              | s    | u      | 30003 | gpdb2-2  | gpdb2-2 | /data/primary/gp_6.11.1_202010132148353
    9 |       3 | m    | m              | s    | u      | 35003 | gpdb2-1  | gpdb2-1 | /data/mirror/gp_6.11.1_202010132148353
    2 |       0 | p    | p              | s    | u      | 30002 | gpdb2-1  | gpdb2-1 | /data/primary/gp_6.11.1_202010132148350
    6 |       0 | m    | m              | s    | u      | 35002 | gpdb2-2  | gpdb2-2 | /data/mirror/gp_6.11.1_202010132148350
(9 rows)
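To reproduce the failure, the mirror for content 2 (dbid 8) on gpdb2-1 is then stopped to simulate a failed segment: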

[gpadmin@gpdb2-1 ~]$ pg_ctl stop -D /data/mirror/gp_6.11.1_202010132148352 -m fast
waiting for server to shut down.... done
server stopped

[gpadmin@gpdb2-m ~]$ gpstate -e
20201013:21:54:40:032459 gpstate:gpdb2-m:gpadmin-[INFO]:-Starting gpstate with args: -e
20201013:21:54:40:032459 gpstate:gpdb2-m:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 6.11.1 build commit:df5f06d6fecffb4de64ab4ed2a1deb3a45efa37c'
20201013:21:54:40:032459 gpstate:gpdb2-m:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 9.4.24 (Greenplum Database 6.11.1 build commit:df5f06d6fecffb4de64ab4ed2a1deb3a45efa37c) on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 6.4.0, 64-bit compiled on Sep 17 2020 03:08:40'
20201013:21:54:40:032459 gpstate:gpdb2-m:gpadmin-[INFO]:-Obtaining Segment details from master...
20201013:21:54:40:032459 gpstate:gpdb2-m:gpadmin-[INFO]:-Gathering data from segments...
20201013:21:54:40:032459 gpstate:gpdb2-m:gpadmin-[WARNING]:-pg_stat_replication shows no standby connections
20201013:21:54:40:032459 gpstate:gpdb2-m:gpadmin-[INFO]:-----------------------------------------------------
20201013:21:54:40:032459 gpstate:gpdb2-m:gpadmin-[INFO]:-Segment Mirroring Status Report
20201013:21:54:40:032459 gpstate:gpdb2-m:gpadmin-[INFO]:-----------------------------------------------------
20201013:21:54:40:032459 gpstate:gpdb2-m:gpadmin-[INFO]:-Unsynchronized Segment Pairs
20201013:21:54:40:032459 gpstate:gpdb2-m:gpadmin-[INFO]:-   Current Primary   Port    Mirror    Port
20201013:21:54:40:032459 gpstate:gpdb2-m:gpadmin-[INFO]:-   gpdb2-2           30002   gpdb2-1   35002
20201013:21:54:40:032459 gpstate:gpdb2-m:gpadmin-[INFO]:-----------------------------------------------------
20201013:21:54:40:032459 gpstate:gpdb2-m:gpadmin-[INFO]:-Downed Segments (may include segments where status could not be retrieved)
20201013:21:54:40:032459 gpstate:gpdb2-m:gpadmin-[INFO]:-   Segment   Port    Config status   Status
20201013:21:54:40:032459 gpstate:gpdb2-m:gpadmin-[INFO]:-   gpdb2-1   35002   Down            Down in configuration

[gpadmin@gpdb2-m ~]$ gprecoverseg -o segments_to_recover -p gpdb2-3 
20201013:21:55:10:032520 gprecoverseg:gpdb2-m:gpadmin-[INFO]:-Starting gprecoverseg with args: -o segments_to_recover -p gpdb2-3
20201013:21:55:10:032520 gprecoverseg:gpdb2-m:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 6.11.1 build commit:df5f06d6fecffb4de64ab4ed2a1deb3a45efa37c'
20201013:21:55:10:032520 gprecoverseg:gpdb2-m:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 9.4.24 (Greenplum Database 6.11.1 build commit:df5f06d6fecffb4de64ab4ed2a1deb3a45efa37c) on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 6.4.0, 64-bit compiled on Sep 17 2020 03:08:40'
20201013:21:55:10:032520 gprecoverseg:gpdb2-m:gpadmin-[INFO]:-Obtaining Segment details from master...
20201013:21:55:10:032520 gprecoverseg:gpdb2-m:gpadmin-[WARNING]:-Failed to resolve hostname for gpdb2-3-1
20201013:21:55:10:032520 gprecoverseg:gpdb2-m:gpadmin-[INFO]:-Configuration file output to segments_to_recover successfully.
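The generated segments_to_recover file maps each failed segment to its recovery target. For this example it should contain a single line similar to the following, in the form failedAddress|port|dataDirectory<SPACE>newAddress|port|dataDirectory (the values here are taken from the recovery parameters reported by gprecoverseg below):

gpdb2-1|35002|/data/mirror/gp_6.11.1_202010132148352 gpdb2-3|30002|/data/mirror/gp_6.11.1_202010132148352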

[gpadmin@gpdb2-m ~]$ gprecoverseg -i segments_to_recover -F
20201013:22:03:12:032625 gprecoverseg:gpdb2-m:gpadmin-[INFO]:-Starting gprecoverseg with args: -i segments_to_recover -F
20201013:22:03:12:032625 gprecoverseg:gpdb2-m:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 6.11.1 build commit:df5f06d6fecffb4de64ab4ed2a1deb3a45efa37c'
20201013:22:03:12:032625 gprecoverseg:gpdb2-m:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 9.4.24 (Greenplum Database 6.11.1 build commit:df5f06d6fecffb4de64ab4ed2a1deb3a45efa37c) on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 6.4.0, 64-bit compiled on Sep 17 2020 03:08:40'
20201013:22:03:12:032625 gprecoverseg:gpdb2-m:gpadmin-[INFO]:-Obtaining Segment details from master...
20201013:22:03:12:032625 gprecoverseg:gpdb2-m:gpadmin-[INFO]:-Heap checksum setting is consistent between master and the segments that are candidates for recoverseg
20201013:22:03:12:032625 gprecoverseg:gpdb2-m:gpadmin-[INFO]:-Greenplum instance recovery parameters
20201013:22:03:12:032625 gprecoverseg:gpdb2-m:gpadmin-[INFO]:----------------------------------------------------------
20201013:22:03:12:032625 gprecoverseg:gpdb2-m:gpadmin-[INFO]:-Recovery from configuration -i option supplied
20201013:22:03:12:032625 gprecoverseg:gpdb2-m:gpadmin-[INFO]:----------------------------------------------------------
20201013:22:03:12:032625 gprecoverseg:gpdb2-m:gpadmin-[INFO]:-Recovery 1 of 1
20201013:22:03:12:032625 gprecoverseg:gpdb2-m:gpadmin-[INFO]:----------------------------------------------------------
20201013:22:03:12:032625 gprecoverseg:gpdb2-m:gpadmin-[INFO]:-   Synchronization mode                 = Full
20201013:22:03:12:032625 gprecoverseg:gpdb2-m:gpadmin-[INFO]:-   Failed instance host                 = gpdb2-1
20201013:22:03:12:032625 gprecoverseg:gpdb2-m:gpadmin-[INFO]:-   Failed instance address              = gpdb2-1
20201013:22:03:12:032625 gprecoverseg:gpdb2-m:gpadmin-[INFO]:-   Failed instance directory            = /data/mirror/gp_6.11.1_202010132148352
20201013:22:03:12:032625 gprecoverseg:gpdb2-m:gpadmin-[INFO]:-   Failed instance port                 = 35002
20201013:22:03:12:032625 gprecoverseg:gpdb2-m:gpadmin-[INFO]:-   Recovery Source instance host        = gpdb2-2
20201013:22:03:12:032625 gprecoverseg:gpdb2-m:gpadmin-[INFO]:-   Recovery Source instance address     = gpdb2-2
20201013:22:03:12:032625 gprecoverseg:gpdb2-m:gpadmin-[INFO]:-   Recovery Source instance directory   = /data/primary/gp_6.11.1_202010132148352
20201013:22:03:12:032625 gprecoverseg:gpdb2-m:gpadmin-[INFO]:-   Recovery Source instance port        = 30002
20201013:22:03:12:032625 gprecoverseg:gpdb2-m:gpadmin-[INFO]:-   Recovery Target instance host        = gpdb2-3
20201013:22:03:12:032625 gprecoverseg:gpdb2-m:gpadmin-[INFO]:-   Recovery Target instance address     = gpdb2-3
20201013:22:03:12:032625 gprecoverseg:gpdb2-m:gpadmin-[INFO]:-   Recovery Target instance directory   = /data/mirror/gp_6.11.1_202010132148352
20201013:22:03:12:032625 gprecoverseg:gpdb2-m:gpadmin-[INFO]:-   Recovery Target instance port        = 30002
20201013:22:03:12:032625 gprecoverseg:gpdb2-m:gpadmin-[INFO]:----------------------------------------------------------

Continue with segment recovery procedure Yy|Nn (default=N):
> y
20201013:22:03:14:032625 gprecoverseg:gpdb2-m:gpadmin-[INFO]:-Syncing Greenplum Database extensions
20201013:22:03:15:032625 gprecoverseg:gpdb2-m:gpadmin-[INFO]:-The packages on gpdb2-3 are consistent.
20201013:22:03:15:032625 gprecoverseg:gpdb2-m:gpadmin-[INFO]:-1 segment(s) to recover
20201013:22:03:15:032625 gprecoverseg:gpdb2-m:gpadmin-[INFO]:-Ensuring 1 failed segment(s) are stopped
20201013:22:03:16:032625 gprecoverseg:gpdb2-m:gpadmin-[INFO]:-Ensuring that shared memory is cleaned up for stopped segments
20201013:22:03:16:032625 gprecoverseg:gpdb2-m:gpadmin-[INFO]:-Validating remote directories
20201013:22:03:17:032625 gprecoverseg:gpdb2-m:gpadmin-[INFO]:-Configuring new segments
gpdb2-3 (dbid 8): 
20201013:22:03:17:032625 gprecoverseg:gpdb2-m:gpadmin-[CRITICAL]:-Error occurred: Error Executing Command: 
 Command was: 'ssh -o StrictHostKeyChecking=no -o ServerAliveInterval=60 gpdb2-3 ". /usr/local/greenplum-db-6.11.1/greenplum_path.sh; $GPHOME/bin/lib/gpconfigurenewsegment -c \"/data/mirror/gp_6.11.1_202010132148352:30002:false:false:8:2:gpdb2-2:30002:/home/gpadmin/gpAdminLogs/pg_basebackup.20201013_220316.dbid8.out\" -l /home/gpadmin/gpAdminLogs -n -B 16 --force-overwrite"'
rc=1, stdout='20201013:22:03:17:021577 gpconfigurenewsegment:gpdb2-3:gpadmin-[INFO]:-Starting gpconfigurenewsegment with args: -c /data/mirror/gp_6.11.1_202010132148352:30002:false:false:8:2:gpdb2-2:30002:/home/gpadmin/gpAdminLogs/pg_basebackup.20201013_220316.dbid8.out -l /home/gpadmin/gpAdminLogs -n -B 16 --force-overwrite
20201013:22:03:17:021577 gpconfigurenewsegment:gpdb2-3:gpadmin-[INFO]:-Validate data directories for new segment
20201013:22:03:17:021577 gpconfigurenewsegment:gpdb2-3:gpadmin-[INFO]:-Running pg_basebackup with progress output temporarily in /home/gpadmin/gpAdminLogs/pg_basebackup.20201013_220316.dbid8.out
20201013:22:03:17:021577 gpconfigurenewsegment:gpdb2-3:gpadmin-[ERROR]:-ExecutionError: 'Error Executing Command: ' occurred.  Details: '/usr/local/greenplum-db-6.11.1/bin/lib/gpconfigurenewsegment -c /data/mirror/gp_6.11.1_202010132148352:30002:false:false:8:2:gpdb2-2:30002:/home/gpadmin/gpAdminLogs/pg_basebackup.20201013_220316.dbid8.out -l /home/gpadmin/gpAdminLogs -n -B 16 --force-overwrite'  cmd had rc=1 completed=True halted=False
  stdout=''
  stderr='ExecutionError: 'non-zero rc: 1' occurred.  Details: 'pg_basebackup -c fast -D /data/mirror/gp_6.11.1_202010132148352 -h gpdb2-2 -p 30002 --slot internal_wal_replication_slot --xlog-method stream --force-overwrite --write-recovery-conf --target-gp-dbid 8 -E ./db_dumps -E ./gpperfmon/data -E ./gpperfmon/logs -E ./promote --progress --verbose > /home/gpadmin/gpAdminLogs/pg_basebackup.20201013_220316.dbid8.out 2>&1'  cmd had rc=1 completed=True halted=False
  stdout=''
  stderr='''
', stderr='ExecutionError: 'Error Executing Command: ' occurred.  Details: '/usr/local/greenplum-db-6.11.1/bin/lib/gpconfigurenewsegment -c /data/mirror/gp_6.11.1_202010132148352:30002:false:false:8:2:gpdb2-2:30002:/home/gpadmin/gpAdminLogs/pg_basebackup.20201013_220316.dbid8.out -l /home/gpadmin/gpAdminLogs -n -B 16 --force-overwrite'  cmd had rc=1 completed=True halted=False
  stdout=''
  stderr='ExecutionError: 'non-zero rc: 1' occurred.  Details: 'pg_basebackup -c fast -D /data/mirror/gp_6.11.1_202010132148352 -h gpdb2-2 -p 30002 --slot internal_wal_replication_slot --xlog-method stream --force-overwrite --write-recovery-conf --target-gp-dbid 8 -E ./db_dumps -E ./gpperfmon/data -E ./gpperfmon/logs -E ./promote --progress --verbose > /home/gpadmin/gpAdminLogs/pg_basebackup.20201013_220316.dbid8.out 2>&1'  cmd had rc=1 completed=True halted=False
  stdout=''
  stderr='''
'
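The underlying cause is visible in the pg_basebackup log referenced in the error (here /home/gpadmin/gpAdminLogs/pg_basebackup.20201013_220316.dbid8.out on the new host gpdb2-3). It typically contains a connection failure along these lines (exact wording may vary):

pg_basebackup: could not connect to server: FATAL:  no pg_hba.conf entry for replication connection from host "192.168.99.122", user "gpadmin", SSL off

In other words, the source primary (gpdb2-2) rejects the replication connection because the new host's address is not listed in its pg_hba.conf, which is what the workaround below addresses.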

Note: This issue only affects Greenplum Database (GPDB) 6.x.

Environment

Product Version: 6.11

Resolution

Workaround

Manually add a replication entry for the new host you are recovering to in the pg_hba.conf of the current primary for the affected segment:

ssh gpdb2-2
vi /data/primary/gp_6.11.1_202010132148352/pg_hba.conf
host  replication gpadmin 192.168.99.122/32 trust
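
This pg_hba.conf line allows replication connections from 192.168.99.122 as the gpadmin user with the trust authentication method; adjust the address, user, and authentication method to match your environment.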

Then, on the master, reload the server configuration so the new entry takes effect:

gpstop -u

In this example, 192.168.99.122 is the IP address of gpdb2-3, the new/spare host.
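If you are unsure of the new host's IP address, you can resolve it from the current primary host first, for example:

getent hosts gpdb2-3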

After applying this workaround, full recovery to the new or spare host completes successfully.
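For example, after adding the entry and reloading the configuration, re-run the recovery from the master and verify the result:

gprecoverseg -i segments_to_recover -F
gpstate -e

Once recovery and resynchronization complete, gpstate -e should no longer report the segment pair as down or unsynchronized.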