The method for running full recovery (gprecoverseg -F) in Pivotal Greenplum has changed between versions 5.x and 6.x. In Pivotal Greenplum 6.x, gprecoverseg uses pg_basebackup to fully restore a down segment rather than file replication. This change increases the speed of full recovery.
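For reference, full recovery is still started from the master host with the same command in both versions; only the underlying copy mechanism changed:

$ gprecoverseg -F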
However, if the hostname and address fields differ in gp_segment_configuration, full recovery uses the hostname rather than the address to run pg_basebackup. This configuration is common when you want query traffic to travel over a faster VIP network while other traffic goes over the external network. You can observe this behavior in the gp_segment_configuration listing and gprecoverseg log messages below:
 dbid | content | role | preferred_role | mode | status | port  | hostname | address |         datadir
------+---------+------+----------------+------+--------+-------+----------+---------+--------------------------
 1142 |     372 | m    | m              | n    | d      | 41000 | sdw1     | sdw1-2  | /data01/mirror/gpseg372
  194 |     192 | m    | p              | n    | d      | 40000 | sdw1     | sdw1-1  | /data01/primary/gpseg192
(2 rows)
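A listing like the one above can be produced with a simple catalog query from the master. As a minimal sketch (adjust the filter for your environment), the following lists all segments currently marked down, including their hostname and address values:

$ psql -d postgres -c "SELECT * FROM gp_segment_configuration WHERE status = 'd';"

Note that for both down segments the hostname (sdw1) differs from the address (sdw1-1, sdw1-2). With the cluster in this state, full recovery fails as shown below: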
Continue with segment recovery procedure Yy|Nn (default=N):
> y
20191226:16:10:43:159051 gprecoverseg:sdw1:gpadmin-[INFO]:-2 segment(s) to recover
20191226:16:10:43:159051 gprecoverseg:sdw1:gpadmin-[INFO]:-Ensuring 2 failed segment(s) are stopped
20191226:16:10:43:159051 gprecoverseg:sdw1:gpadmin-[INFO]:-Ensuring that shared memory is cleaned up for stopped segments
20191226:16:10:44:159051 gprecoverseg:sdw1:gpadmin-[INFO]:-Validating remote directories
20191226:16:10:44:159051 gprecoverseg:sdw1:gpadmin-[INFO]:-Configuring new segments
sdw1 (dbid 194):
sdw1 (dbid 1142):
20191226:16:10:45:159051 gprecoverseg:sdw1:gpadmin-[CRITICAL]:-Error occurred: Error Executing Command:
Command was: 'ssh -o StrictHostKeyChecking=no -o ServerAliveInterval=60 sdw1 ". /usr/local/greenplum-db/./greenplum_path.sh; $GPHOME/bin/lib/gpconfigurenewsegment -c \"/data01/primary/gpseg192:40000:false:false:194:192:sdw2:41000:/home/gpadmin/gpAdminLogs/pg_basebackup.20191226_161044.dbid194.out,/data01/mirror/gpseg372:41000:false:false:1142:372:sdw3:40000:/home/gpadmin/gpAdminLogs/pg_basebackup.20191226_161044.dbid1142.out\" -l /home/gpadmin/gpAdminLogs -n -B 16 --force-overwrite"'
rc=1, stdout='20191226:16:10:45:159994 gpconfigurenewsegment:sdw1:gpadmin-[INFO]:-Starting gpconfigurenewsegment with args: -c /data01/primary/gpseg192:40000:false:false:194:192:sdw2:41000:/home/gpadmin/gpAdminLogs/pg_basebackup.20191226_161044.dbid194.out,/data01/mirror/gpseg372:41000:false:false:1142:372:sdw3:40000:/home/gpadmin/gpAdminLogs/pg_basebackup.20191226_161044.dbid1142.out -l /home/gpadmin/gpAdminLogs -n -B 16 --force-overwrite
...
ExecutionError: 'Error Executing Command: ' occured. Details: '/usr/local/greenplum-db/./bin/lib/gpconfigurenewsegment -c /data01/primary/gpseg192:40000:false:false:194:192:sdw2:41000:/home/gpadmin/gpAdminLogs/pg_basebackup.20191226_161044.dbid194.out,/data01/mirror/gpseg372:41000:false:false:1142:372:sdw3:40000:/home/gpadmin/gpAdminLogs/pg_basebackup.20191226_161044.dbid1142.out -l /home/gpadmin/gpAdminLogs -n -B 16 --force-overwrite' cmd had rc=1 completed=True halted=False
stdout=''
stderr='ExecutionError: 'non-zero rc: 1' occured. Details: 'pg_basebackup -c fast -D /data01/primary/gpseg192 -h sdw2 -p 41000 --slot internal_wal_replication_slot --xlog-method stream --force-overwrite --write-recovery-conf --target-gp-dbid 194 -E ./db_dumps -E ./gpperfmon/data -E ./gpperfmon/logs -E ./promote --progress --verbose > /home/gpadmin/gpAdminLogs/pg_basebackup.20191226_161044.dbid194.out 2>&1' cmd had rc=1 completed=True halted=False
stdout=''
stderr='''
pg_basebackup: could not connect to server: FATAL: no pg_hba.conf entry for replication connection from host "10.130.211.246", user "gpadmin", SSL off
Note: As you can see, the required pg_hba.conf entry is not present because gprecoverseg uses the hostname (which resolves to the external IP) rather than the VIP recorded in the address field.
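To illustrate the note above: the replication connection only succeeds if the pg_hba.conf of the source segment (sdw2 in this example, per the pg_basebackup command) contains a replication entry covering the network that the hostname resolves to. As a sketch only, with the client IP taken from the error message and an assumed authentication method of trust (a broader CIDR or a different method may be appropriate in your environment):

host    replication    gpadmin    10.130.211.246/32    trust

After adding such an entry, the configuration can be reloaded without a restart, for example with gpstop -u from the master.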