A backup was taken on Cluster-A and needs to be restored on Cluster-B. Both clusters have the same number of segments, but their segment maps (contents, dbids) are different.
Data in Pivotal Greenplum is stored on segments based on the distribution key hash value and the segment content ID. For example, a row with distribution key value "0" will always be stored on content X of the cluster. This is true for all clusters with the same number of segments.
This means that if you have two clusters with the same number of segments, rows with the same distribution key values will be stored on the same segment/content in both clusters. Therefore, transferring data from segment to segment needs to be done according to the primary segment content ID (the mirror follows the primary).
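The invariant above can be illustrated with a toy hash (this is not Greenplum's actual cdbhash algorithm, and the segment count of 4 is an arbitrary assumption): for a fixed number of segments, the same distribution key value always maps to the same content ID, no matter which cluster computes it.

```shell
# Toy illustration only: map a distribution key to a content ID as
# hash(key) mod nsegments. Greenplum's real cdbhash differs, but the
# principle is the same: equal segment counts => identical placement.
NSEGMENTS=4

content_for_key() {
    # cksum gives a deterministic CRC of the key text
    printf '%s' "$1" | cksum | awk -v n="$NSEGMENTS" '{print $1 % n}'
}

# "Cluster A" and "Cluster B" both have 4 segments, so they agree:
a=$(content_for_key "0")
b=$(content_for_key "0")
echo "key 0 -> content $a on cluster A, content $b on cluster B"
test "$a" = "$b" && echo "placement matches"
```

The same determinism is what makes the segment-by-segment restore below possible: only the dbid labels differ between the clusters, not the row placement.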
The backup is done segment by segment; one dump file is generated per segment.
Therefore, in order to prepare for the restore on another cluster, the following actions need to be completed:
lpetrov=# select content from gp_segment_configuration where dbid = 2;
 content
---------
       0
(1 row)
lpetrov=# select dbid from gp_segment_configuration where content = 0 and role = 'p';
 dbid
------
    5
(1 row)
lpetrov=# select hostname, f.fselocation, fs.fsname
          from gp_segment_configuration s
          join pg_filespace_entry f on s.dbid = f.fsedbid
          join pg_filespace fs on fs.oid = f.fsefsoid
          where s.dbid = 5 and s.content = 0 and fs.fsname = 'pg_system';
  hostname  |               fselocation               |  fsname
------------+-----------------------------------------+-----------
 lpetrovmac | /Users/lpetrov/greenplum-db-data/gpseg0 | pg_system
(1 row)
Repeat this for all dump files. Master dump files (the master is always dbid=1) do not need to be renamed; just move them under the new master data directory. This guarantees that the files end up on the proper segments for restoration.
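As a concrete sketch of the rename for one segment dump (the dbids come from the example above, where content 0 is dbid 2 on the source cluster and dbid 5 on the target; the timestamp and directory are hypothetical):

```shell
# Hypothetical example: dump files are named <prefix><dbid>_<timestamp><suffix>,
# e.g. gp_dump_0_2_201508100000.gz for dbid 2.
ts=201508100000
workdir=$(mktemp -d)
cd "$workdir"

# Simulate the dump file as it arrived from the source cluster (dbid 2).
touch "gp_dump_0_2_${ts}.gz"

# Rename it to the target cluster's dbid (5) for the same content ID,
# so the restore picks it up on the correct segment.
mv "gp_dump_0_2_${ts}.gz" "gp_dump_0_5_${ts}.gz"

ls "gp_dump_0_5_${ts}.gz"
```

Doing this by hand for every segment is error-prone, which is what the automation below addresses.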
Automation
Pre-Steps
-- On the source database (i.e., on the cluster where the backup was taken), get the segment configuration using the script below:
\t
\o /tmp/source.txt
SELECT dbid, content, hostname
FROM gp_segment_configuration
WHERE role = 'p' AND content <> -1
ORDER BY content;
\o
\t
-- On the target database (i.e., on the cluster where the restore is to be issued), get the segment configuration using the script below:
\t
\o /tmp/target.txt
SELECT dbid, content, hostname, fselocation
FROM gp_segment_configuration c, pg_filespace_entry f, pg_database d, pg_tablespace t
WHERE c.role = 'p'
  AND c.content <> -1
  AND f.fsedbid = c.dbid
  AND f.fsefsoid = t.spcfsoid
  AND d.dattablespace = t.oid
  AND d.datname = current_database()
ORDER BY content;
\o
\t
-- Copy the source segment configuration file to the target host.
Script Creation
Execute the statement below to create a mapping script.
Note: Make the necessary changes to the environment variables before execution.
Once the necessary changes are made, execute the below script on the target host.
Use the appropriate script based on the filesystem where the dump is located.
-- ddboost server
export timestamp_key=<backup dump timestamp, e.g. 201508100000>
export location=<location of the dump, e.g. dd_backup/<YYMMDD>; do not use "/" at the beginning, since ddboost already assumes it>
export prefix=gp_dump_0_
export suffix=.gz

paste /tmp/source.txt /tmp/target.txt | while read line
do
   s_dbid=`echo $line | awk '{print $1}'`
   t_dbid=`echo $line | awk '{print $6}'`
   s_content=`echo $line | awk '{print $3}'`
   t_content=`echo $line | awk '{print $8}'`
   t_hostname=`echo $line | awk '{print $10}'`
   #t_location=`echo $line | awk '{print $12}'`   # if in segment default directory
   if [ "$s_dbid" == "$t_dbid" ] && [ "$s_content" == "$t_content" ]
   then
      echo "# The source / target contents : " $s_content "/" $t_content " dbids are the same " $s_dbid "/" $t_dbid ", skipping .... "
   else
      echo "gpddboost --rename --from-file="$location"/"${prefix}${s_dbid}"_"${timestamp_key}${suffix} "--to-file="$location"/"${prefix}${t_dbid}"_"${timestamp_key}${suffix}
   fi
done > /tmp/segment_mapping.sh
-- NFS
export location=<location of the dump, e.g. /tmp>
export timestamp_key=<backup dump timestamp, e.g. 201508100000>
export prefix=gp_dump_0_
export suffix=.gz

paste /tmp/source.txt /tmp/target.txt | while read line
do
   s_dbid=`echo $line | awk '{print $1}'`
   t_dbid=`echo $line | awk '{print $6}'`
   s_content=`echo $line | awk '{print $3}'`
   t_content=`echo $line | awk '{print $8}'`
   t_hostname=`echo $line | awk '{print $10}'`
   #t_location=`echo $line | awk '{print $12}'`   # if in segment default directory
   if [ "$s_dbid" == "$t_dbid" ] && [ "$s_content" == "$t_content" ]
   then
      echo "# The source / target contents : " $s_content "/" $t_content " dbids are the same " $s_dbid "/" $t_dbid ", skipping .... "
   else
      echo "ssh " $t_hostname " \" mv" $location"/"${prefix}${s_dbid}"_"${timestamp_key}${suffix} $location"/"${prefix}${t_dbid}"_"${timestamp_key}${suffix} "\""
   fi
done > /tmp/segment_mapping.sh
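The awk field positions in the scripts above work because psql's aligned output keeps the "|" separators even in tuples-only (\t) mode. The following self-contained check demonstrates the positions with one synthetic source row and one synthetic target row (the dbids, hostnames, and path are made up for illustration):

```shell
# Synthetic one-row versions of /tmp/source.txt and /tmp/target.txt,
# in the aligned, tuples-only format psql emits (note the | separators).
src=$(mktemp); tgt=$(mktemp)
echo "   2 |       0 | sdw1" > "$src"
echo "   5 |       0 | sdw9 | /data/primary/gpseg0" > "$tgt"

# paste tab-joins the two rows; awk then splits on whitespace, so the
# "|" characters occupy fields of their own, giving the positions
# $1/$3/$5 (source) and $6/$8/$10/$12 (target) used by the scripts.
mapping=$(paste "$src" "$tgt" | while read line
do
   s_dbid=`echo $line | awk '{print $1}'`     # source dbid
   s_content=`echo $line | awk '{print $3}'`  # source content
   t_dbid=`echo $line | awk '{print $6}'`     # target dbid
   t_content=`echo $line | awk '{print $8}'`  # target content
   t_hostname=`echo $line | awk '{print $10}'`
   echo "content $s_content/$t_content: rename dbid $s_dbid -> $t_dbid on $t_hostname"
done)
echo "$mapping"
```

This prints "content 0/0: rename dbid 2 -> 5 on sdw9", matching the manual mapping worked out in the pre-steps.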
Execution
Open the file and verify that everything is correct. Once done, execute the script using:
/bin/sh /tmp/segment_mapping.sh
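A hypothetical pre-flight check can supplement the manual review: every line of the generated file should be either a comment or a rename command (mv, ssh, or gpddboost). The sketch below runs the check against a synthetic mapping script so it is self-contained:

```shell
# Build a synthetic mapping script like the generators above would emit.
mapfile=$(mktemp)
cat > "$mapfile" <<'EOF'
# The source / target contents :  0 / 0  dbids are the same  2 / 2 , skipping ....
mv /tmp/gp_dump_0_3_201508100000.gz /tmp/gp_dump_0_6_201508100000.gz
EOF

# Count lines that are neither comments nor expected rename commands;
# anything unexpected means the file should not be executed as-is.
bad=$(grep -cEv '^[[:space:]]*(#|mv |ssh |gpddboost )' "$mapfile")
echo "unexpected lines: $bad"
```

In practice you would point the grep at /tmp/segment_mapping.sh and only run it when the count is zero.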
Pivotal Greenplum Database (GPDB)