A backup was taken on Cluster-A and needs to be restored on Cluster-B. Both clusters have the same number of segments, but their segment maps (contents, dbids) are different.
Data in Pivotal Greenplum is stored on segments based on the distribution key hash value and the segment content ID. For example, a row with distribution key value "0" will always be stored on content X of the cluster. This is true for all clusters with the same number of segments.
This means that if you have two clusters with the same number of segments, rows with the same distribution key values will be stored on the same segment/content in both clusters. Therefore, transferring data from segment to segment needs to be done according to the primary segment content ID (the mirror follows the primary).
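The invariant above can be illustrated with a toy hash (this is not Greenplum's actual cdbhash algorithm, and the segment count of 4 is an arbitrary assumption): for a fixed number of segments, the same distribution key value always maps to the same content ID, no matter which cluster computes it.

```shell
# Toy illustration only: map a distribution key to a content ID as
# hash(key) mod nsegments. Greenplum's real cdbhash differs, but the
# principle is the same: equal segment counts => identical placement.
NSEGMENTS=4

content_for_key() {
    # cksum gives a deterministic CRC of the key text
    printf '%s' "$1" | cksum | awk -v n="$NSEGMENTS" '{print $1 % n}'
}

# "Cluster A" and "Cluster B" both have 4 segments, so they agree:
a=$(content_for_key "0")
b=$(content_for_key "0")
echo "key 0 -> content $a on cluster A, content $b on cluster B"
test "$a" = "$b" && echo "placement matches"
```

The same determinism is what makes the segment-by-segment restore below possible: only the dbid labels differ between the clusters, not the row placement.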
The backup is done segment by segment; one dump file is generated per segment.
Therefore, in order to prepare for the restore on another cluster, the following actions need to be completed:
lpetrov=# select content from gp_segment_configuration where dbid = 2;
 content
---------
       0
(1 row)
lpetrov=# select dbid from gp_segment_configuration where content = 0 and role = 'p';
 dbid
------
    5
(1 row)
lpetrov=# select hostname, f.fselocation, fs.fsname
          from gp_segment_configuration s
          join pg_filespace_entry f on s.dbid = f.fsedbid
          join pg_filespace fs on fs.oid = f.fsefsoid
          where s.dbid = 5 and s.content = 0 and fs.fsname = 'pg_system';
  hostname  |               fselocation               |  fsname
------------+-----------------------------------------+-----------
 lpetrovmac | /Users/lpetrov/greenplum-db-data/gpseg0 | pg_system
(1 row)
Repeat this for all dump files. Master dump files (the master is always dbid=1) do not need to be renamed; just move them under the new master data directory. This guarantees that the files end up on the proper segments for restoration.
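As a concrete sketch of the rename for one segment dump (the dbids come from the example above, where content 0 is dbid 2 on the source cluster and dbid 5 on the target; the timestamp and directory are hypothetical):

```shell
# Hypothetical example: dump files are named <prefix><dbid>_<timestamp><suffix>,
# e.g. gp_dump_0_2_201508100000.gz for dbid 2.
ts=201508100000
workdir=$(mktemp -d)
cd "$workdir"

# Simulate the dump file as it arrived from the source cluster (dbid 2).
touch "gp_dump_0_2_${ts}.gz"

# Rename it to the target cluster's dbid (5) for the same content ID,
# so the restore picks it up on the correct segment.
mv "gp_dump_0_2_${ts}.gz" "gp_dump_0_5_${ts}.gz"

ls "gp_dump_0_5_${ts}.gz"
```

Doing this by hand for every segment is error-prone, which is what the automation below addresses.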
Automation
Pre-Steps
-- On the source database (i.e., on the cluster where the backup was taken), get the segment configuration using the script below:
\t
\o /tmp/source.txt
SELECT dbid, content, hostname
FROM gp_segment_configuration
WHERE role = 'p' AND content <> -1
ORDER BY content;
\o
\t
-- On the target database (i.e., on the cluster where the restore is to be issued), get the segment configuration using the script below:
\t
\o /tmp/target.txt
SELECT dbid, content, hostname, fselocation
FROM gp_segment_configuration c, pg_filespace_entry f, pg_database d, pg_tablespace t
WHERE c.role = 'p'
  AND c.content <> -1
  AND f.fsedbid = c.dbid
  AND f.fsefsoid = t.spcfsoid
  AND d.dattablespace = t.oid
  AND d.datname = current_database()
ORDER BY content;
\o
\t
-- Copy the source segment configuration file to the target host.
Script Creation
Execute the statement below to create a mapping script.
Note: Make the necessary changes to the environment variables before execution.
Once the necessary changes are made, execute the below script on the target host.
Use the appropriate script based on the filesystem where the dump is located.
-- ddboost server
export timestamp_key=<backup dump timestamp, e.g. 201508100000>
export location=<location of the dump, e.g. dd_backup/<YYMMDD>; do not use "/" at the beginning, since ddboost already assumes it>
export prefix=gp_dump_0_
export suffix=.gz

paste /tmp/source.txt /tmp/target.txt | while read line
do
   s_dbid=`echo $line | awk '{print $1}'`
   t_dbid=`echo $line | awk '{print $6}'`
   s_content=`echo $line | awk '{print $3}'`
   t_content=`echo $line | awk '{print $8}'`
   t_hostname=`echo $line | awk '{print $10}'`
   #t_location=`echo $line | awk '{print $12}'`   # if in segment default directory
   if [ "$s_dbid" == "$t_dbid" ] && [ "$s_content" == "$t_content" ]
   then
      echo "# The source / target contents : " $s_content "/" $t_content " dbids are the same " $s_dbid "/" $t_dbid ", skipping .... "
   else
      echo "gpddboost --rename --from-file="$location"/"${prefix}${s_dbid}"_"${timestamp_key}${suffix} "--to-file="$location"/"${prefix}${t_dbid}"_"${timestamp_key}${suffix}
   fi
done > /tmp/segment_mapping.sh
-- NFS
export location=<location of the dump, e.g. /tmp>
export timestamp_key=<backup dump timestamp, e.g. 201508100000>
export prefix=gp_dump_0_
export suffix=.gz

paste /tmp/source.txt /tmp/target.txt | while read line
do
   s_dbid=`echo $line | awk '{print $1}'`
   t_dbid=`echo $line | awk '{print $6}'`
   s_content=`echo $line | awk '{print $3}'`
   t_content=`echo $line | awk '{print $8}'`
   t_hostname=`echo $line | awk '{print $10}'`
   #t_location=`echo $line | awk '{print $12}'`   # if in segment default directory
   if [ "$s_dbid" == "$t_dbid" ] && [ "$s_content" == "$t_content" ]
   then
      echo "# The source / target contents : " $s_content "/" $t_content " dbids are the same " $s_dbid "/" $t_dbid ", skipping .... "
   else
      echo "ssh " $t_hostname " \" mv" $location"/"${prefix}${s_dbid}"_"${timestamp_key}${suffix} $location"/"${prefix}${t_dbid}"_"${timestamp_key}${suffix} "\""
   fi
done > /tmp/segment_mapping.sh
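The awk field positions in the scripts above work because psql's aligned output keeps the "|" separators even in tuples-only (\t) mode. The following self-contained check demonstrates the positions with one synthetic source row and one synthetic target row (the dbids, hostnames, and path are made up for illustration):

```shell
# Synthetic one-row versions of /tmp/source.txt and /tmp/target.txt,
# in the aligned, tuples-only format psql emits (note the | separators).
src=$(mktemp); tgt=$(mktemp)
echo "   2 |       0 | sdw1" > "$src"
echo "   5 |       0 | sdw9 | /data/primary/gpseg0" > "$tgt"

# paste tab-joins the two rows; awk then splits on whitespace, so the
# "|" characters occupy fields of their own, giving the positions
# $1/$3/$5 (source) and $6/$8/$10/$12 (target) used by the scripts.
mapping=$(paste "$src" "$tgt" | while read line
do
   s_dbid=`echo $line | awk '{print $1}'`     # source dbid
   s_content=`echo $line | awk '{print $3}'`  # source content
   t_dbid=`echo $line | awk '{print $6}'`     # target dbid
   t_content=`echo $line | awk '{print $8}'`  # target content
   t_hostname=`echo $line | awk '{print $10}'`
   echo "content $s_content/$t_content: rename dbid $s_dbid -> $t_dbid on $t_hostname"
done)
echo "$mapping"
```

This prints "content 0/0: rename dbid 2 -> 5 on sdw9", matching the manual mapping worked out in the pre-steps.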
Execution
Open the file and verify that everything is correct. Once done, execute the script using:
/bin/sh /tmp/segment_mapping.sh
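A hypothetical pre-flight check can supplement the manual review: every line of the generated file should be either a comment or a rename command (mv, ssh, or gpddboost). The sketch below runs the check against a synthetic mapping script so it is self-contained:

```shell
# Build a synthetic mapping script like the generators above would emit.
mapfile=$(mktemp)
cat > "$mapfile" <<'EOF'
# The source / target contents :  0 / 0  dbids are the same  2 / 2 , skipping ....
mv /tmp/gp_dump_0_3_201508100000.gz /tmp/gp_dump_0_6_201508100000.gz
EOF

# Count lines that are neither comments nor expected rename commands;
# anything unexpected means the file should not be executed as-is.
bad=$(grep -cEv '^[[:space:]]*(#|mv |ssh |gpddboost )' "$mapfile")
echo "unexpected lines: $bad"
```

In practice you would point the grep at /tmp/segment_mapping.sh and only run it when the count is zero.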
Pivotal Greenplum Database (GPDB)