How to perform gpbackup and gprestore with "--resize-cluster" option when backup was taken on local storage

Article ID: 296950


Updated On:

Products

VMware Tanzu Greenplum

Issue/Introduction

Starting with gpbackup/gprestore 1.30, the --resize-cluster option is available for production use (it was first introduced as a beta feature in gpbackup/gprestore 1.26). It allows you to restore data to a cluster that has a different number of segments than the cluster from which the data was backed up.

Consider backup and restore in the following way:
 

Backup to storage plugin

This is the best practice. In general, the --resize-cluster option works best when a --plugin-config is used (S3 or DDBoost). In that case, gpbackup organizes the backup files on the specified intermediate storage in the following structure:

<backup-dir>/backups/<datestamp>/<timestamp>/<ALL files here>

This makes it very straightforward for gprestore to work: all files are in a single directory, and each data file carries the segment ID in its name, making it easy for each segment to pick up the required data files.
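
For example, a plugin-based backup would look roughly like the following. This is only an illustration: the plugin configuration path /home/gpadmin/s3_config.yaml is a placeholder, substitute your own S3 or DDBoost plugin configuration file.
$ gpbackup --dbname backuptest --plugin-config /home/gpadmin/s3_config.yaml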
 

Back up to local storage or NFS

If there is no storage plugin, you will have to use --backup-dir. The better alternative is to back up to a shared mount that is accessible by all of the segments, such as an NFS mount (see the example command below). If that is not possible and you have to back up to local storage, follow the instructions below.
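
For example, assuming /nfs/backuptest is a shared NFS mount visible from the coordinator and every segment host (the path is only a placeholder), the backup command would look roughly like:
$ gpbackup --dbname backuptest --backup-dir /nfs/backuptest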

This article discusses how to perform the backup/restore with local storage. Note that this may only work when the source cluster segment count is higher than that of the target cluster.

Environment

Product Version: 6.25

Resolution

Here is an example of taking a backup on a 6-segment cluster and restoring it to a 4-segment cluster.

1) Run gpbackup. We recommend backing up with the options --single-data-file and --single-backup-dir, so that gpbackup places all files for a given backup set on a given host in the same directory, rather than in distinct, per-segment subdirectories under the configured backup directory.
For example, files that were previously created in the <backup-dir>/gpseg0/backups/<datestamp>/<timestamp> or <backup-dir>/gpseg1/backups/<datestamp>/<timestamp> directories will now all be created under <backup-dir>/backups/<datestamp>/<timestamp>.

An example backup command:
$ /usr/local/greenplum-db-6.26.2/bin/gpbackup --dbname backuptest --single-data-file --backup-dir /data/backuptest --single-backup-dir

2) Once the backup succeeds, you will have the following backup files available under each source host:
$ gpssh -f hostfile
=> ls -ltrh /data/backuptest/backups/20240402/20240402111439/
[ mdw] total 28K
[ mdw] -r--r--r-- 1 gpadmin gpadmin 4.5K Apr  2 11:14 gpbackup_20240402111439_metadata.sql
[ mdw] -r--r--r-- 1 gpadmin gpadmin 8.7K Apr  2 11:14 gpbackup_20240402111439_toc.yaml
[ mdw] -r--r--r-- 1 gpadmin gpadmin  741 Apr  2 11:14 gpbackup_20240402111439_config.yaml
[ mdw] -r--r--r-- 1 gpadmin gpadmin 1.9K Apr  2 11:14 gpbackup_20240402111439_report
[sdw2] total 168K
[sdw2] -r--r--r-- 1 gpadmin gpadmin  61 Apr  2 11:14 gpbackup_5_20240402111439_toc.yaml
[sdw2] -rw-r--r-- 1 gpadmin gpadmin 50K Apr  2 11:14 gpbackup_5_20240402111439.gz
[sdw2] -r--r--r-- 1 gpadmin gpadmin  61 Apr  2 11:14 gpbackup_2_20240402111439_toc.yaml
[sdw2] -rw-r--r-- 1 gpadmin gpadmin 50K Apr  2 11:14 gpbackup_2_20240402111439.gz
[sdw2] -r--r--r-- 1 gpadmin gpadmin  61 Apr  2 11:14 gpbackup_3_20240402111439_toc.yaml
[sdw2] -rw-r--r-- 1 gpadmin gpadmin 49K Apr  2 11:14 gpbackup_3_20240402111439.gz
[sdw1] total 168K
[sdw1] -r--r--r-- 1 gpadmin gpadmin  61 Apr  2 11:14 gpbackup_4_20240402111439_toc.yaml
[sdw1] -rw-r--r-- 1 gpadmin gpadmin 50K Apr  2 11:14 gpbackup_4_20240402111439.gz
[sdw1] -r--r--r-- 1 gpadmin gpadmin  61 Apr  2 11:14 gpbackup_1_20240402111439_toc.yaml
[sdw1] -rw-r--r-- 1 gpadmin gpadmin 49K Apr  2 11:14 gpbackup_1_20240402111439.gz
[sdw1] -r--r--r-- 1 gpadmin gpadmin  61 Apr  2 11:14 gpbackup_0_20240402111439_toc.yaml
[sdw1] -rw-r--r-- 1 gpadmin gpadmin 50K Apr  2 11:14 gpbackup_0_20240402111439.gz
=> quit

3) Then move those files to the target hosts. Note that you will need to create the same directory, <backup-dir>/backups/<datestamp>/<timestamp>, accessible by the gpadmin user, on each target segment host. gprestore assigns the backup files to the new segments in a round-robin way, so the placement re-mapping works as follows:
//The source cluster segment configuration
$ psql -c "select content,hostname from gp_segment_configuration where role='p' and content<>'-1'"
 content | hostname
---------+----------
       3 | sdw2
       2 | sdw2
       1 | sdw1
       0 | sdw1
       5 | sdw2
       4 | sdw1
(6 rows)

//The target cluster segment configuration
$ psql -c "select content,hostname from gp_segment_configuration where role='p' and content<>'-1'"
 content | hostname
---------+----------------------
       1 | sdw1
       0 | sdw1
       2 | sdw2
       3 | sdw2
(4 rows)
The re-mapping (done in a round-robin way) is as follows:
 New cluster host        New segment   Old segment backup files
 New seg host1 (sdw1)    seg0          seg0, seg4
 New seg host1 (sdw1)    seg1          seg1, seg5
 New seg host2 (sdw2)    seg2          seg2
 New seg host2 (sdw2)    seg3          seg3
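
Based on that mapping, the file movement for this example can be sketched as follows. This is a minimal sketch only: it assumes the target cluster uses the same sdw1/sdw2 host names, that passwordless SSH is configured for gpadmin, and that hostfile lists the target segment hosts. In this particular example only segment 5's files actually need to move (from sdw2 to sdw1); if your target hosts are different machines, copy each old segment's files to the new host that owns it according to the mapping above.
# Create the same backup directory on every target segment host
$ gpssh -f hostfile "mkdir -p /data/backuptest/backups/20240402/20240402111439"
# Move segment 5's data file and TOC from sdw2 to sdw1 so each host holds exactly the files shown in the mapping
$ ssh sdw2 "scp /data/backuptest/backups/20240402/20240402111439/gpbackup_5_20240402111439* sdw1:/data/backuptest/backups/20240402/20240402111439/ && rm /data/backuptest/backups/20240402/20240402111439/gpbackup_5_20240402111439*"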
 
4) So after the movement, we should have the backup files placed as follows on the target hosts:
$ gpssh -f hostfile
=> ls -ltrh /data/backuptest/backups/20240402/20240402111439/
[sdw2] total 112K
[sdw2] -r--r--r-- 1 gpadmin gpadmin  61 Apr  2 11:14 gpbackup_2_20240402111439_toc.yaml
[sdw2] -rw-r--r-- 1 gpadmin gpadmin 50K Apr  2 11:14 gpbackup_2_20240402111439.gz
[sdw2] -r--r--r-- 1 gpadmin gpadmin  61 Apr  2 11:14 gpbackup_3_20240402111439_toc.yaml
[sdw2] -rw-r--r-- 1 gpadmin gpadmin 49K Apr  2 11:14 gpbackup_3_20240402111439.gz
[sdw1] total 224K
[sdw1] -r--r--r-- 1 gpadmin gpadmin  61 Apr  2 11:14 gpbackup_4_20240402111439_toc.yaml
[sdw1] -rw-r--r-- 1 gpadmin gpadmin 50K Apr  2 11:14 gpbackup_4_20240402111439.gz
[sdw1] -r--r--r-- 1 gpadmin gpadmin  61 Apr  2 11:14 gpbackup_1_20240402111439_toc.yaml
[sdw1] -rw-r--r-- 1 gpadmin gpadmin 49K Apr  2 11:14 gpbackup_1_20240402111439.gz
[sdw1] -r--r--r-- 1 gpadmin gpadmin  61 Apr  2 11:14 gpbackup_0_20240402111439_toc.yaml
[sdw1] -rw-r--r-- 1 gpadmin gpadmin 50K Apr  2 11:14 gpbackup_0_20240402111439.gz
[sdw1] -rw-r--r-- 1 gpadmin gpadmin 50K Apr  2 11:29 gpbackup_5_20240402111439.gz
[sdw1] -r--r--r-- 1 gpadmin gpadmin  61 Apr  2 11:29 gpbackup_5_20240402111439_toc.yaml
[ mdw] total 28K
[ mdw] -r--r--r-- 1 gpadmin gpadmin 4.5K Apr  2 11:14 gpbackup_20240402111439_metadata.sql
[ mdw] -r--r--r-- 1 gpadmin gpadmin 8.7K Apr  2 11:14 gpbackup_20240402111439_toc.yaml
[ mdw] -r--r--r-- 1 gpadmin gpadmin  741 Apr  2 11:14 gpbackup_20240402111439_config.yaml
[ mdw] -r--r--r-- 1 gpadmin gpadmin 1.9K Apr  2 11:14 gpbackup_20240402111439_report
=> exit

Note that if the files are not correctly mapped to the corresponding directories, you are likely to encounter the following errors:
Error 1:
[CRITICAL]:-Backup directories missing or inaccessible on 1 segment. See /home/gpadmin/gpAdminLogs/gprestore_20240312.log for a complete list of errors.

Error 2:
[DEBUG]:-Expected to find 2 file(s) on segment 3 on host XXXGP3, but found 1 instead.
[DEBUG]:-Expected to find 2 file(s) on segment 4 on host XXXGP3, but found 1 instead.
[DEBUG]:-Expected to find 2 file(s) on segment 5 on host XXXGP3, but found 1 instead.
[CRITICAL]:-Found incorrect number of backup files on 6 segments. See /home/gpadmin/gpAdminLogs/gprestore_20240314.log for a complete list of errors.
github.com/greenplum-db/gp-common-go-libs/cluster.LogFatalClusterError
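
Before running gprestore, a quick sanity check along these lines can help confirm the placement (paths and hostfile are from the example above; per the mapping, sdw1 should hold four .gz data files and sdw2 two):
$ gpssh -f hostfile "ls /data/backuptest/backups/20240402/20240402111439/gpbackup_*_20240402111439.gz | wc -l"
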
5) Now we can run the restore on the target cluster with --resize-cluster:
$ dropdb backuptest
$ createdb backuptest
$ gprestore --resize-cluster --backup-dir /data/backuptest --timestamp 20240402111439 
20240402:11:33:10 gprestore:gpadmin:support-gpddb6-mdw:026863-[INFO]:-Restore Key = 20240402111439
20240402:11:33:10 gprestore:gpadmin:support-gpddb6-mdw:026863-[INFO]:-Resize restore specified, will restore a backup set from a 6-segment cluster to a 4-segment cluster
20240402:11:33:10 gprestore:gpadmin:support-gpddb6-mdw:026863-[INFO]:-gpbackup version = 1.30.2
20240402:11:33:10 gprestore:gpadmin:support-gpddb6-mdw:026863-[INFO]:-gprestore version = 1.30.2
20240402:11:33:10 gprestore:gpadmin:support-gpddb6-mdw:026863-[INFO]:-Greenplum Database Version = 6.26.2 build commit:609ff2bb9ccb7d393d772b29770e757dbd2ecc79
20240402:11:33:10 gprestore:gpadmin:support-gpddb6-mdw:026863-[INFO]:-Restoring pre-data metadata
Pre-data objects restored:  6 / 6 [=====================================================] 100.00% 0s
20240402:11:33:10 gprestore:gpadmin:support-gpddb6-mdw:026863-[INFO]:-Pre-data metadata restore complete
Tables restored:  1 / 1 [==================================================================] 100.00%
20240402:11:33:11 gprestore:gpadmin:support-gpddb6-mdw:026863-[INFO]:-Data restore complete
20240402:11:33:11 gprestore:gpadmin:support-gpddb6-mdw:026863-[INFO]:-Restoring post-data metadata
20240402:11:33:11 gprestore:gpadmin:support-gpddb6-mdw:026863-[INFO]:-Post-data metadata restore complete
20240402:11:33:11 gprestore:gpadmin:support-gpddb6-mdw:026863-[INFO]:-Found neither /opt/greenplum_6.26.2/bin/gp_email_contacts.yaml nor /home/gpadmin/gp_email_contacts.yaml
20240402:11:33:11 gprestore:gpadmin:support-gpddb6-mdw:026863-[INFO]:-Email containing gprestore report /data/backuptest/backups/20240402/20240402111439/gprestore_20240402111439_20240402113310_report will not be sent
20240402:11:33:11 gprestore:gpadmin:support-gpddb6-mdw:026863-[INFO]:-Restore completed successfully
$

Tip: This method also works if you run gprestore with --metadata-only first and --data-only later.
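
For example, using the same backup set as above (a minimal sketch of the two-phase restore):
$ gprestore --resize-cluster --backup-dir /data/backuptest --timestamp 20240402111439 --metadata-only
$ gprestore --resize-cluster --backup-dir /data/backuptest --timestamp 20240402111439 --data-only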