How to restore the tables when the backup was taken with less segments
search cancel

How to restore the tables when the backup was taken with less segments

book

Article ID: 296570

calendar_today

Updated On:

Products

VMware Tanzu Greenplum

Issue/Introduction

From the GPDB official documentation, it is not possible to restore the DB when the segment number does not match:

A backup created with gpbackup can only be restored to a Greenplum Database cluster with the same number of segment instances as the source cluster. If you run gpexpand to add segments to the cluster, backups you made before starting the expand cannot be restored after the expansion has completed.


Environment

Product Version: 5.28

Resolution

NOTE: This process works only if the backup was taken WITHOUT the option "--single-data-file".

For every backup, gprestore has a file called gpbackup_<timestamp>_toc.yaml under the master backup directory:
//master backup directory
[gpadmin@gpdb-sandbox 20220522222949]$ ls -al
total 68
drwxrwxr-x 2 gpadmin gpadmin   171 May 22 22:29 .
drwxrwxr-x 7 gpadmin gpadmin   116 May 22 22:29 ..
-r--r--r-- 1 gpadmin gpadmin   598 May 22 22:29 gpbackup_20220522222949_config.yaml
-r--r--r-- 1 gpadmin gpadmin 54476 May 22 22:29 gpbackup_20220522222949_metadata.sql
-r--r--r-- 1 gpadmin gpadmin  1447 May 22 22:29 gpbackup_20220522222949_report
-r--r--r-- 1 gpadmin gpadmin  3066 May 22 22:29 gpbackup_20220522222949_toc.yaml <-----
Let's use the `sales` table as an example. It is backed up when the segment count is 2. In this yaml, we can get the oid of backup (NOTE: it might not be same OID as in pg_class)
- schema: public
  name: sales
  oid: 27888 <----
  attributestring: (id,date,amt)
  rowscopied: 70000
  partitionroot: ""
For the each segment backup directory, it has file as below:
//seg0
gpbackup_0_20220522222949_27888.gz

//seg1
gpbackup_1_20220522222949_27888.gz
Based on above, the naming convention of the gpbackup file is:
gpbackup_<SEG_ID>_<Time_Stamp>_<Table OID>.gz
To restore a specific table, we need to copy every data file to a gpfdist location on the master host:
$ cp gpbackup_0_20220522222949_27888.gz /tmp/seg0/gpbackup_0_20220522222949_27888.gz
$ cp gpbackup_1_20220522222949_27888.gz /tmp/seg1/gpbackup_1_20220522222949_27888.gz 
$ gpfidst -d /tmp/ -p 8801 & 

//via psql 
$ CREATE EXTERNAL TABLE ext_sales (LIKE sales) 
   LOCATION ('gpfdist://master:8801/seg*/*.gz')
   FORMAT 'csv' (DELIMITER AS ',' NULL AS 'null');
   -- Note replace "master" with the hostname of the master host.
The external table is used to read the dump files and load the data to any table. It does not matter how many files/source segments there are.

For example, if there is some bad data on sales table in the 4 segments in the cluster, it is possible to use the below method to copy the data into a new table, then drop the old table and rename the new table to the original table name:
$ create table sales2 (LIKE sales); 
$ insert into sales2 as select * from ext_sales;
$ drop table sales; 
$ ALTER TABLE sales2 rename to sales;

NOTE: If you are restoring a partition table, you need to copy multiple files for the same segment. However, the idea is the same we need to copy the data file of the table to the gpfdist location.