When a table's distribution key type is 'real', gprestore returns an error when restoring data in Greenplum

Article ID: 296963


Products

VMware Tanzu Greenplum

Issue/Introduction

In the example below, the test_real table is distributed by a column of type real:

gpadmin=# create table test_real as SELECT random()::real as real_id from generate_series(1,100);
gpadmin=# \d test_real
  Table "public.test_real"
 Column  | Type | Modifiers
---------+------+-----------
 real_id | real |
Distributed by: (real_id)

 

Back up the table using gpbackup:
$ gpbackup --dbname gpadmin --include-table public.test_real --backup-dir /tmp/gp4_backup

On a Greenplum 5.x cluster with the same segment host configuration as the Greenplum 4.x cluster, attempt to restore the table:
$ gprestore --backup-dir /tmp/gp4_backup --timestamp 20190517101006 --include-table public.test_real

The following error message is returned, indicating that a row's distribution key value does not belong to the segment it was loaded into:
20190517:10:19:27 gprestore:gpadmin:greenplum-aio-02:027790-[INFO]:-Restore Key = 20190517101006
20190517:10:19:27 gprestore:gpadmin:greenplum-aio-02:027790-[INFO]:-Restoring pre-data metadata
Pre-data objects restored:  2 / 2 [=====================================================] 100.00% 0s
20190517:10:19:27 gprestore:gpadmin:greenplum-aio-02:027790-[INFO]:-Pre-data metadata restore complete
Tables restored:  0 / 1 [------------------------------------------------------------------]   0.00%
20190517:10:19:27 gprestore:gpadmin:greenplum-aio-02:027790-[CRITICAL]:-Error loading data into table public.test_real: ERROR: value of distribution key doesn't belong to segment with ID 0, it belongs to segment with ID 1  (seg0 10.34.6.199:20000 pid=27828) (SQLSTATE 22P04)


The hash distribution strategy for the real type changed between Greenplum 4.x and Greenplum 5.x, and this change causes the error.
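One way to observe the change is to load the same fixed set of real values on both clusters and compare which segment each value lands on. The check below is only an illustrative sketch (the check_real_hash table is made up for this purpose); the per-value segment assignments will differ between the two versions:

gpadmin=# CREATE TABLE check_real_hash (real_id real) DISTRIBUTED BY (real_id);
gpadmin=# INSERT INTO check_real_hash SELECT (i/10.0)::real FROM generate_series(1,20) i;
gpadmin=# -- run on both GP4 and GP5 and compare the segment assignments
gpadmin=# SELECT gp_segment_id, real_id FROM check_real_hash ORDER BY real_id;
gpadmin=# DROP TABLE check_real_hash;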

When using gprestore, the utility attempts to load each data file into the segment whose content ID is encoded in the backup file name:

### Content 0 ###

[gpadmin@greenplum-aio-02][~]$ ls -l /tmp/gp4_backup/gpdb_4.3.31.0_0/backups/20190517/20190517101006/
total 4
-rw------- 1 gpadmin gpadmin 248 May 17 10:10 gpbackup_0_20190517101006_34939.gz

### Content 1 ###

[gpadmin@greenplum-aio-02][~]$ ls -l /tmp/gp4_backup/gpdb_4.3.31.0_1/backups/20190517/20190517101006/
total 4
-rw------- 1 gpadmin gpadmin 211 May 17 10:10 gpbackup_1_20190517101006_34939.gz
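
The content ID is the first numeric field after the gpbackup_ prefix in each file name. As an illustration of the naming convention, the loop below (paths taken from the listings above) prints the content ID of each file:

$ for f in /tmp/gp4_backup/gpdb_4.3.31.0_*/backups/20190517/20190517101006/gpbackup_*; do basename "$f"; done | awk -F_ '{print "content ID:", $2}'
content ID: 0
content ID: 1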

 
Due to the distribution strategy change, rows that belonged to content 0 in Greenplum 4.x may hash to a different segment in Greenplum 5.x. Comparing the per-segment row counts on the two clusters shows the different placement:
### GP4 ###
gpadmin=# SELECT gp_segment_id,count(*) from gp_dist_random('test_real') group by 1;
 gp_segment_id | count
---------------+-------
             1 |    54
             0 |    46
(2 rows)

### GP5 ###
gpadmin=# SELECT gp_segment_id,count(*) from gp_dist_random('test_real') group by 1;
 gp_segment_id | count
---------------+-------
             0 |    51
             1 |    49
(2 rows)
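
Whatever the per-segment spread, the total row count should be identical on both clusters after a successful load; a simple sanity check:

gpadmin=# SELECT count(*) from test_real;
 count
-------
   100
(1 row)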


This article details two workarounds for the error message returned by gprestore while restoring data from Greenplum Database (GPDB) 4.3.x to GPDB 5.x.

gprestore will return the error message listed below if:

  • A table exists with the distribution key type 'real'
  • The table was backed up using gpbackup in GPDB 4.x
  • The table was restored to GPDB 5.x from GPDB 4.x with identical segment host settings
Error loading data into table [table_name]: ERROR: value of distribution key doesn't belong to segment with ID 0, it belongs to segment with ID 1

The R&D team has confirmed that this error is caused by limitations in the backup and restore functionality between GPDB 4.x and GPDB 5.x.

Two workarounds exist to resolve this issue:
  1. Temporarily change the table's distribution policy to random
  2. Use gpcopy

Refer to the resolution section for more details.


Environment

Product Version: 4.3.9

Resolution

Workarounds

Two workarounds for this issue are described below.

1. Change the distribution policy of the original table to random, run gpbackup and gprestore, then revert the distribution key in Greenplum 5.x.

- On GP4, change the distribution policy to random
  gpadmin=# ALTER TABLE test_real SET DISTRIBUTED RANDOMLY;
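
  The table should now report the random policy before you back it up:

    gpadmin=# \d test_real
      Table "public.test_real"
     Column  | Type | Modifiers
    ---------+------+-----------
     real_id | real |
    Distributed randomly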

- Run the backup
  $ gpbackup --dbname gpadmin --include-table public.test_real --backup-dir /tmp/gp4_backup

- Restore the data into GP5
  $ gprestore --backup-dir /tmp/gp4_backup --timestamp 20190517102336 --include-table public.test_real

- On GP5, change the distribution key back to the original one. The restored table is distributed randomly:

    gpadmin=# \d test_real
      Table "public.test_real"
     Column  | Type | Modifiers
    ---------+------+-----------
     real_id | real |
    Distributed randomly

  Optionally verify the restored rows, then revert the distribution key:

    gpadmin=# SELECT gp_segment_id, count(*) from gp_dist_random('test_real') group by 1;
    gpadmin=# ALTER TABLE test_real SET DISTRIBUTED BY (real_id);
    gpadmin=# \d test_real
      Table "public.test_real"
     Column  | Type | Modifiers
    ---------+------+-----------
     real_id | real |
    Distributed by: (real_id)
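
  Reverting the distribution key redistributes the rows according to the GPDB 5.x hash, so the per-segment counts should match the native GP5 placement shown earlier (51/49 in this example):

    gpadmin=# SELECT gp_segment_id, count(*) from gp_dist_random('test_real') group by 1;
     gp_segment_id | count
    ---------------+-------
                 0 |    51
                 1 |    49
    (2 rows)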

 

2. Use gpcopy:
$ gpcopy --source-host smdw --source-port 3733 --source-user gpadmin --dest-host smdw --dest-port 4987 --dest-user gpadmin --include-table gpadmin.public.test_real --truncate
20190506:23:14:05 gpcopy:gpadmin:smdw:027249-[INFO]:-Set to no compression mode for local data transfer optimization
20190506:23:14:05 gpcopy:gpadmin:smdw:027249-[INFO]:-Starting copy...
20190506:23:14:05 gpcopy:gpadmin:smdw:027249-[INFO]:-Copy Timestamp = 20190506231405
20190506:23:14:05 gpcopy:gpadmin:smdw:027249-[INFO]:-pg_dump (PostgreSQL) 8.2.15
20190506:23:14:05 gpcopy:gpadmin:smdw:027249-[INFO]:-pg_dumpall (PostgreSQL) 8.2.15
20190506:23:14:07 gpcopy:gpadmin:smdw:027249-[INFO]:-Copying database "gpadmin"
20190506:23:14:08 gpcopy:gpadmin:smdw:027249-[INFO]:-[Worker 0] Transferring table "gpadmin"."public"."test_real"
20190506:23:14:08 gpcopy:gpadmin:smdw:027249-[INFO]:-[Worker 0] Committing changes of table "gpadmin"."public"."test_real"
20190506:23:14:08 gpcopy:gpadmin:smdw:027249-[INFO]:-[Worker 0] Finished copying table "gpadmin"."public"."test_real"
20190506:23:14:08 gpcopy:gpadmin:smdw:027249-[INFO]:-------------------------------------------------
20190506:23:14:08 gpcopy:gpadmin:smdw:027249-[INFO]:-Total elapsed time: 3.445646855s
20190506:23:14:08 gpcopy:gpadmin:smdw:027249-[INFO]:-Copied 1 databases
20190506:23:14:08 gpcopy:gpadmin:smdw:027249-[INFO]:-   Database gpadmin: successfully copied 1 tables, skipped 0 tables, failed 0 tables
20190506:23:14:08 gpcopy:gpadmin:smdw:027249-[INFO]:-Copy completed successfully
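
gpcopy copies the table definition along with the data, so the destination table should keep its original real distribution key; you can confirm this with \d (the output below follows the example table above):

gpadmin=# \d test_real
  Table "public.test_real"
 Column  | Type | Modifiers
---------+------+-----------
 real_id | real |
Distributed by: (real_id)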