In the log below, the test_real
table has a real type distribution:
gpadmin=# create table test_real as SELECT random()::real as real_id from generate_series(1,100); gpadmin=# \d test_real Table "public.test_real" Column | Type | Modifiers ---------+------+----------- real_id | real | Distributed by: (real_id)
Backup the table using
gpbackup
:
# gpbackup --dbname gpadmin --include-table public.test_real --backup-dir /tmp/gp4_backup
$ gprestore --backup-dir /tmp/gp4_backup --timestamp 20190517101006 --include-table public.test_real
20190517:10:19:27 gprestore:gpadmin:greenplum-aio-02:027790-[INFO]:-Restore Key = 20190517101006 20190517:10:19:27 gprestore:gpadmin:greenplum-aio-02:027790-[INFO]:-Restoring pre-data metadata Pre-data objects restored: 2 / 2 [=====================================================] 100.00% 0s 20190517:10:19:27 gprestore:gpadmin:greenplum-aio-02:027790-[INFO]:-Pre-data metadata restore complete Tables restored: 0 / 1 [------------------------------------------------------------------] 0.00% 20190517:10:19:27 gprestore:gpadmin:greenplum-aio-02:027790-[CRITICAL]:-Error loading data into table public.test_real: ERROR: value of distribution key doesn't belong to segment with ID 0, it belongs to segment with ID 1 (seg0 10.34.6.199:20000 pid=27828) (SQLSTATE 22P04)
The real type distribution strategy in Greenplum 4.x is different from the distribution strategy in Greenplum 5.x, causing the error.
When using gprestore
, the utility will attempt to place data into the original segment ID based on the backup file name:
### Content 0 ### [gpadmin@greenplum-aio-02][~]$ ls -l /tmp/gp4_backup/gpdb_4.3.31.0_0/backups/20190517/20190517101006/ total 4 -rw------- 1 gpadmin gpadmin 248 May 17 10:10 gpbackup_0_20190517101006_34939.gz ### Content 1 ### [gpadmin@greenplum-aio-02][~]$ ls -l /tmp/gp4_backup/gpdb_4.3.31.0_1/backups/20190517/20190517101006/ total 4 -rw------- 1 gpadmin gpadmin 211 May 17 10:10 gpbackup_1_20190517101006_34939.gz
Due to the distribution strategy change, data belonging to content#0 in Greenplum 4.x should populate a different area in Greenplum 5.x.
### GP4 ### gpadmin=# SELECT gp_segment_id,count(*) from gp_dist_random('test_real') group by 1; gp_segment_id | count ---------------+------- 1 | 54 0 | 46 (2 rows) ### GP5 ### gpadmin=# SELECT gp_segment_id,count(*) from gp_dist_random('test_real') group by 1; gp_segment_id | count ---------------+------- 0 | 51 1 | 49
This article details two workarounds for the error message returned by gprestore
while restoring data from Greenplum Database (GPDB) 4.3x to GPDB 5.x.gprestore
will return the error message listed below if:
real
'gpbackup
in GPDB 4.xError loading data into table [table_name]: ERROR: value of distribution key doesn't belong to segment with ID 0, it belongs to segment with ID 1
gpcopy
Refer to the resolution section for more details.
Two workarounds for this issue are described below.
1. Change the distribution key to random
in the original table and use gpbackup
or gprestore
. Revert the distribution key in Greenplum 5.x.
- On GP4, change the distributing key to random # ALTER TABLE test_real SET distributed randomly; - Run the backup # gpbackup --dbname gpadmin --include-table public.test_real --backup-dir /tmp/gp4_backup - Restore the data into GP5 $ gprestore --backup-dir /tmp/gp4_backup --timestamp 20190517102336 --include-table public.test_real - On GP5, change the distributing key back to original one gpadmin=# \d test_real Table "public.test_real" Column | Type | Modifiers ---------+------+----------- real_id | real | Distributed randomly gpadmin=SELECT gp_segment_id, count(*) from gp_dist_random('test_real') group by 1; gpadmin=# \d test_real Table "public.test_real" Column | Type | Modifiers ---------+------+----------- real_id | real | Distributed by: (real_id)
2. Use gpcopy
:
$ gpcopy --source-host smdw --source-port 3733 --source-user gpadmin --dest-host smdw --dest-port 4987 --dest-user gpadmin --include-table gpadmin.public.test_real --truncate 20190506:23:14:05 gpcopy:gpadmin:smdw:027249-[INFO]:-Set to no compression mode for local data transfer optimization 20190506:23:14:05 gpcopy:gpadmin:smdw:027249-[INFO]:-Starting copy... 20190506:23:14:05 gpcopy:gpadmin:smdw:027249-[INFO]:-Copy Timestamp = 20190506231405 20190506:23:14:05 gpcopy:gpadmin:smdw:027249-[INFO]:-pg_dump (PostgreSQL) 8.2.15 20190506:23:14:05 gpcopy:gpadmin:smdw:027249-[INFO]:-pg_dumpall (PostgreSQL) 8.2.15 20190506:23:14:07 gpcopy:gpadmin:smdw:027249-[INFO]:-Copying database "gpadmin" 20190506:23:14:08 gpcopy:gpadmin:smdw:027249-[INFO]:-[Worker 0] Transferring table "gpadmin"."public"."test_real" 20190506:23:14:08 gpcopy:gpadmin:smdw:027249-[INFO]:-[Worker 0] Committing changes of table "gpadmin"."public"."test_real" 20190506:23:14:08 gpcopy:gpadmin:smdw:027249-[INFO]:-[Worker 0] Finished copying table "gpadmin"."public"."test_real" 20190506:23:14:08 gpcopy:gpadmin:smdw:027249-[INFO]:------------------------------------------------- 20190506:23:14:08 gpcopy:gpadmin:smdw:027249-[INFO]:-Total elapsed time: 3.445646855s 20190506:23:14:08 gpcopy:gpadmin:smdw:027249-[INFO]:-Copied 1 databases 20190506:23:14:08 gpcopy:gpadmin:smdw:027249-[INFO]:- Database gpadmin: successfully copied 1 tables, skipped 0 tables, failed 0 tables 20190506:23:14:08 gpcopy:gpadmin:smdw:027249-[INFO]:-Copy completed successfully