This article provides a method to move a table from one database or cluster to another without a full backup and restore. It is very useful when developers want a replica of the original tables for development and testing on a different database.
Note: Please verify the script on a test cluster before running it on production.
Use the following command to replicate the table quickly to another database:
pg_dump -t <schema-name.source-table-name> -h <source-host-name> <source-database-name> | psql -h <destination-host-name> -U <destination-user> -p <destination-port> <destination-database>
Note: There is another tool, "gpbackup", that can be used for data transfer (gpbackup is included in v4.3.26.0 and later releases).
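As a rough sketch only (exact options vary by release; check gpbackup --help and gprestore --help), a single-table backup and restore with those utilities might look like:
gpbackup --dbname <source-database-name> --include-table <schema-name.source-table-name>
gprestore --timestamp <backup-timestamp> --redirect-db <destination-database>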
Example:
1. Create the source database and connect to it:
gpadmin=# create database d1;
CREATE DATABASE
Time: 8855.715 ms
gpadmin=# \c d1
You are now connected to database "d1" as user "gpadmin".
2. Create a table to replicate and load data into it:
d1=# create table t1 as select * from pg_attribute;
NOTICE:  Table doesn't have 'DISTRIBUTED BY' clause -- Using column(s) named 'attrelid' as the Greenplum Database data distribution key for this table.
HINT:  The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
SELECT 2998
Time: 1705.794 ms
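Note: The NOTICE above can be avoided by naming the distribution key explicitly in the CREATE TABLE AS statement, for example:
d1=# create table t1 as select * from pg_attribute distributed by (attrelid);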
3. Create the destination database:
d1=# create database d2;
CREATE DATABASE
Time: 10753.638 ms
d1=# \q
4. Run the command to back up and load the data in one step:
gpadmin:Fullrack@mdw $ pg_dump -C -t t1 d1 | psql d2
Timing is on.
Pager usage is off.
SET
Time: 184.239 ms
SET
Time: 0.740 ms
SET
Time: 0.677 ms
SET
Time: 0.652 ms
SET
Time: 0.598 ms
SET
Time: 0.635 ms
SET
Time: 1.565 ms
SET
Time: 0.640 ms
SET
Time: 0.601 ms
CREATE TABLE
Time: 66.210 ms
ALTER TABLE
Time: 18.538 ms
Time: 1247.149 ms
Note: In this example the data is loaded into a different database on the same cluster, so many of the connection parameters shown in the command template above are omitted because they are not needed here. To replicate to a different cluster, make sure pg_hba.conf on the source cluster is updated with the correct entries; this allows psql to connect to the source host and database without any hassle. A sketch follows.
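For illustration, a cross-cluster run of the command template might look like this (the user and port are the defaults from the example above; the host names are placeholders):
pg_dump -t public.t1 -h <source-host-name> d1 | psql -h <destination-host-name> -U gpadmin -p 5432 d2
A matching pg_hba.conf entry on the source cluster follows the standard host/database/user/address/method format, for example:
host  d1  gpadmin  <client-host-ip>/32  md5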
5. Cross-verification shows that everything worked and the table is loaded in the destination database:
gpadmin:Fullrack@mdw $ psql d2
Timing is on.
Pager usage is off.
psql (8.2.15)
Type "help" for help.
d2=# \d
          List of relations
 Schema | Name | Type  |  Owner  | Storage
--------+------+-------+---------+---------
 public | t1   | table | gpadmin | heap
(1 row)

d2=# select count(*) from t1;
 count
-------
  2998
(1 row)

Time: 82.246 ms
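To compare the source and destination row counts from the shell in one step, psql's standard -t (tuples only), -A (unaligned), and -c (command) options can be combined:
psql -tAc "select count(*) from t1" d1
psql -tAc "select count(*) from t1" d2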
Tip:
If the source table has many GRANT permissions, pg_dump will replay them on the destination. To eliminate the grants (the permission part), use the command below:
pg_dump -t <schema-name.source-table-name> -h <source-host-name> <source-database-name> | egrep -v "GRANT|REVOKE" | psql -h <destination-host-name> -U <destination-user> -p <destination-port> <destination-database>
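Be aware that egrep -v filters the entire dump stream, so any data row that happens to contain the string GRANT or REVOKE would also be dropped. A safer alternative is pg_dump's -x (--no-privileges) option, which omits the GRANT/REVOKE statements at the source:
pg_dump -x -t <schema-name.source-table-name> -h <source-host-name> <source-database-name> | psql -h <destination-host-name> -U <destination-user> -p <destination-port> <destination-database>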
To load just the data, use pg_dump with the -a option:
pg_dump -a -t <schema-name.source-table-name> -h <source-host-name> <source-database-name> | psql -h <destination-host-name> -U <destination-user> -p <destination-port> <destination-database>
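Conversely, to copy only the table definition (no rows), use the -s (schema-only) option; this is handy for pre-creating the DDL before the psql COPY approach described below:
pg_dump -s -t <schema-name.source-table-name> -h <source-host-name> <source-database-name> | psql -h <destination-host-name> -U <destination-user> -p <destination-port> <destination-database>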
For more options, refer to the pg_dump help:
pg_dump --help
If the data set is very large, consider the gptransfer utility; a sketch of its usage follows.
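As a sketch only (option names vary by version; check gptransfer --help), a single-table transfer between clusters might look like:
gptransfer -t <source-database-name>.<schema-name>.<source-table-name> --dest-host <destination-host-name> --dest-database <destination-database>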
It is also possible to use psql to do the same task. Make sure the DDL of the table has already been applied on the destination:
psql -c "copy (select * from <source-table>) to stdout" <source-database> | psql -c "copy <destination-table> from stdin" <destination-database>
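Putting the pieces together with the example databases d1 and d2 from above, a minimal sketch is to pre-create the DDL with pg_dump -s and then stream the rows with COPY:
pg_dump -s -t public.t1 d1 | psql d2
psql -c "copy (select * from t1) to stdout" d1 | psql -c "copy t1 from stdin" d2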