Copying Data Tables from One Server to Another When the Servers Are Not Connected (Using gpfdist)



Article ID: 295192


Updated On:

Products

VMware Tanzu Greenplum

Issue/Introduction

This article describes how to copy table data from one server to another when the two servers are not connected to each other.

 


Resolution

On Source Server

1. Create a directory under the /data filesystem, or under whichever filesystem has available space:

cd /data
mkdir transfer_tables

2. Start a "gpfdist" process for the directory created:

gpfdist -d /data/transfer_tables -p 8080 -l /tmp/gpfdist.log & 

-- Check whether the gpfdist process has started using the "ps -ef|grep gpfdist" command. The output should look something like this:

[gpadmin@mdw ~]$ ps -ef|grep dist
gpadmin   6576     1  0 May03 ?        00:00:00 gpfdist -d /data/transfer_tables -p 8080 -l gpfdist.log
gpadmin  15439  9321  0 17:00 pts/1    00:00:00 grep dist

3. Connect to the database and create a "writable" external table for each table that needs to be moved. In this example, the table to transfer is "move_me":

gpadmin=# \d move_me
     Table "public.move_me"
 Column |       Type        | Modifiers
--------+-------------------+-----------
 id     | integer           |
 name   | character varying |
Distributed by: (id)

-- Create a writable external table ext_of_move_me based on the move_me table using the command below:

gpadmin=# CREATE WRITABLE EXTERNAL TABLE ext_of_move_me(like move_me) location ('gpfdist://mdw:8080/move_me.csv') format 'text';
NOTICE:  Table doesn't have 'distributed by' clause, defaulting to distribution columns from LIKE table
CREATE EXTERNAL TABLE

-- Transfer the data into a flat file by inserting into the external table created:

gpadmin=# insert into ext_of_move_me select * from move_me;
INSERT 0 1000

-- You can see the file created in the directory that was passed to gpfdist when it was started:

ls -ltrh /data/transfer_tables 
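
The per-table commands in step 3 can be scripted when many tables need to be unloaded. Below is a minimal sketch: gen_unload_sql is a hypothetical helper (not part of Greenplum), and the gpfdist URL mdw:8080 matches the process started in step 2; adjust the host, port, and table names for your environment.

```shell
# Hypothetical helper: print the CREATE WRITABLE EXTERNAL TABLE and unload
# INSERT statements for each table name given. The gpfdist URL matches the
# gpfdist process started in step 2.
gen_unload_sql() {
  for tbl in "$@"; do
    printf "CREATE WRITABLE EXTERNAL TABLE ext_of_%s (LIKE %s)\n" "$tbl" "$tbl"
    printf "  LOCATION ('gpfdist://mdw:8080/%s.csv') FORMAT 'TEXT';\n" "$tbl"
    printf "INSERT INTO ext_of_%s SELECT * FROM %s;\n" "$tbl" "$tbl"
  done
}

# Print the SQL for one table; to execute it, pipe the output into psql:
#   gen_unload_sql move_me | psql gpadmin
gen_unload_sql move_me
```

This only generates the SQL text; reviewing it before piping it into psql lets you catch naming mistakes before anything runs against the database.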

4. Once you have created flat files for all tables that need to be copied, using the procedure above (step 3), use an external hard disk to move the files to the target machine, since there is no connectivity between the servers.
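
Because the files travel on an external disk, it can be worth recording checksums on the source and verifying them on the target before loading. A minimal sketch using the standard md5sum tool (the TRANSFER_DIR variable is an assumption for illustration; the default path is the directory created in step 1):

```shell
# Record a checksum for every flat file before copying them to the external
# disk. TRANSFER_DIR defaults to the directory created in step 1.
TRANSFER_DIR=${TRANSFER_DIR:-/data/transfer_tables}
if cd "$TRANSFER_DIR" 2>/dev/null; then
  md5sum *.csv > transfer.md5
fi

# On the target, after copying the files (including transfer.md5):
#   cd /data/transfer_tables && md5sum -c transfer.md5
# md5sum -c exits non-zero if any file's checksum does not match.
```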

On Target Server

5. Once all flat files are copied onto the target machine (assuming it has enough space to hold them), start the gpfdist process as in step 2. This example assumes the files were copied to the same location as on the source; the location can be different, but then you will have to change the directory in the gpfdist command:

gpfdist -d /data/transfer_tables -p 8080 -l /tmp/gpfdist.log &

6. Connect to the database and create a "readable" external table to load the data into the target tables. These external tables must have the same structure as the corresponding tables on the source system. If the target table already exists with the correct structure, you will not need to create it below.

-- Create a readable external table as shown below (assuming the table structure does not yet exist on the target), specifying all column names and types explicitly:

checkcat=# CREATE READABLE EXTERNAL TABLE ext_to_load_move_me(id int, name varchar) location ('gpfdist://mdw:8080/move_me.csv') FORMAT 'TEXT';
CREATE EXTERNAL TABLE
checkcat=#
Note: This example uses the "move_me.csv" file. Other tables will have been unloaded to different files, so adjust the file name in the command accordingly.

-- Create the table in which you want to load the data:

checkcat=# CREATE TABLE move_me(id INT, name VARCHAR);
NOTICE:  Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'id' as the Greenplum Database data distribution key for this table.
HINT:  The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
CREATE TABLE

-- Insert the data using the command below:

checkcat=# insert into move_me select * from ext_to_load_move_me ;
INSERT 0 1000
checkcat=# select count(*) from move_me;
count
-------
1000
(1 row)
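
As on the source side, the target-side statements can be generated per table. A minimal sketch: gen_load_sql is a hypothetical helper, and each column list you pass it must match the source table's definition (here, move_me from step 3).

```shell
# Hypothetical helper: print the readable external table and load INSERT for
# one table, given its name and its column list. The gpfdist URL matches the
# process started on the target in step 5.
gen_load_sql() {
  tbl=$1
  cols=$2
  printf "CREATE READABLE EXTERNAL TABLE ext_to_load_%s(%s)\n" "$tbl" "$cols"
  printf "  LOCATION ('gpfdist://mdw:8080/%s.csv') FORMAT 'TEXT';\n" "$tbl"
  printf "INSERT INTO %s SELECT * FROM ext_to_load_%s;\n" "$tbl" "$tbl"
}

# Example; pipe the output into psql to execute:
#   gen_load_sql move_me "id int, name varchar" | psql checkcat
gen_load_sql move_me "id int, name varchar"
```

After each load, a quick `SELECT count(*)` on the source and target tables (as shown above) confirms that all rows arrived.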

Additional Information

 

+ Environment:

Pivotal Greenplum Database (GPDB) 4.x and 5.x