How to use Copy Bulk Insert with postgres JDBC
search cancel

How to use Copy Bulk Insert with postgres JDBC

book

Article ID: 295686

calendar_today

Updated On:

Products

VMware Tanzu Greenplum

Issue/Introduction

You can use copy bulk insert with postgres JDBC in the following environments:
  • Pivotal HAWQ Database (HDB)
  • Pivotal Greenplum Database (GPDB) all versions


Environment


Cause

Normally, when inserting data into Pivotal HDB or Pivotal Greenplum via the JDBC, the driver will break up the data into single insert statements resulting in slower performance. Here is a sample log file showing JDBC executing a single insert statement:
2014-11-19 06:03:25.200902 CST,"gpadmin","gpadmin",p641638,th-1497897120,"127.0.0.1","63057",2014-11-19 06:02:44 CST,166050,con3067,cmd1006,seg-1,,,x166050,sx1,"LOG","00000","Query plan size to dispatch: 6KB",,,,,,"insert into jdbc_load values('lksdfd', 'slkdfjds', 'klsdjfwo', 'sodifjds', 'fdlksjfew')",0,,"cdbdisp.c",4010,

You might not notice the running single insert statements for hundreds of rows but when you reach thousands or millions, this will become painfully slow in a MPP environment. To speed up performance, you can use JDBC class copymanager to execute a SQL copy instead. Instead of seeing thousands of insert statements in the log you will only see one entry:

2014-11-20 13:46:33.970231 PST,"gpadmin","gpadmin",p58809,th824858464,"127.0.0.1","40103",2014-11-20 13:46:32 PST,1015,con9,cmd1,seg-1,,,x1015,sx1,"ERROR","22P04","missing data for column ""b""",,,,,"COPY jdbc_load, line 1: ""lksdfd slkdfjds klsdjfwo sodifjds fdlksjfew""","COPY jdbc_load from STDIN",0,,"copy.c",5187,

The "Copy" statement will yield faster table loads when using JDBC with Pivotal GPDB or HAWQ.

Resolution

Get the code

https://github.com/randomtask1155/JDBCBulkInsert

Build the code
 

git clone https://github.com/randomtask1155/JDBCBulkInsert.git
cd JDBCBulkInsert
mvn package

new jar file will be found here target/JDBCBulkInsert-1.0-jar-with-dependencies.jar

Run a test

  • Create a table in database:
    gpadmin=# create table jdbc_load (a text, b text, c text, d text, e text );
  • Create a TAB delimited file /tmp/test.txt:
    lksdfd    slkdfjds    klsdjfwo    sodifjds    fdlksjfew
    lksdfd    slkdfjds    klsdjfwo    sodifjds    fdlksjfew
    lksdfd    slkdfjds    klsdjfwo    sodifjds    fdlksjfew
    lksdfd    slkdfjds    klsdjfwo    sodifjds    fdlksjfew
    lksdfd    slkdfjds    klsdjfwo    sodifjds    fdlksjfew
    lksdfd    slkdfjds    klsdjfwo    sodifjds    fdlksjfew
    lksdfd    slkdfjds    klsdjfwo    sodifjds    fdlksjfew
    lksdfd    slkdfjds    klsdjfwo    sodifjds    fdlksjfew
    lksdfd    slkdfjds    klsdjfwo    sodifjds    fdlksjfew
    lksdfd    slkdfjds    klsdjfwo    sodifjds    fdlksjfew
    lksdfd    slkdfjds    klsdjfwo    sodifjds    fdlksjfew
    lksdfd    slkdfjds    klsdjfwo    sodifjds    fdlksjfew
    lksdfd    slkdfjds    klsdjfwo    sodifjds    fdlksjfew
    lksdfd    slkdfjds    klsdjfwo    sodifjds    fdlksjfew
    lksdfd    slkdfjds    klsdjfwo    sodifjds    fdlksjfew
  • Load in /tmp/test.txt into table jdbc_load:
    [gpadmin@hdm1 ~]$ /usr/java/default/bin/java -classpath JDBCBulkInsert-1.0-jar-with-dependencies.jar io.pivotal.support.JDBCBulkInsert "jdbc:postgresql://localhost:5432/gpadmin" gpadmin gpadmin /tmp/test.txt jdbc_load;
    ~# Loading Driver org.postgresql.Driver
    ~# connecting to database with url jdbc:postgresql://localhost:5432/gpadmin
    ~# reading in file /tmp/test.txt
    ~# COPY operation completed successfully
  • Verify it worked:
    gpadmin=# select * from jdbc_load ;
       a    |    b     |    c     |    d     |     e
    --------+----------+----------+----------+-----------
     lksdfd | slkdfjds | klsdjfwo | sodifjds | fdlksjfew
     lksdfd | slkdfjds | klsdjfwo | sodifjds | fdlksjfew
     lksdfd | slkdfjds | klsdjfwo | sodifjds | fdlksjfew
     lksdfd | slkdfjds | klsdjfwo | sodifjds | fdlksjfew
     lksdfd | slkdfjds | klsdjfwo | sodifjds | fdlksjfew
     lksdfd | slkdfjds | klsdjfwo | sodifjds | fdlksjfew
     lksdfd | slkdfjds | klsdjfwo | sodifjds | fdlksjfew
     lksdfd | slkdfjds | klsdjfwo | sodifjds | fdlksjfew
     lksdfd | slkdfjds | klsdjfwo | sodifjds | fdlksjfew
     lksdfd | slkdfjds | klsdjfwo | sodifjds | fdlksjfew
     lksdfd | slkdfjds | klsdjfwo | sodifjds | fdlksjfew
     lksdfd | slkdfjds | klsdjfwo | sodifjds | fdlksjfew
     lksdfd | slkdfjds | klsdjfwo | sodifjds | fdlksjfew
     lksdfd | slkdfjds | klsdjfwo | sodifjds | fdlksjfew
     lksdfd | slkdfjds | klsdjfwo | sodifjds | fdlksjfew
    (15 rows)


Additional Information

  •