2014-11-19 06:03:25.200902 CST,"gpadmin","gpadmin",p641638,th-1497897120,"127.0.0.1","63057",2014-11-19 06:02:44 CST,166050,con3067,cmd1006,seg-1,,,x166050,sx1,"LOG","00000","Query plan size to dispatch: 6KB",,,,,,"insert into jdbc_load values('lksdfd', 'slkdfjds', 'klsdjfwo', 'sodifjds', 'fdlksjfew')",0,,"cdbdisp.c",4010,
You might not notice the running single insert statements for hundreds of rows but when you reach thousands or millions, this will become painfully slow in a MPP environment. To speed up performance, you can use JDBC class copymanager
to execute a SQL copy instead. Instead of seeing thousands of insert statements in the log you will only see one entry:
2014-11-20 13:46:33.970231 PST,"gpadmin","gpadmin",p58809,th824858464,"127.0.0.1","40103",2014-11-20 13:46:32 PST,1015,con9,cmd1,seg-1,,,x1015,sx1,"ERROR","22P04","missing data for column ""b""",,,,,"COPY jdbc_load, line 1: ""lksdfd slkdfjds klsdjfwo sodifjds fdlksjfew""","COPY jdbc_load from STDIN",0,,"copy.c",5187,
The "Copy" statement will yield faster table loads when using JDBC with Pivotal GPDB or HAWQ.
Get the code
https://github.com/randomtask1155/JDBCBulkInsert
Build the code
git clone https://github.com/randomtask1155/JDBCBulkInsert.git cd JDBCBulkInsert mvn package
new jar file will be found here target/JDBCBulkInsert-1.0-jar-with-dependencies.jar
Run a test
gpadmin=# create table jdbc_load (a text, b text, c text, d text, e text );
/tmp/test.txt
:
lksdfd slkdfjds klsdjfwo sodifjds fdlksjfew lksdfd slkdfjds klsdjfwo sodifjds fdlksjfew lksdfd slkdfjds klsdjfwo sodifjds fdlksjfew lksdfd slkdfjds klsdjfwo sodifjds fdlksjfew lksdfd slkdfjds klsdjfwo sodifjds fdlksjfew lksdfd slkdfjds klsdjfwo sodifjds fdlksjfew lksdfd slkdfjds klsdjfwo sodifjds fdlksjfew lksdfd slkdfjds klsdjfwo sodifjds fdlksjfew lksdfd slkdfjds klsdjfwo sodifjds fdlksjfew lksdfd slkdfjds klsdjfwo sodifjds fdlksjfew lksdfd slkdfjds klsdjfwo sodifjds fdlksjfew lksdfd slkdfjds klsdjfwo sodifjds fdlksjfew lksdfd slkdfjds klsdjfwo sodifjds fdlksjfew lksdfd slkdfjds klsdjfwo sodifjds fdlksjfew lksdfd slkdfjds klsdjfwo sodifjds fdlksjfew
/tmp/test.txt
into table jdbc_load
:
[gpadmin@hdm1 ~]$ /usr/java/default/bin/java -classpath JDBCBulkInsert-1.0-jar-with-dependencies.jar io.pivotal.support.JDBCBulkInsert "jdbc:postgresql://localhost:5432/gpadmin" gpadmin gpadmin /tmp/test.txt jdbc_load; ~# Loading Driver org.postgresql.Driver ~# connecting to database with url jdbc:postgresql://localhost:5432/gpadmin ~# reading in file /tmp/test.txt ~# COPY operation completed successfully
gpadmin=# select * from jdbc_load ; a | b | c | d | e --------+----------+----------+----------+----------- lksdfd | slkdfjds | klsdjfwo | sodifjds | fdlksjfew lksdfd | slkdfjds | klsdjfwo | sodifjds | fdlksjfew lksdfd | slkdfjds | klsdjfwo | sodifjds | fdlksjfew lksdfd | slkdfjds | klsdjfwo | sodifjds | fdlksjfew lksdfd | slkdfjds | klsdjfwo | sodifjds | fdlksjfew lksdfd | slkdfjds | klsdjfwo | sodifjds | fdlksjfew lksdfd | slkdfjds | klsdjfwo | sodifjds | fdlksjfew lksdfd | slkdfjds | klsdjfwo | sodifjds | fdlksjfew lksdfd | slkdfjds | klsdjfwo | sodifjds | fdlksjfew lksdfd | slkdfjds | klsdjfwo | sodifjds | fdlksjfew lksdfd | slkdfjds | klsdjfwo | sodifjds | fdlksjfew lksdfd | slkdfjds | klsdjfwo | sodifjds | fdlksjfew lksdfd | slkdfjds | klsdjfwo | sodifjds | fdlksjfew lksdfd | slkdfjds | klsdjfwo | sodifjds | fdlksjfew lksdfd | slkdfjds | klsdjfwo | sodifjds | fdlksjfew (15 rows)