When loading data from external tables with gpfdist, the query fails with the following error:
ERROR: gpfdist error - line too long in file /tmp/1.log near (0 bytes) (url.c:1746) (seg13 slice1 sdw3:43001 pid=23691) (cdbdisp.c:1499)
DETAIL: External table new_test, file gpfdist://mdw:8090/1.log
The default maximum row length for external tables using gpfdist is 32KB (32768 bytes), as documented. If any row is longer than that, the query fails with the message "line too long in file".
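To confirm that row length is the cause, it can help to measure the longest row in the data file before changing any settings. The following is a minimal sketch, not part of gpfdist: the function names max_line_bytes and exceeds_default_limit are hypothetical helpers, and the check assumes that gpfdist compares the row's byte length (excluding the newline) against the 32768-byte default, which may differ slightly from how gpfdist counts.

```shell
#!/bin/sh
# Hypothetical helpers (not part of gpfdist) for checking row lengths.

# Print the byte length of the longest line in a file, newline excluded.
# LC_ALL=C makes awk's length() count bytes instead of multibyte characters.
max_line_bytes() {
    LC_ALL=C awk '{ n = length($0); if (n > max) max = n } END { print max + 0 }' "$1"
}

# Succeed (exit status 0) if any line is longer than the 32768-byte default.
# Assumption: the limit is compared against the line length without the newline.
exceeds_default_limit() {
    [ "$(max_line_bytes "$1")" -gt 32768 ]
}
```

For the test case in this article, the check would be run against the file served by gpfdist, for example: max_line_bytes /tmp/1.log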
Here is more information regarding the test case:
msong=# create external table new_test ( a text) location('gpfdist://mdw:8090/1.log') FORMAT 'TEXT' (DELIMITER '|');
CREATE EXTERNAL TABLE
msong=# select * from new_test;
ERROR: gpfdist error - line too long in file /tmp/1.log near (0 bytes) (url.c:1746) (seg13 slice1 sdw3:43001 pid=23691) (cdbdisp.c:1499)
DETAIL: External table new_test, file gpfdist://mdw:8090/1.log
msong=#
The gpfdist verbose log contains the following "500 session error" entries:
ps -ef|grep gpfdist
gpadmin 20727 27150 0 13:37 pts/5 00:00:00 gpfdist -d /tmp -l /tmp/1.log -p 8090 -V

cat /tmp/1.log
[2014-06-18 13:39:14] ::ffff:172.28.8.7 - 500 session error [44] request end
---------------------------------------------------
[2014-06-18 13:39:14] ::ffff:172.28.8.7 requests /1.log
[2014-06-18 13:39:14] [45] got a request: GET /1.log HTTP/1.1
[2014-06-18 13:39:14] request headers:
Host:172.28.8.250:8090
Accept:*/*
X-GP-XID:1402370147-0000043790
X-GP-CID:0
X-GP-SN:0
X-GP-SEGMENT-ID:38
X-GP-SEGMENT-COUNT:48
X-GP-LINE-DELIM-LENGTH:-1
X-GP-PROTO:1
X-GP-MASTER_HOST:172.28.8.250
X-GP-MASTER_PORT:4300
X-GP-CSVOPT:m0x92q0h0
X-GP_SEG_PG_CONF:/data1/primary_4300/gpseg38/postgresql.conf
X-GP_SEG_DATADIR:/data1/primary_4300/gpseg38
X-GP-DATABASE:msong
X-GP-USER:gpadmin
X-GP-SEG-PORT:43002
X-GP-SESSION-ID:83085
[2014-06-18 13:39:14] ::ffff:172.28.8.7 - 500 session error [45] request end
---------------------------------------------------
Use the -m option to increase the maximum row length for gpfdist. The value is given in bytes and can be raised up to 256MB.
For example, restarting gpfdist with -m 655350 (about 640KB) resolved the issue in our test case:
gpfdist -t 600 -d /tmp -l /tmp/1.log -p 8090 -V -m 655350 &
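When choosing a value for -m, one possible approach is to start from the longest row in the data and add some headroom, while staying within the 32KB default and the 256MB upper bound mentioned above. The sketch below illustrates that arithmetic only. The function name suggest_m_value is hypothetical, and doubling the longest row is an arbitrary assumption made for this example, not a gpfdist recommendation.

```shell
#!/bin/sh
# Hypothetical helper (not part of gpfdist): suggest a -m value in bytes
# for a given longest-row size in bytes.
suggest_m_value() {
    longest=$1
    min=32768          # gpfdist default of 32KB
    max=268435456      # 256MB upper bound for -m

    # A row beyond the upper bound cannot be covered by any -m value.
    if [ "$longest" -gt "$max" ]; then
        echo "row of $longest bytes exceeds the 256MB limit" >&2
        return 1
    fi

    # Assumption for this sketch: double the longest row for headroom.
    want=$((longest * 2))
    if [ "$want" -lt "$min" ]; then want=$min; fi
    if [ "$want" -gt "$max" ]; then want=$max; fi
    echo "$want"
}
```

The result could then be passed to gpfdist by hand, for example: gpfdist -d /tmp -p 8090 -m "$(suggest_m_value 40000)"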
Note: When using gpload, you can pass the -m value to gpfdist by setting the MAX_LINE_LENGTH parameter in the YAML control file.