When querying external table via the gpfdist protocol, it fails with error message “HTTP/1.0 400 invalid request.”
Error Message:
1. Query gpfdist based external table and failed:
gpadmin=# create external table ext_tt1(c1 int, c2 int, c3 int) location ('gpfdist://192.0.2.1:8081/BATCH 1/*.csv') format 'CSV'; CREATE EXTERNAL TABLE Time: 221.612 ms gpadmin=# select * from ext_tt1; ERROR: http response code 400 from gpfdist (gpfdist://192.0.2.1:8081/BATCH 1/*.csv): HTTP/1.0 400 invalid request (url_curl.c:454) (seg0 slice1 sdw3:49000 pid=5609) (cdbdisp.c:1322)
2. Logs from the Greenplum segment:
2017-08-14 12:52:42.761391 CDT,"gpadmin","gpadmin",p5609,th-843743168,"192.0.2.1","47178",2017-08-14 12:52:38 CDT,379759,con54,cmd7,seg0,slice1,,x379759,sx1,"LOG","00000","quit fill_buffer due to still_running = 0, bot = 0, top = 0, want = 1, for_write = 0, error = 0, eof = 0, datalen = 0, maxfd = 18, nfds = 1, e = 0",,,,,,"select * from ext_tt1;",0,,"url_curl.c",715, 2017-08-14 12:52:43.828111 CDT,"gpadmin","gpadmin",p5609,th-843743168,"192.0.2.1","47178",2017-08-14 12:52:38 CDT,379759,con54,cmd7,seg0,slice1,,x379759,sx1,"ERROR","XX000","http response code 400 from gpfdist (gpfdist://192.0.2.1:8081/BATCH 1/*.csv): HTTP/1.0 400 invalid request (url_curl.c:454)",,,,,,"select * from ext_tt1;",0,,"url_curl.c",454,"Stack trace:
3. gpfdist logs:
2017-08-14 12:52:24 24311 INFO [0:7:0:9] ::ffff:192.0.2.2 requests /BATCH 2017-08-14 12:52:24 24311 INFO [0:7:0:9] got a request at port 42902: GET /BATCH 1/*.csv HTTP/1.1 2017-08-14 12:52:24 24311 INFO [0:7:0:9] request headers: 2017-08-14 12:52:24 24311 INFO [0:7:0:9] Host:192.0.2.1:8081 2017-08-14 12:52:24 24311 INFO [0:7:0:9] Accept:*/* 2017-08-14 12:52:24 24311 INFO [0:7:0:9] X-GP-XID:1502731703-0000000045 2017-08-14 12:52:24 24311 INFO [0:7:0:9] X-GP-CID:0 2017-08-14 12:52:24 24311 INFO [0:7:0:9] X-GP-SN:1 2017-08-14 12:52:24 24311 INFO [0:7:0:9] X-GP-SEGMENT-ID:0 2017-08-14 12:52:24 24311 INFO [0:7:0:9] X-GP-SEGMENT-COUNT:24 2017-08-14 12:52:24 24311 INFO [0:7:0:9] X-GP-LINE-DELIM-LENGTH:-1 2017-08-14 12:52:24 24311 INFO [0:7:0:9] X-GP-PROTO:1 2017-08-14 12:52:24 24311 INFO [0:7:0:9] X-GP-MASTER_HOST:127.0.0.2 2017-08-14 12:52:24 24311 INFO [0:7:0:9] X-GP-MASTER_PORT:54320 2017-08-14 12:52:24 24311 INFO [0:7:0:9] X-GP-CSVOPT:m1x34q34h0n0 2017-08-14 12:52:24 24311 INFO [0:7:0:9] X-GP_SEG_PG_CONF:/data1/primary/bloomtest/gpseg0/postgresql.conf 2017-08-14 12:52:24 24311 INFO [0:7:0:9] X-GP_SEG_DATADIR:/data1/primary/bloomtest/gpseg0 2017-08-14 12:52:24 24311 INFO [0:7:0:9] X-GP-DATABASE:gpadmin 2017-08-14 12:52:24 24311 INFO [0:7:0:9] X-GP-USER:gpadmin 2017-08-14 12:52:24 24311 INFO [0:7:0:9] X-GP-SEG-PORT:49000 2017-08-14 12:52:24 24311 INFO [0:7:0:9] X-GP-SESSION-ID:54 2017-08-14 12:52:24 24311 INFO [0:12:0:16] reject invalid request from ::ffff:192.0.2.2 2017-08-14 12:52:24 24311 WARN [0:12:0:16] HTTP ERROR: ::ffff:192.0.2.2 - 400 invalid request
A whitespace character is included in the LOCATION field in the definition of the external table. As shown in above example whitespace is included in "BATCH 1."
According to GPDB document wildcards or other C-style pattern matching is accepted by gpfdist for denoting multiple files. And, a whitespace character should be specified as [[:space:]].
Follow the document to specify whitespace character as necessary.
For issue in above example, revise DDL of an external table as follows and then it could work:
gpadmin=# create external table ext_tt1(c1 int, c2 int, c3 int) location ('gpfdist://192.0.2.1:8081/BATCH[[:space:]]1/*.csv') format 'CSV'; CREATE EXTERNAL TABLE gpadmin=# select * from ext_tt1; c1 | c2 | c3 ----+----+---- 1 | 2 | 3 (1 row)