+ When enabling S3 endpoints with the PXF connector, adding the endpoint host to the PXF LOCATION URL does not configure the connection properly.
+ Consider the working sample PXF LOCATION below (without an endpoint configured).
pxf://S3_BUCKET/dir/file.txt?PROFILE=s3:text&SERVER=s3srvcfg&accesskey=YOURKEY&secretkey=YOURSECRET
+ When the endpoint is added to an s3:// URL and verified with gpcheckcloud, the configuration validates successfully:
gpssh -f ~/allhosts "gpcheckcloud -c 's3://<s3-endpoint>/S3_BUCKET/dir/file.txt?PROFILE=s3:text&SERVER=s3srvcfg&accesskey=YOURKEY&secretkey=YOURSECRET'"
...
...
[ sdw3] Your configuration works well.
+ However, when the same endpoint is added to the PXF LOCATION field, the PXF connector fails with an encoding error:
CREATE READABLE EXTERNAL TABLE [...] LOCATION('pxf://<s3-endpoint>/S3_BUCKET/dir/file.txt?PROFILE=s3:text&SERVER=s3srvcfg&accesskey=YOURKEY&secretkey=YOURSECRET' [...]
-> LOCATION: report_invalid_encoding, wchar.c:2011
+ To configure an S3 endpoint correctly, the value must instead be added directly to the server's s3-site.xml file:
/s3-site.xml:

<property>
    <name>fs.s3a.endpoint</name>
    <value>[s3-endpoint]</value>
</property>
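+ For reference, below is a minimal sketch of a complete s3-site.xml, assuming the conventional PXF server layout for the SERVER=s3srvcfg parameter used above (typically $PXF_CONF/servers/s3srvcfg/s3-site.xml, or under $PXF_BASE/servers/ on newer PXF releases); [s3-endpoint], YOURKEY, and YOURSECRET remain placeholders:

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <!-- Endpoint override; replace [s3-endpoint] with your endpoint host -->
    <property>
        <name>fs.s3a.endpoint</name>
        <value>[s3-endpoint]</value>
    </property>
    <!-- Credentials may be set here instead of in the LOCATION URL -->
    <property>
        <name>fs.s3a.access.key</name>
        <value>YOURKEY</value>
    </property>
    <property>
        <name>fs.s3a.secret.key</name>
        <value>YOURSECRET</value>
    </property>
</configuration>

+ After editing the file, propagate the change to all hosts (with pxf cluster sync, if your PXF version provides the cluster commands) before re-testing the connection.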
+ The PXF LOCATION is then configured normally, without the endpoint in the URL:
CREATE READABLE EXTERNAL TABLE [...] LOCATION('pxf://S3_BUCKET/dir/file.txt?PROFILE=s3:text&SERVER=s3srvcfg&accesskey=YOURKEY&secretkey=YOURSECRET' [...]
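+ For illustration, a complete statement with a hypothetical table name and column list (the LOCATION string is the working example from above):

CREATE READABLE EXTERNAL TABLE s3_text_example (id int, msg text)
LOCATION('pxf://S3_BUCKET/dir/file.txt?PROFILE=s3:text&SERVER=s3srvcfg&accesskey=YOURKEY&secretkey=YOURSECRET')
FORMAT 'TEXT' (DELIMITER ',');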
+ PXF S3 connectivity can be further tuned as needed by specifying additional properties from the Hadoop-AWS documentation (Hadoop-AWS module: Integration with Amazon Web Services).
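+ For example, non-AWS endpoints often also require path-style access; fs.s3a.path.style.access and fs.s3a.connection.ssl.enabled are documented Hadoop-AWS properties, and the values below are illustrative additions to the same s3-site.xml:

<!-- Added alongside fs.s3a.endpoint in s3-site.xml -->
<property>
    <name>fs.s3a.path.style.access</name>
    <value>true</value>
</property>
<property>
    <name>fs.s3a.connection.ssl.enabled</name>
    <value>false</value>
</property>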