Configuring S3 Endpoints with the PXF connector

Article ID: 295176

Products

VMware Tanzu Greenplum
VMware Tanzu Greenplum / Gemfire
VMware Tanzu Data Suite

Issue/Introduction

+ When using S3 endpoints with the PXF connector, adding the endpoint to the PXF LOCATION URI does not configure the connection properly.
+ Take the working sample PXF location below (no endpoint configured):

pxf://S3_BUCKET/dir/file.txt?PROFILE=s3:text&SERVER=s3srvcfg&accesskey=YOURKEY&secretkey=YOURSECRET

+ When adding the s3-endpoint location and verifying with gpcheckcloud, the configuration is validated.

gpssh -f ~/allhosts "gpcheckcloud -c 's3://<s3-endpoint>/S3_BUCKET/dir/file.txt?PROFILE=s3:text&SERVER=s3srvcfg&accesskey=YOURKEY&secretkey=YOURSECRET'"
...
...
[ sdw3] Your configuration works well.

+ However, when the S3 endpoint is added to the PXF LOCATION clause, the PXF connector is not configured properly and the statement returns an error:

CREATE READABLE EXTERNAL TABLE [...]
LOCATION('pxf://<s3-endpoint>/S3_BUCKET/dir/file.txt?PROFILE=s3:text&SERVER=s3srvcfg&accesskey=YOURKEY&secretkey=YOURSECRET')
[...]
-> LOCATION: report_invalid_encoding, wchar.c:2011



Resolution

To configure an S3 endpoint correctly, set it directly in the s3-site.xml file as the value of the fs.s3a.endpoint property, rather than in the LOCATION URI.

-> /s3-site.xml 
<property>
 <name>fs.s3a.endpoint</name>
 <value>[s3-endpoint]</value>
</property>
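
A fuller sketch of the file is shown below, as an illustration rather than a definitive configuration. It assumes the usual Hadoop-style layout in which every property sits inside a single <configuration> root element. [s3-endpoint] remains a placeholder for your endpoint, and the comment text is illustrative only.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <!-- Assumption: the endpoint is the only property being added here;
         any properties already present in the file stay as they are. -->
    <property>
        <name>fs.s3a.endpoint</name>
        <value>[s3-endpoint]</value>
    </property>
</configuration>
```

On a typical PXF installation, the edited server configuration is then copied to the other hosts with `pxf cluster sync`. Confirm this step against the documentation for your PXF version.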


The PXF location is then configured as usual, without the endpoint:

CREATE READABLE EXTERNAL TABLE [...]
LOCATION('pxf://S3_BUCKET/dir/file.txt?PROFILE=s3:text&SERVER=s3srvcfg&accesskey=YOURKEY&secretkey=YOURSECRET')
[...]


PXF S3 connectivity can be adjusted further as needed by setting properties described in the Hadoop-AWS module documentation ("Hadoop-AWS module: Integration with Amazon Web Services").
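
As one hedged illustration, S3-compatible object stores outside AWS often require path-style addressing. The sketch below shows two properties from the Hadoop-AWS module, fs.s3a.path.style.access and fs.s3a.connection.ssl.enabled, as they might be added alongside fs.s3a.endpoint in s3-site.xml. The values are assumptions for a hypothetical store, so check which settings your endpoint actually requires.

```xml
<!-- Assumption: the store expects bucket names in the URL path
     rather than as a host-name prefix. -->
<property>
    <name>fs.s3a.path.style.access</name>
    <value>true</value>
</property>
<!-- true is the Hadoop default; it is shown only to make the setting explicit. -->
<property>
    <name>fs.s3a.connection.ssl.enabled</name>
    <value>true</value>
</property>
```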