PXF S3 Insert Fails with "NoSuchBucket" Due to pxf.fs.basePath Configuration

Article ID: 441014


Products

VMware Tanzu Data Suite

Issue/Introduction

Issue Symptoms

When inserting data into an external table backed by S3 through Greenplum PXF, the INSERT statement fails with an S3 NoSuchBucket exception.

The error output looks similar to the following:

[gpadmin@[local]] gpadmin=# insert into pxf_data VALUES (1, 'aaa');
ERROR:  PXF server error : `s3a://data/pfx_local_data/pxf-test/test.csv/9762-0000000002_1': getFileStatus on s3a://data/pfx_local_data/pxf-test/test.csv/9762-0000000002_1: com.amazonaws.services.s3.model.AmazonS3Exception: The specified bucket does not exist (Service: Amazon S3; Status Code: 404; Error Code: NoSuchBucket; Request ID: 18B11EF3DDDFF567; S3 Extended Request ID: dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8; Proxy: null)... (seg1 ##.##.##.##:20001 pid=900560)
HINT:  Check the PXF logs located in the '/home/gpadmin/pxf_base/clusters/default/groups/default/logs' directory on host 'gpdb12' or 'set client_min_messages=LOG' for additional details.

 

Environment

  • Database: Greenplum Database

  • Component: Platform Extension Framework (PXF)

  • Storage: Amazon S3 (or S3-compatible object storage)

Cause

Root Cause

This issue occurs when the pxf.fs.basePath property is set (not commented out) in the pxf-site.xml configuration file of a PXF server that is used to access an S3 bucket.

Look closely at the path in the error message: getFileStatus on s3a://data/pfx_local_data/pxf-test/test.csv/...

PXF has inserted the configured base path (data/pfx_local_data/) immediately after the s3a:// scheme, ahead of the actual target bucket name (pxf-test). In an s3a:// URI, the first element after s3a:// is always interpreted as the bucket name. Consequently, PXF asks S3 for a bucket literally named data, which does not exist, and S3 returns the NoSuchBucket error.
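The bucket resolution described above can be reproduced with a few lines of shell. This is only an illustrative sketch: s3a_bucket is a hypothetical helper name, not part of PXF or Hadoop, and it simply prints the text between s3a:// and the next slash.

```shell
#!/bin/sh
# Hypothetical helper (not part of PXF): print the bucket that an s3a:// URI
# resolves to, i.e. the text between "s3a://" and the next "/".
s3a_bucket() {
  uri=$1
  rest=${uri#s3a://}            # drop the scheme prefix
  printf '%s\n' "${rest%%/*}"   # keep everything before the first slash
}

# Path from the error message: the first element is the injected base path.
s3a_bucket 's3a://data/pfx_local_data/pxf-test/test.csv'   # prints: data
# Path once the base path is no longer prepended.
s3a_bucket 's3a://pxf-test/test.csv'                       # prints: pxf-test
```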

The pxf.fs.basePath property is strictly intended for use with local or network file systems using the file:* profile. It should not be active for S3 server configurations.

Resolution

To resolve the issue, verify the bucket exists, and then remove or comment out the conflicting pxf.fs.basePath setting in your PXF server configuration.

Step 1: Verify the Target Bucket

Confirm that your intended S3 bucket actually exists in your object storage environment.
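One way to check the bucket from the command line is the AWS CLI head-bucket call. The sketch below assumes the AWS CLI is installed and has credentials for the same object store; bucket_exists is a hypothetical wrapper name, and the optional second argument is an endpoint URL for S3-compatible storage.

```shell
#!/bin/sh
# Hypothetical wrapper around "aws s3api head-bucket".
# $1 = bucket name, $2 = optional endpoint URL for S3-compatible storage.
# Returns 0 if the bucket is reachable, non-zero if it is missing or not accessible.
bucket_exists() {
  bucket=$1
  endpoint=${2:-}
  if [ -n "$endpoint" ]; then
    aws s3api head-bucket --bucket "$bucket" --endpoint-url "$endpoint" >/dev/null 2>&1
  else
    aws s3api head-bucket --bucket "$bucket" >/dev/null 2>&1
  fi
}

# Example call (pxf-test is the bucket name from the error message above):
#   bucket_exists pxf-test && echo "bucket found" || echo "bucket missing or not accessible"
```

A non-zero result does not distinguish a missing bucket from a permissions problem, so check the credentials as well if the call fails for a bucket you expect to exist.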

Step 2: Update pxf-site.xml

Navigate to the PXF server configuration directory that your external table is using and open pxf-site.xml. Locate the following property block:

    <property>
        <name>pxf.fs.basePath</name>
        <value>/data/pfx_local_data</value>
        <description>
            Sets the base path when constructing a file URI for read and write
            operations. This property MUST be configured for any server that
            accesses a file using a file:* profile.
        </description>
    </property>

Step 3: Disable the Property

Either delete this property block entirely or wrap it in an XML comment (<!-- ... -->). In a fresh installation, this property is commented out by default.
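To confirm that the property is really disabled after editing, you can check whether it still appears outside of any XML comment. The following is a minimal sketch using awk and grep. The function names are hypothetical, and the file path in the example call is a placeholder for your server's actual pxf-site.xml.

```shell
#!/bin/sh
# Print an XML file with <!-- ... --> comments removed (comments may span lines).
strip_xml_comments() {
  awk '
    {
      rest = $0; out = ""
      while (rest != "") {
        if (in_comment) {
          i = index(rest, "-->")
          if (i == 0) { rest = "" }
          else { rest = substr(rest, i + 3); in_comment = 0 }
        } else {
          i = index(rest, "<!--")
          if (i == 0) { out = out rest; rest = "" }
          else { out = out substr(rest, 1, i - 1); rest = substr(rest, i + 4); in_comment = 1 }
        }
      }
      print out
    }' "$1"
}

# Succeed (exit 0) only if pxf.fs.basePath is defined outside of any comment.
basepath_is_active() {
  strip_xml_comments "$1" | grep -q '<name>pxf\.fs\.basePath</name>'
}

# Example call (placeholder path):
#   basepath_is_active /path/to/pxf-site.xml && echo "pxf.fs.basePath is still active"
```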

Step 4: Sync and Restart PXF

For the change to take effect across the Greenplum cluster, sync the PXF configuration to all hosts and then restart the PXF service. Run both commands as the gpadmin user:

pxf cluster sync
pxf cluster restart
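If you prefer to run these steps as one unit, the commands can be chained so that a failed sync does not lead to a restart with stale configuration. This is a sketch only: apply_pxf_config is a hypothetical function name, and the final pxf cluster status call is an optional extra check that is not part of the original procedure.

```shell
#!/bin/sh
# Hypothetical wrapper: run sync, restart, and status in order (as gpadmin),
# stopping at the first command that fails.
apply_pxf_config() {
  pxf cluster sync    || { echo "pxf cluster sync failed" >&2; return 1; }
  pxf cluster restart || { echo "pxf cluster restart failed" >&2; return 1; }
  pxf cluster status  || { echo "pxf cluster status reported a problem" >&2; return 1; }
}

# Example call:
#   apply_pxf_config && echo "PXF configuration applied"
```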

Once PXF has restarted, re-run your INSERT query. The path is then constructed correctly as s3a://pxf-test/test.csv/..., and PXF can locate your bucket.