Configuring PXF for Local File Read/Write in Greenplum 7

Article ID: 439819

Products: VMware Tanzu Data Suite, VMware Tanzu Greenplum

Issue/Introduction

When configuring the Platform Extension Framework (PXF) to read or write local files on Greenplum 7 segment hosts, users may encounter one or more of the following errors, typically one after another as each underlying problem is fixed:

Error 1: Unsupported FileSystem Scheme

ERROR: PXF server error : No FileSystem for scheme "localfile" (seg14 10.111.14.25:6002 pid=3689884)
org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme "localfile"

Error 2: Invalid Server Configuration (Missing basePath)

ERROR: PXF server error : invalid configuration for server 'default' (seg1 10.111.14.23:6001 pid=4193154)
HINT: Configure a valid value for 'pxf.fs.basePath' property for server 'default' to access the filesystem.

Unexpected Behavior: Duplicated Directory Paths

Files are written successfully, but they land in nested, duplicated directories (e.g., /data/pfx_local_data/data/pfx_local_data/out.csv).
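
When these symptoms appear in sequence, it can help to map each message to its fix. The following POSIX sh sketch is a hypothetical triage helper, diagnose_pxf_error (not part of PXF), that reads an error message on standard input and matches it against the text quoted above. The duplicated-path behavior produces no error message, so it is not covered here (see Step 4).

```shell
#!/bin/sh
# Hypothetical triage helper (not part of PXF): maps the PXF error text
# quoted in this article to the Resolution step that addresses it.
# Reads the error message on standard input.
diagnose_pxf_error() {
  msg=$(cat)
  case "$msg" in
    *'No FileSystem for scheme'*)
      echo "Step 1: give the profile a file: prefix and drop any <protocol> override"
      ;;
    *'pxf.fs.basePath'*)
      echo "Step 2: set pxf.fs.basePath in clusters/default/servers/default/pxf-site.xml"
      ;;
    *)
      echo "unrecognized: compare against the errors listed in Issue/Introduction"
      return 1
      ;;
  esac
}
```

For example, piping the HINT line from Error 2 into diagnose_pxf_error points to Step 2.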

Environment

  • Tanzu Greenplum 7.x

  • Platform Extension Framework (PXF) 6.x / 7.x

Cause

This issue is caused by a combination of architecture changes, security enhancements, and standard URI parsing rules in modern PXF versions:

  1. Protocol Inference: PXF infers the Hadoop file system scheme from the profile name prefix (the text before the colon). Defining a custom <protocol>localfile</protocol>, or naming the profile localfile:text, makes Hadoop look for a localfile:// scheme, for which no FileSystem implementation is registered. Hadoop's local file system requires the standard file:// scheme.

  2. Multi-Server Directory Architecture: PXF 7 does not read active server configurations from the templates/ directory or from the global conf/ directory. Server-specific properties (such as the local file access settings) must reside in the dedicated clusters/default/servers/<server_name>/ directory.

  3. Strict Local File Security (pxf.fs.basePath): To prevent unauthorized access to operating system files, PXF disables local file system reads and writes by default. Users must explicitly authorize a safe root directory.

  4. URI Routing and Path Concatenation:

    • A two-slash URI (pxf://data/...) causes the parser to treat the first path segment (data) as the PXF server name, which breaks absolute path routing. Three slashes (pxf:///) are required to route to the default server.

    • PXF internally appends the DDL LOCATION path directly to the authorized basePath. Specifying the full absolute path in both places results in duplicated nested directories.
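
To make the routing and concatenation rules concrete, the following POSIX sh function models them. It is an illustrative sketch of the behavior described in this section, not PXF's actual URI parser; the function name resolve_pxf_location is hypothetical, and the sketch assumes basePath is prepended verbatim to the path portion of the LOCATION URI.

```shell
#!/bin/sh
# Illustrative model only, NOT PXF's parser: reproduces the routing and
# concatenation behavior described in cause 4 so the outcomes can be compared.

# resolve_pxf_location BASE_PATH LOCATION
# Prints "<server> <resolved-path>" for a URI such as
# 'pxf:///out.csv?PROFILE=file:text'.
resolve_pxf_location() {
  base_path=${1%/}            # ignore a trailing slash on basePath
  location=${2%%[?]*}         # drop the ?PROFILE=... query string
  rest=${location#pxf://}     # everything after the pxf:// prefix

  case "$rest" in
    /*)
      # Three slashes: the server part is empty, so "default" is used.
      server=default
      rel_path=$rest
      ;;
    *)
      # Two slashes: the first path segment is read as the server name.
      server=${rest%%/*}
      rel_path=/${rest#*/}
      ;;
  esac

  # The LOCATION path is appended directly to basePath.
  printf '%s %s\n' "$server" "$base_path$rel_path"
}
```

With basePath set to /data/pfx_local_data, the three-slash relative form pxf:///out.csv resolves to /data/pfx_local_data/out.csv, while repeating the absolute path (pxf:///data/pfx_local_data/out.csv) resolves to /data/pfx_local_data/data/pfx_local_data/out.csv, the duplicated path described in the Issue section.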

Resolution

Follow these steps to correctly configure PXF for local file operations using the default PXF instance.

Step 1: Configure the Global Profile

Ensure that your custom profile name uses the file: prefix so that PXF infers Hadoop's native local file system scheme (file://). Do not include a custom <protocol> tag.

Open the global profiles configuration:

vi $PXF_BASE/clusters/default/conf/pxf-profiles.xml

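Add the following profile inside the <profiles> root element. If the file already defines other profiles, add only the <profile> element; the XML declaration and the <profiles> root element must appear only once in the file: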

<?xml version="1.0" encoding="UTF-8"?>
<profiles>
    <profile>
        <name>file:text</name>
        <description>Profile for reading and writing local text data</description>
        <plugins>
            <fragmenter>org.greenplum.pxf.plugins.hdfs.HdfsDataFragmenter</fragmenter>
            <accessor>org.greenplum.pxf.plugins.hdfs.LineBreakAccessor</accessor>
            <resolver>org.greenplum.pxf.plugins.hdfs.StringPassResolver</resolver>
        </plugins>
    </profile>
</profiles>
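
Before syncing, you may want to sanity-check the profile file. The following POSIX sh sketch uses a hypothetical helper, check_profiles_file, that confirms the profile name carries the file: prefix, that the profile is defined, and that the file no longer mentions the localfile scheme. It relies on line-based grep, so it assumes one XML element per line, as in the example above; it is not an XML validator.

```shell
#!/bin/sh
# Sketch of a pre-sync check for pxf-profiles.xml (hypothetical helper).
# Line-based grep: assumes one XML element per line, as in Step 1.

# check_profiles_file FILE PROFILE_NAME
# Returns 0 when PROFILE_NAME uses the file: prefix, is defined in FILE,
# and FILE does not reference the unsupported localfile scheme.
check_profiles_file() {
  profiles_file=$1
  profile_name=$2

  case "$profile_name" in
    file:*) ;;
    *)
      echo "profile '$profile_name' does not use the file: prefix" >&2
      return 1
      ;;
  esac

  # -F: match the <name> element literally, not as a regular expression.
  if ! grep -qF "<name>$profile_name</name>" "$profiles_file"; then
    echo "profile '$profile_name' is not defined in $profiles_file" >&2
    return 1
  fi

  # Any leftover localfile reference (profile prefix or <protocol>) is flagged.
  if grep -q 'localfile' "$profiles_file"; then
    echo "$profiles_file still references the localfile scheme" >&2
    return 1
  fi

  return 0
}
```

For example, check_profiles_file $PXF_BASE/clusters/default/conf/pxf-profiles.xml file:text returns a zero exit status when the checks pass; otherwise it prints the reason to stderr.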

Step 2: Authorize Local File Access (basePath)

Create the active configuration directory for the default server. Note that in PXF 7, the servers directory sits directly under clusters/default/.

# Create the active default server directory
mkdir -p $PXF_BASE/clusters/default/servers/default

# Copy the template file to the active directory
cp /usr/local/pxf-gp7/templates/server/pxf-site.xml $PXF_BASE/clusters/default/servers/default/

Edit the active pxf-site.xml:

vi $PXF_BASE/clusters/default/servers/default/pxf-site.xml

Locate and uncomment the pxf.fs.basePath property, then set it to your secure target directory on the OS (e.g., /data/pfx_local_data):

    <property>
        <name>pxf.fs.basePath</name>
        <value>/data/pfx_local_data</value>
        <description>
            Sets the base path when constructing a file URI for read and write
            operations. This property MUST be configured for any server that
            accesses a file using a file:* profile.
        </description>
    </property>

(Ensure that there are no leading or trailing spaces, stray characters, or typos inside the <value> tags.)
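
A minimal sketch for catching the most common mistakes in this step, assuming the <name> and <value> lines of the property appear in that order on separate lines, as in the template. The helpers get_base_path and check_base_path are hypothetical. The awk scan skips XML comment blocks, so a property that is still commented out is reported as missing; the absolute-path check is this sketch's own expectation rather than a documented PXF rule.

```shell
#!/bin/sh
# Sketch: read pxf.fs.basePath from a pxf-site.xml and flag common mistakes.
# Line-based awk scan, not a full XML parser: assumes the <name> line comes
# before its <value> line, as in the template.

# get_base_path FILE: print the basePath value, ignoring <!-- ... --> blocks.
get_base_path() {
  awk '
    /<!--/ { in_comment = 1 }
    !in_comment && /<name>pxf\.fs\.basePath<\/name>/ { want = 1; next }
    !in_comment && want && /<value>/ {
      sub(/.*<value>/, ""); sub(/<\/value>.*/, ""); print; exit
    }
    /-->/ { in_comment = 0 }
  ' "$1"
}

# check_base_path FILE: print the value if it looks usable, else return 1.
check_base_path() {
  base_value=$(get_base_path "$1")
  if [ -z "$base_value" ]; then
    echo "pxf.fs.basePath is missing, empty, or still commented out" >&2
    return 1
  fi
  # Any whitespace inside <value> is treated as a typo.
  if [ "$base_value" != "$(printf '%s' "$base_value" | tr -d '[:space:]')" ]; then
    echo "pxf.fs.basePath contains whitespace: '$base_value'" >&2
    return 1
  fi
  case "$base_value" in
    /*) ;;  # absolute path: this sketch's own expectation
    *)
      echo "pxf.fs.basePath is not an absolute path: '$base_value'" >&2
      return 1
      ;;
  esac
  printf '%s\n' "$base_value"
}
```

For example, check_base_path $PXF_BASE/clusters/default/servers/default/pxf-site.xml prints /data/pfx_local_data when the property is set as shown above.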

Step 3: Sync and Restart PXF

Synchronize the configuration to all coordinator and segment hosts, then restart PXF so that the changes take effect:

pxf cluster sync
pxf cluster restart
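
Writes under basePath require the directory to be writable by the operating system user that the PXF service runs as (typically gpadmin; adjust if your installation differs), and the simplest way to satisfy this is to create the directory ahead of time on every segment host. The following POSIX sh sketch, using a hypothetical check_base_dir helper, checks that the directory exists and is writable on the host where it runs.

```shell
#!/bin/sh
# Sketch: confirm the basePath directory exists and is writable by the user
# running this check. Run it on each segment host as the PXF service user
# (assumed to be gpadmin, the usual Greenplum administrative user).

# check_base_dir DIR: print "<host>: DIR OK", or explain the problem and return 1.
check_base_dir() {
  target_dir=$1
  host=$(uname -n)
  if [ ! -d "$target_dir" ]; then
    echo "$host: $target_dir does not exist" >&2
    return 1
  fi
  if [ ! -w "$target_dir" ]; then
    echo "$host: $target_dir is not writable by $(id -un)" >&2
    return 1
  fi
  echo "$host: $target_dir OK"
}
```

To check every segment host at once, one option is Greenplum's gpssh utility, for example gpssh -f seg_hosts -e 'test -d /data/pfx_local_data && test -w /data/pfx_local_data && echo OK', where seg_hosts is a placeholder for your own host file. Run it as the PXF service user so that the writability test reflects that user's permissions.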

Step 4: Define the External Table DDL Correctly

When creating the external table in Greenplum, apply two critical rules:

  1. Use three slashes (pxf:///) to target the default server configuration.

  2. Specify only the relative path (the filename/folder) after the slashes, as PXF will automatically prepend your configured basePath.

DROP EXTERNAL TABLE IF EXISTS pxf_write_test;

CREATE WRITABLE EXTERNAL TABLE pxf_write_test (
    id  int,
    msg text
)
LOCATION (
  -- Combines basePath (/data/pfx_local_data) with relative location (/out.csv)
  'pxf:///out.csv?PROFILE=file:text'
)
FORMAT 'CSV';

-- Execute the write operation
INSERT INTO pxf_write_test VALUES
(1, 'hello'),
(2, 'pxf write test'),
(3, 'greenplum 7');

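The output files now land under /data/pfx_local_data/out.csv/ on each segment host that executes the INSERT. PXF treats the LOCATION path of a writable external table as a directory, so out.csv is created as a directory and each segment writes its own file(s) inside it.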
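
If you create these tables from scripts, the following POSIX sh sketch builds a LOCATION URI that follows both Step 4 rules: it always uses three slashes, and it strips basePath from an absolute target so that the path is not duplicated. build_pxf_location is a hypothetical helper, and its default PROFILE of file:text simply matches the profile defined in Step 1.

```shell
#!/bin/sh
# Sketch: build a LOCATION URI that follows both Step 4 rules
# (three slashes, path relative to basePath). Hypothetical helper, not PXF.

# build_pxf_location BASE_PATH TARGET [PROFILE]
# TARGET is either relative to BASE_PATH (out.csv) or an absolute path
# under BASE_PATH (/data/pfx_local_data/out.csv). PROFILE defaults to file:text.
build_pxf_location() {
  base=${1%/}
  target=$2
  profile=${3:-file:text}

  case "$target" in
    "$base"/*)
      target=${target#"$base"/}   # strip basePath so it is not duplicated
      ;;
    /*)
      echo "'$target' is outside basePath '$base'" >&2
      return 1
      ;;
  esac

  printf 'pxf:///%s?PROFILE=%s\n' "$target" "$profile"
}
```

For example, build_pxf_location /data/pfx_local_data /data/pfx_local_data/out.csv prints pxf:///out.csv?PROFILE=file:text, the LOCATION value used in the DDL above, while a target outside basePath is rejected with a non-zero exit status.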