PXF returning java.io.FileNotFoundException but the file does exist

Article ID: 296475

Products

VMware Tanzu Greenplum

Issue/Introduction

After upgrading GPDB to 5.20 or later, queries against some PXF external tables fail with the following error:
ERROR: remote component error (500) from '127.0.0.1:5888': type Exception report message javax.servlet.ServletException: java.io.FileNotFoundException: File hdfs://hdp-26-master.datalab.local:8020/tmp/pxf* does not exist. description The server encountered an internal error that prevented it from fulfilling this request. exception javax.servlet.ServletException: javax.servlet.ServletException: java.io.FileNotFoundException: File hdfs://hdp-26-master.datalab.local:8020/tmp/pxf* does not exist. (libchurl.c:946) (seg0 slice1 10.193.102.13:30563 pid=20664) (cdbdisp.c:254)
The affected tables use the HdfsTextMulti profile together with a wildcard (*) in the LOCATION clause, for example:
External location: pxf://tmp/pxf*?PROFILE=Hdfstextmulti
If you remove the wildcard or change the profile to HdfsTextSimple, the query works.

Environment

Product Version: 5.21

Resolution

This is a reported product limitation that affects GPDB 5.20 and later.

The fix will be included in GPDB 5.23.0.

Until then, the workaround is to edit the file $PXF_CONF/conf/pxf-profiles.xml and add the following block:
    <profile>
        <name>HdfsTextMulti</name>
        <plugins>
            <fragmenter>org.greenplum.pxf.plugins.hdfs.HdfsDataFragmenter</fragmenter>
            <accessor>org.greenplum.pxf.plugins.hdfs.QuotedLineBreakAccessor</accessor>
            <resolver>org.greenplum.pxf.plugins.hdfs.StringPassResolver</resolver>
        </plugins>
    </profile>
Then run pxf cluster sync, followed by pxf cluster stop and pxf cluster start, so that the updated profile is copied to every segment host and PXF restarts with it.
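
The steps above can be sketched as a small shell helper. This is a minimal sketch, not an official procedure: the function names profile_defined and apply_pxf_config are hypothetical, and it assumes the pxf command is on the PATH of the Greenplum admin user on the master host. Only the three pxf cluster commands come from this article.

```shell
#!/usr/bin/env bash
# Hypothetical helpers illustrating the workaround; the names are not part of PXF.

# Return 0 if the given pxf-profiles.xml contains an HdfsTextMulti profile entry.
profile_defined() {
    grep -q '<name>HdfsTextMulti</name>' "$1"
}

# Copy the edited configuration to all segment hosts, then restart PXF.
# Each step runs only if the previous one succeeded.
apply_pxf_config() {
    pxf cluster sync &&
    pxf cluster stop &&
    pxf cluster start
}
```

For example, after editing the file you could run profile_defined "$PXF_CONF/conf/pxf-profiles.xml" && apply_pxf_config, so that PXF is only restarted once the override block is actually in place. Backing up pxf-profiles.xml before editing it is a sensible precaution.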