Regular expressions in LOCATION of external tables when using Platform Extension Framework (PXF)

Article ID: 296347

Products

VMware Tanzu Greenplum

Issue/Introduction

Using a regular expression (glob pattern) in the LOCATION of an external table with the Platform Extension Framework (PXF) can result in the error "Illegal character in path at index XX" when the table is queried:
ERROR: remote component error (500) from '127.0.0.1:5888': Type Exception Report Message Illegal character in path at index 45: hdfs://hdfs_master:8020/gpdb/pxf_hdfs_simple[12].txt Description The server encountered an unexpected condition that prevented it from fulfilling the request. Exception java.lang.IllegalArgumentException: Illegal character in path at index 45: hdfs://hdfs_master:8020/gpdb/pxf_hdfs_simple[12].txt (libchurl.c:935)


This issue was introduced by an upgrade of some PXF components. In the above example, the external table was defined as:

CREATE EXTERNAL TABLE pxf_hdfs_textsimple_pattern(location text, month text, num_orders int, total_sales float8)
LOCATION ('pxf://gpdb/pxf_hdfs_simple[12].txt?SERVER=hdfs3&PROFILE=HdfsTextSimple')
FORMAT 'TEXT' (delimiter=E',');


The LOCATION of this external table contains a regular expression, [12], that is meant to match the filenames /gpdb/pxf_hdfs_simple1.txt and /gpdb/pxf_hdfs_simple2.txt.

The supported regular expressions are defined here: Hadoop Glob Patterns.
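
To illustrate what the bracket expression in the LOCATION above is meant to select, here is a minimal Python sketch. It uses Python's fnmatch module as a stand-in for Hadoop's glob matching, which is an assumption made only for illustration: the two agree on a simple bracket expression such as [12], but this is not the code PXF runs. The list of candidate files is hypothetical.

```python
from fnmatch import fnmatchcase

# Pattern taken from the path portion of the LOCATION clause above.
pattern = "/gpdb/pxf_hdfs_simple[12].txt"

# Hypothetical directory listing, used only for illustration.
candidates = [
    "/gpdb/pxf_hdfs_simple1.txt",
    "/gpdb/pxf_hdfs_simple2.txt",
    "/gpdb/pxf_hdfs_simple3.txt",
    "/gpdb/pxf_hdfs_simple12.txt",
]

# [12] matches exactly one character, either "1" or "2".
matched = [path for path in candidates if fnmatchcase(path, pattern)]
print(matched)
```

Only the first two names are selected: the bracket expression stands for a single character, so pxf_hdfs_simple3.txt and pxf_hdfs_simple12.txt do not match.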


Environment

Product Version: 6.7

Resolution

The issue is fixed in Greenplum Database (GPDB) 6.9.0 and above.

A fix is also available in PXF RPM 5.13 and above - PXF RPM Download.