PXF cannot Connect to the Hadoop Filesystem on Azure Data Lake


Article ID: 295736


Products

VMware Tanzu Greenplum

Issue/Introduction

Symptoms:
An external table using PXF fails to read data with the following error:
 
ERROR: remote component error (500) from '127.0.0.1:5888': type Exception report message org.apache.hadoop.fs.adl.AdlFsInputStream cannot be cast to org.apache.hadoop.hdfs.DFSInputStream description The server encountered an internal error that prevented it from fulfilling this request. exception java.io.IOException: org.apache.hadoop.fs.adl.AdlFsInputStream cannot be cast to org.apache.hadoop.hdfs.DFSInputStream (libchurl.c:944) (seg18 slice1 10.10.10.2:40002 pid=121290) (cdbdisp.c:254) DETAIL: External table ext_atp, file pxf://gpdb/external/test_cloud/file.txt?profile=HdfsTextSimple SQL state: XX000

Environment


Cause

The PXF profiles HdfsTextSimple and HdfsTextMulti support only the HDFS protocol. If the Hadoop filesystem is hosted on Azure Data Lake (HDInsight), the HDFS client must use the ADL protocol, which PXF does not support.

Check the "core-site.xml" file for the configuration parameter fs.defaultFS to determine which protocol the HDFS client uses.
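For example, a cluster backed by Azure Data Lake typically has an adl:// scheme in fs.defaultFS, while an HDFS-backed cluster uses hdfs://. The sketch below shows what each case might look like in "core-site.xml"; the account and host names are placeholders, not values from this environment:

```xml
<!-- Azure Data Lake backed cluster: adl:// scheme, not supported by PXF -->
<property>
  <name>fs.defaultFS</name>
  <value>adl://youraccount.azuredatalakestore.net</value>
</property>

<!-- HDFS backed cluster: hdfs:// scheme, supported by the HdfsTextSimple/HdfsTextMulti profiles -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://yournamenode:8020</value>
</property>
```

If the value begins with adl://, the error described above is expected when using the HdfsTextSimple or HdfsTextMulti profile.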

Resolution

PXF does not support the ADL protocol.

The Hadoop filesystem must be hosted on a cluster that uses the HDFS protocol for PXF to access it.