Querying an external table fails with "java.io IO exception not a file" in Pivotal HBD
search cancel

Querying an external table fails with "java.io IO exception not a file" in Pivotal HBD

book

Article ID: 294884

calendar_today

Updated On:

Products

Services Suite

Issue/Introduction

Symptoms:

An external table has been set up in Pivotal HDB to read data from the Hive table staging.user_name_data:

CREATE EXTERNAL table ext_user_name_data ( 
user_id varchar(256), 
gender varchar(50), 
first_name varchar(100), 
last_name varchar(100), 
middle_name varchar(100) 
) 
LOCATION ('pxf://hawq20:51200/staging.user_name_data?PROFILE=Hive') 
FORMAT 'custom' (formatter='pxfwritable_import');

When querying the table using Pivotal Extension Framework (PXF) in Pivotal HDB, the following error message is produced:

gpadmin=# select count(*) from ext_user_name_data ;
ERROR: remote component error (500) from '192.168.177.170:51200': type Exception report message java.io.IOException: Not a file: hdfs://hawq20.lab:8020/apps/hive/warehouse/staging.db/user_name_data/base description The server encountered an internal error that prevented it from fulfilling this request. exception javax.servlet.ServletException: java.io.IOException: Not a file: hdfs://hawq20.lab:8020/apps/hive/warehouse/staging.db/user_name_data/base (libchurl.c:878)
gpadmin=#

Environment


Cause

This error is caused by the Hive data files being stored in a subdirectory under the table name. For example, the table "staging.user_name_data" stored in the subdirectory "base"
[hdfs@hawq20 ~]$ hdfs dfs -ls /apps/hive/warehouse/staging.db/user_name_data/base
Found 4 items
-rwxrwxrwx 3 hive hdfs 1910 2016-06-03 18:13 /apps/hive/warehouse/staging.db/user_name_data/base/000000_0
-rwxrwxrwx 3 hive hdfs 19265 2016-06-03 18:22 /apps/hive/warehouse/staging.db/user_name_data/base/000000_0_copy_1
-rwxrwxrwx 3 hive hdfs 19265 2016-06-03 19:50 /apps/hive/warehouse/staging.db/user_name_data/base/000000_0_copy_2
-rwxrwxrwx 3 hive hdfs 19265 2016-06-03 19:51 /apps/hive/warehouse/staging.db/user_name_data/base/000000_0_copy_3
[hdfs@hawq20 ~]$
Hive is likely configured to correctly read the files from the subdirectory, but Pivotal HDB is not.

Resolution

1. In Ambari, under MAPREDUCE / Custom mapred-site / add in two custom values:

hive.mapred.supports.subdirectories=true
mapred.input.dir.recursive=true

2. Restart all services as requested by Ambari (YARN, MAPREDUCE, HIVE).


3. Rerun the query to confirm that the issue is resolved:

gpadmin=# select count(*) from ext_user_name_data;
 count
-------
 646
(1 row)
gpadmin=#