Query on an external table fails with message "NULL Hive composite object"
search cancel

Query on an external table fails with message "NULL Hive composite object"

book

Article ID: 294796

calendar_today

Updated On:

Products

Services Suite

Issue/Introduction

Symptoms:

Query on a Hive table from Pivotal HDB via Pivotal Extension Framework (PXF) (external table in Pivotal HDB) fails with the error:

[2016-06-30 13:56:08] [22000] ERROR: NULL Hive composite object (seg0 datanode1.local:40000 pid=601295)

The Hive table has a composite type; Data Definition Language (DDL) for the table in Hive is similar to this:

CREATE EXTERNAL TABLE `staging.users`(
 `user_id` string COMMENT 'unique identifier of the user',
 `date_of_birth` bigint COMMENT 'unix timestamp of date of birth of 
user',
 `user_attributes` map<string,string> COMMENT 'key value map of all 
user attributes, e.g. haircolor, size, eyecolor, etc.');

DDL for the external table in HDB is similar to this (the composite type is represented by a text or character field): 

CREATE EXTERNAL table history.ext_usere( 
user_id varchar(256), 
date_of_birth bigint, 
user_attributes text 
) 

LOCATION ('pxf://namenode1:51200/staging.users?PROFILE=Hive')
FORMAT 'custom' (formatter='pxfwritable_import' );

Environment


Cause

When using PXF, Hive composite (or complex) types are converted to string fields in Pivotal HDB. Pivotal HDB does not support NULL in a string or character type object so there is a check in the PXF code to make sure that there are no NULL characters in the composite types. If there are NULL characters in a composite type, the error "NULL Hive composite object" is given.

Resolution

1. In Hive, check if there are any NULL objects in the composite fields, for example:

select count(*) from staging.users where user_attributes is NULL;

2. If there are any NULL characters in any composite types, remove them in the source data (in Hive) and then try to run the query on the external table in Pivotal HDB again. 

Note: Running a query such as the one below in Pivotal HDB will also fail because the check for NULL Characters is run before the "where" clause is applied:

select count(*) from history.ext_users where user_attributes is not null;