Query on a Hive table from Pivotal HDB via Pivotal Extension Framework (PXF) (external table in Pivotal HDB) fails with the error:
[2016-06-30 13:56:08] [22000] ERROR: NULL Hive composite object (seg0 datanode1.local:40000 pid=601295)
The Hive table has a composite type; Data Definition Language (DDL) for the table in Hive is similar to this:
CREATE EXTERNAL TABLE `staging.users`( `user_id` string COMMENT 'unique identifier of the user', `date_of_birth` bigint COMMENT 'unix timestamp of date of birth of user', `user_attributes` map<string,string> COMMENT 'key value map of all user attributes, e.g. haircolor, size, eyecolor, etc.');
DDL for the external table in HDB is similar to this (the composite type is represented by a text or character field):
CREATE EXTERNAL table history.ext_usere( user_id varchar(256), date_of_birth bigint, user_attributes text ) LOCATION ('pxf://namenode1:51200/staging.users?PROFILE=Hive') FORMAT 'custom' (formatter='pxfwritable_import' );
1. In Hive, check if there are any NULL objects in the composite fields, for example:
select count(*) from staging.users where user_attributes is NULL;
2. If there are any NULL characters in any composite types, remove them in the source data (in Hive) and then try to run the query on the external table in Pivotal HDB again.
Note: Running a query such as the one below in Pivotal HDB will also fail because the check for NULL Characters is run before the "where" clause is applied:
select count(*) from history.ext_users where user_attributes is not null;