How to read AVRO files using Pivotal HDB
search cancel

How to read AVRO files using Pivotal HDB

book

Article ID: 294813

calendar_today

Updated On:

Products

Services Suite

Issue/Introduction

The article below will describe the steps required to read AVRO files using Pivotal HDB. In the example below, we discuss two different approaches to read AVRO files.

Environment


Resolution

Case 1

Read raw AVRO files on the HDFS.

gpadmin=# CREATE EXTERNAL TABLE hdb_avro_example (number int, first_name varchar, last_name varchar)                                                         
LOCATION ('pxf://hdm1:50070/tmp/doctors.avro?Profile=Avro') 
FORMAT 'custom' (formatter='pxfwritable_import');
[gpadmin@hdw3 ~]$hdfs dfs -copyFromLocal doctors.avro /tmp/doctors.avro 
gpadmin=# select * from hdb_avro_example ;                                                                                                                   number | first_name  | last_name

--------+-------------+-----------

      6 | Colin       | Baker

      3 | Jon         | Pertwee

      4 | Tom         | Baker

      5 | Peter       | Davison

     11 | Matt        | Smith

      1 | William     | Hartnell

      7 | Sylvester   | McCoy

      8 | Paul        | McGann

      2 | Patrick     | Troughton

      9 | Christopher | Eccleston

     10 | David       | Tennant

(11 rows)
 

Case 2

Read from the Hive tables based on the AVRO schema. In the example below, we defined an Avro schema and included an extra_field column just to showcase that default value can be read in the Hive table and used in HDB when required.

hive> CREATE TABLE doctors

     ROW FORMAT

     SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'

     STORED AS

     INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'

     OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'

     TBLPROPERTIES ('avro.schema.literal'='{

       "namespace": "testing.hive.avro.serde",

       "name": "doctors",

       "type": "record",

       "fields": [

         {

           "name":"number",

           "type":"int",

           "doc":"Order of playing the role"

         },

         {

           "name":"first_name",

           "type":"string",

           "doc":"first name of actor playing role"

         },

         {

           "name":"last_name",

           "type":"string",

           "doc":"last name of actor playing role"

         },

         {

           "name":"extra_field",

           "type":"string",

           "doc:":"an extra field not in the original file",

           "default":"fishfingers and custard"

         }

       ]

     }');

OK

Time taken: 5.789 seconds
hive> LOAD DATA LOCAL INPATH '/tmp/doctors.avro'  INTO TABLE doctors;
gpadmin=# CREATE EXTERNAL TABLE hive_avro_test (number int, first_name varchar, last_name varchar, extra_field varchar) LOCATION ('pxf://hdm1.phd.local:50070/doctors?PROFILE=Hive') FORMAT 'custom' (formatter='pxfwritable_import');

CREATE EXTERNAL TABLE

gpadmin=# select * from hive_avro_test ;

 number | first_name  | last_name |       extra_field

--------+-------------+-----------+-------------------------

      6 | Colin       | Baker     | fishfingers and custard

      3 | Jon         | Pertwee   | fishfingers and custard

      4 | Tom         | Baker     | fishfingers and custard

      5 | Peter       | Davison   | fishfingers and custard

     11 | Matt        | Smith     | fishfingers and custard

      1 | William     | Hartnell  | fishfingers and custard

      7 | Sylvester   | McCoy     | fishfingers and custard

      8 | Paul        | McGann    | fishfingers and custard

      2 | Patrick     | Troughton | fishfingers and custard

      9 | Christopher | Eccleston | fishfingers and custard

     10 | David       | Tennant   | fishfingers and custard

(11 rows)
doctors.avro  (521 Bytes, there is no preview, click Download from the top when you open the file).