A query run from Tableau against a Hive external table that pulls data from HBase fails. Regular Hive tables, however, can be queried successfully from Tableau. All queries executed directly from a Hive client on a cluster node work, regardless of whether the table is external or regular.
In Tableau, you will see errors similar to the ones below:
Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
Error: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.mapreduce.TableSplit at java.net.URLClassLoader$1.run(URLClassLoader.java:372) at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
Error: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.client.HTable at java.net.URLClassLoader$1.run(URLClassLoader.java:366) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method)
Error: java.io.IOException: java.io.IOException: java.lang.reflect.InvocationTargetException at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97) ... ... Caused by: java.lang.ClassNotFoundException: org.cloudera.htrace.Trace at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
Error: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$BlockingInterface at java.net.URLClassLoader$1.run(URLClassLoader.java:366) at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
Integrating Hive with HBase requires all of the necessary HBase classes to be available on the Hive classpath. The auxiliary jar path set in /etc/gphd/hive/conf/hive-env.sh is not visible to Tableau-initiated connections, so even if the jars are configured in /etc/gphd/hive/conf/hive-env.sh, the errors mentioned above will still be produced for queries from Tableau. You must set the hive.aux.jars.path parameter in hive-site.xml so that the required jars can be located when querying from Tableau.
Other queries on regular Hive tables from Tableau will work since the required jars have already been sourced.
If other similar ClassNotFoundException errors are produced, add the jars that contain the respective classes to hive.aux.jars.path in hive-site.xml.
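To collect the class names that need a jar, the error text reported by Tableau can be scanned for ClassNotFoundException entries. The following is a minimal Python sketch, not part of the original procedure; the function name missing_classes and the sample text are illustrative assumptions.

```python
import re

# Matches the class name that follows "ClassNotFoundException: " in a Java error.
# "$" is allowed so that nested classes (Outer$Inner) are captured whole.
_CNF_PATTERN = re.compile(r"ClassNotFoundException:\s+([\w.$]+)")


def missing_classes(error_text):
    """Return the distinct missing class names, in order of first appearance."""
    seen = []
    for name in _CNF_PATTERN.findall(error_text):
        if name not in seen:
            seen.append(name)
    return seen


if __name__ == "__main__":
    # Sample text assembled from the errors shown earlier in this article.
    sample = (
        "Error: java.lang.ClassNotFoundException: "
        "org.apache.hadoop.hbase.mapreduce.TableSplit at java.net.URLClassLoader$1.run\n"
        "Caused by: java.lang.ClassNotFoundException: "
        "org.cloudera.htrace.Trace at java.net.URLClassLoader$1.run\n"
    )
    for cls in missing_classes(sample):
        print(cls)
```

Each name printed corresponds to one class whose jar has to be present in hive.aux.jars.path.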
Add the property below to /etc/gphd/hive/conf/hive-site.xml on the Hive server. A restart of the Hive server is not required.
<property>
  <name>hive.aux.jars.path</name>
  <value>file:///usr/lib/gphd/hbase/lib/hbase-server.jar,file:///usr/lib/gphd/hbase/lib/hbase-client.jar,file:///usr/lib/gphd/hbase/lib/hbase-protocol.jar,file:///usr/lib/gphd/hbase/lib/htrace-core-2.01.jar</value>
</property>
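A mistyped jar path in this property is easy to miss. As a rough check, the Python sketch below reads the hive.aux.jars.path value from hive-site.xml and reports any listed jar that does not exist on the local filesystem. The helper names aux_jar_paths and missing_jars are hypothetical, and the sketch assumes the usual layout of property elements with name and value children.

```python
import os
import xml.etree.ElementTree as ET
from urllib.parse import urlparse


def aux_jar_paths(hive_site_xml):
    """Return the local paths listed in hive.aux.jars.path, or [] if it is unset.

    hive_site_xml is the content of hive-site.xml (str or bytes).
    """
    root = ET.fromstring(hive_site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == "hive.aux.jars.path":
            value = prop.findtext("value", default="")
            # Entries are comma separated; strip the file:// scheme from each one.
            return [urlparse(u.strip()).path for u in value.split(",") if u.strip()]
    return []


def missing_jars(paths):
    """Return the subset of paths that are not existing regular files."""
    return [p for p in paths if not os.path.isfile(p)]


if __name__ == "__main__":
    site = "/etc/gphd/hive/conf/hive-site.xml"  # location used in this article
    if os.path.isfile(site):
        with open(site, "rb") as f:
            for jar in missing_jars(aux_jar_paths(f.read())):
                print("missing:", jar)
```

Run on the Hive server, it prints one line per listed jar that cannot be found and prints nothing when every jar is present.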
Below is the mapping of classes and their respective jars.
Class Name | Jar Name |
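org.apache.hadoop.hbase.mapreduce.TableSplit | hbase-server.jar |
org.apache.hadoop.hbase.client.HTable | hbase-client.jar |
org.apache.hadoop.hbase.protobuf.generated.MasterProtos | hbase-protocol.jar |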
org.cloudera.htrace.Trace | htrace-core-2.01.jar |
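For a missing class that is not in the mapping above, the jar that provides it can be found by searching the jars in the HBase lib directory, since a jar is a zip archive with one .class entry per class. This is a minimal Python sketch under that assumption; the function names are hypothetical, and /usr/lib/gphd/hbase/lib is the directory used in the property above.

```python
import glob
import os
import zipfile


def class_entry_name(class_name):
    """Convert a fully qualified class name to its entry path inside a jar."""
    return class_name.replace(".", "/") + ".class"


def jars_containing(class_name, lib_dir):
    """Return the file names of jars in lib_dir that contain class_name."""
    entry = class_entry_name(class_name)
    hits = []
    for jar in sorted(glob.glob(os.path.join(lib_dir, "*.jar"))):
        try:
            with zipfile.ZipFile(jar) as zf:
                if entry in zf.namelist():
                    hits.append(os.path.basename(jar))
        except (zipfile.BadZipFile, OSError):
            # Skip unreadable files and broken symlinks.
            continue
    return hits


if __name__ == "__main__":
    lib_dir = "/usr/lib/gphd/hbase/lib"
    for cls in ("org.apache.hadoop.hbase.mapreduce.TableSplit",
                "org.cloudera.htrace.Trace"):
        print(cls, "->", jars_containing(cls, lib_dir))
```

If the directory holds both a versioned jar and a symlink to it, both names are reported; either one can be used in hive.aux.jars.path.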