This section provides a general overview of what HiveServer2 is and how it works.
HiveServer2 is a server interface that enables remote clients to execute queries against HiveServer1 and retrieve the results. HiveServer2 accesses HiveServer1 data without alteration, provided you are not changing your HiveServer1 release. You do not need to update or transform data in order to begin using HiveServer2. You can use HiveServer2 instead of HiveServer1 as soon as you have enabled support.
HiveServer2 provides support for the following features:
Perform the following procedure to configure HiveServer2:
Ideally, you can install HiveServer2 on any of the nodes in your Pivotal HD Cluster but a master service node is recommended. If HiveServer1 is already installed, you may use the same node to run HiveServer2, since HiveServer1 will be deprecated and eventually, you will discontinue its use. You must, however, turn off HiveServer1 before you begin using HiveServer2.
Since your existing PHD cluster is setup using Pivotal HD Install & Configuration Manager utility (ICM), /etc/yum.repos.d/gphd.repo will be already available on the cluster node and you can use yum to install HiveServer2.
[gpadmin@hdw3 ~]$ sudo yum install hive-server2 Loaded plugins: fastestmirror Loading mirror speeds from cached hostfile Dependencies Resolved .. .. Installing: hive-server2 noarch 0 .11.0_gphd_2_1_1_0-83 gphd-admin-localrepo 5.1 k .. .. Installed: hive-server2.noarch 0:0.11.0_gphd_2_1_1_0-83 Complete! Note: If yum is not working. Please install hive-server2 package using rpm command. Ex: rpm -ivh <path to PHD binary>/hive/rpm/hive-server2-x.x.x_gphd_x_x_x_x-x.noarch.rpm3. Modify the hive-site.xml file, in the following location: /etc/gphd/hive/conf/hive-site.xml
4. By default, HiveServer1 and Hiveserver2 try to bind at port 10000. Ensure that you have turned off HiveServer1 before starting HiveServer2.
NOTE: There is no need to change the port but if necessary, you can change the port number using the hive.server2.thrift.port parameter.
<property> <name>hive.server2.thrift.port</name> <value>10001</value> <description>TCP port number to listen on, default 10000</description> </property>5. You must set the following parameters to avoid any corruption.
<property> <name>hive.support.concurrency</name> <description>Whether Hive supports concurrency or not. A Zookeeper instance must be up and running for the default Hive lock manager to support read-write locks. </description> <value>true</value> </property> <property> <name>hive.zookeeper.quorum</name> <description>Zookeeper quorum used by Hive's Table Lock Manager</description> <value>hdw1.hadoop.local,hdw2.hadoop.local,hdw3.hadoop.local</value> </property>
With the above parameters configured, Hive automatically starts acquiring locks for certain queries. You can review all current lock queries using the SHOW LOCKS <TABLE_NAME>; command.
There are 2 types of locks supported by Hive, and they are enabled automatically when concurrency is enabled:
Shared Lock: A shared lock is acquired when a table is read. Multiple, concurrent shared locks are allowed.
Exclusive Locks: An exclusive lock is required for all operations that modify the table is some way. They not only freeze other table mutating operations, they also prevent queries by other processes.
When the table is partitioned, acquiring an exclusive lock on a partition causes shared lock to be acquired on the table itself to prevent incompatible concurrent changes from occurring, such as attempting to drop the table while a partition is being modified.
To prevent hive server from opening too many connections with namenode, we need to set ipc.client.connection.maxidletime to the default value of 10 seconds. By default, PHD will set this parameter to 1 hour in the core-site.xml which can cause out of memory errors on HiveServer2.
<property> <name>ipc.client.connection.maxidletime</name> <value>10000</value> </property>1. Setup is complete, start HiveServer2
[gpadmin@hdw3 conf]$ sudo service hive-server2 start starting hive-server2, logging to /var/log/gphd/hive/hive-server2.log [ ]2. Connect to HiveServer2 using the beeline utility.
NOTE: In this example, hdw3 is the server where hive-server2 is installed & 10001 is the port defined for hive-server2, so change it appropriately to suit your installation.
[gpadmin@hdm1 init.d]$ beeline Beeline version 0.11.0-gphd-2.1.1.0 by Apache Hive beeline> !connect jdbc:hive2://hdw3:10000 username password org.apache.hive.jdbc.HiveDriver Connecting to jdbc:hive2://hdw3:10001 Connected to: Hive (version 0.11.0-gphd-2.1.1.0) Driver: Hive (version 0.11.0-gphd-2.1.1.0) Transaction isolation: TRANSACTION_REPEATABLE_READ 0: jdbc:hive2://hdw3:10000> show tables . . . . . . . . . . . . . > ; +------------+ |tab_name| +------------+ | passwords| +------------+ 1 row selected (1.487 seconds) 0: jdbc:hive2://hdw3:10000>3. Connect to HiveServer2 on a secured cluster
[gpadmin@hdm1 init.d]$ beeline 2: jdbc:hive2://hdm1:10001> !connect jdbc:hive2://hdm1:10001/default;principal=hive/[email protected] Connecting to jdbc:hive2://hdm1:10001/default;principal=hive/[email protected] Enter username for jdbc:hive2://hdm1:10001/default;principal=hive/[email protected]: gpadmin Enter password for jdbc:hive2://hdm1:10001/default;principal=hive/[email protected]: ******* Connected to: Hive (version 0.12.0-gphd-3.0.1.0) Driver: Hive (version 0.12.0-gphd-3.0.1.0) Transaction isolation: TRANSACTION_REPEATABLE_READ 3: jdbc:hive2://hdm1:10001/default> show tables; +------------+ | tab_name | +------------+ | book | | book1 | | book10 |
NOTE: If you don't have privileges or the cluster has been secured, you will not be able to perform any read / write operations and may receive an error similar to the following. In that case, you will need to grant appropriate privileges.
3: jdbc:hive2://hdm1:10001/default> select * from book; Error: Error while processing statement: Authorization failed:No privilege 'Select' found for inputs { database:default, table:book, columnName:word}. Use show grant to get more details. (state=,code=403) 0: jdbc:hive2://hdm1.gphd.local:10001/default> grant select on table book to user gpadmin ; 0: jdbc:hive2://hdm1.gphd.local:10001> select * from book; +-------+ | word | +-------+ +-------+
Miscellaneous: