HiveServer2 Active Directory authentication guide

Article ID: 295080

Products

Services Suite

Issue/Introduction

This article demonstrates how to configure HiveServer2 with Active Directory (AD). 
 

Overview

Setting up HiveServer2 to authenticate against Microsoft Active Directory Domain Services (AD DS) involves the following steps:
 

1. Prerequisites:
a. Installation and basic configuration of HiveServer2.
b. Check AD DS connectivity and functionality from your PHD cluster.
c. Modify HiveServer2 security configurations related to AD DS authentication.
 
2. Start (or restart) HiveServer2 service.
 
3. Connect to HiveServer2 through beeline and validate AD DS authentication.


Environment


Resolution

Example

To illustrate how to set up HiveServer2 with AD authentication, we will use the following environment:
 

1. A single-node Pivotal HD (version 2.0.1) virtual machine. This is where we install and configure HiveServer2 and interact with it through beeline. Its hostname is pivhdsne.localdomain.

2. Windows Server 2008 R2 with Microsoft Active Directory service properly installed and configured. The hostname of this node is dc1-corp-2k8.corp.gepivotal.com. We have created a domain called corp.gepivotal.com, and the following users for testing purposes (shown in Figure 1):
  • jsmith@corp.gepivotal.com, under Organizational Unit CapitalAmerica
  • dmiller@corp.gepivotal.com, under Users

When a user attempts to log in to HiveServer2 through beeline, the actual authentication traffic flows between the Pivotal HD single-node VM and the Windows Server 2008 R2 server running the AD service.

Prerequisites

1. Install and configure HiveServer2. Please refer to this knowledge base article for detailed instructions. Below are the basic HiveServer2 configurations in /etc/gphd/hive/conf/hive-site.xml (or in the Ambari Hive configs, which map to hive-site.xml, on PHD 3.0+ or HDP).
<property>
    <name>hive.server2.thrift.port</name>
    <value>10001</value>
    <description>TCP port number to listen on, default 10000</description>
</property>

<property>
    <name>hive.support.concurrency</name>
    <description>Whether Hive supports concurrency or not. A Zookeeper instance must be up and running for the default Hive lock manager to support read-write locks.</description>
    <value>true</value>
</property>

<property>
    <name>hive.zookeeper.quorum</name>
    <description>Zookeeper quorum used by Hive's Table Lock Manager</description>
    <value>pivhdsne.localdomain</value>
</property>

<property>
    <name>ipc.client.connection.maxidletime</name>
    <value>10000</value>
</property>
2. Check AD DS connectivity and functionality from your PHD cluster.
# First, check AD DS connectivity
# make sure you can ping the AD DS server
$ ping -c 4 dc1-corp-2k8.corp.gepivotal.com

# make sure DNS resolution is working; 192.168.9.133 is our DNS server
$ dig @192.168.9.133 dc1-corp-2k8.corp.gepivotal.com
; <<>> DiG 9.8.2rc1-RedHat-9.8.2-0.17.rc1.el6 <<>> @192.168.9.133 dc1-corp-2k8.corp.gepivotal.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<
3. If the LDAP bind test succeeds, you should see output similar to the example at this link. Pay special attention to the search result section, which should show "0 Success".

4. If the LDAP bind test fails, fix any errors before you proceed.
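The LDAP bind test command itself did not survive in this copy of the article. As a hedged sketch only, assuming the OpenLDAP client tools (ldapsearch) are installed on the PHD node, a simple bind as one of the test users could look like the following. The function names domain_to_base_dn and ad_bind_test are hypothetical, and searching for the account by sAMAccountName from the domain root is our assumption about a typical AD layout.

```shell
# Hypothetical helper: turn a DNS domain such as corp.gepivotal.com into the
# matching LDAP base DN (dc=corp,dc=gepivotal,dc=com), the usual AD layout.
domain_to_base_dn() {
  printf '%s\n' "$1" | awk -F. '{
    out = ""
    for (i = 1; i <= NF; i++) out = out (i > 1 ? "," : "") "dc=" $i
    print out
  }'
}

# Hypothetical wrapper: simple-bind to AD with a user principal name
# (user@domain), then look up that account under the domain root.
#   -x  simple authentication     -H  LDAP URI
#   -D  bind identity             -W  prompt for the password
#   -b  search base
ad_bind_test() {
  local ldap_url="$1" upn="$2"
  local user="${upn%@*}" domain="${upn#*@}"
  ldapsearch -x -H "$ldap_url" -D "$upn" -W \
    -b "$(domain_to_base_dn "$domain")" \
    "(sAMAccountName=$user)" sAMAccountName
}
```

For example, `ad_bind_test ldap://dc1-corp-2k8.corp.gepivotal.com jsmith@corp.gepivotal.com` prompts for jsmith's password; a successful bind and search prints a search result section containing the line "result: 0 Success".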


Detailed steps

1. Add the following properties to /etc/gphd/hive/conf/hive-site.xml.
<property>
    <name>hive.server2.authentication</name>
    <value>LDAP</value>
</property>

<property>
    <name>hive.server2.authentication.ldap.url</name>
    <value>ldap://dc1-corp-2k8.corp.gepivotal.com</value>
</property>
2. Start or restart HiveServer2.
[pivhdsne:~]$ id
uid=500(gpadmin) gid=500(gpadmin) groups=500(gpadmin),501(hadoop)
[pivhdsne:~]$ sudo service hive-server2 start
starting hive-server2, logging to /var/log/gphd/hive/hive-server2.log [ OK ]
3. Connect to HiveServer2 through beeline and validate AD DS authentication.
[pivhdsne:~]$ id
uid=500(gpadmin) gid=500(gpadmin) groups=500(gpadmin),501(hadoop)

[pivhdsne:~]$ beeline
Beeline version 0.12.0-gphd-3.0.0.0 by Apache Hive
beeline> !connect jdbc:hive2://pivhdsne.localdomain:10001/
scan complete in 1ms
Connecting to jdbc:hive2://pivhdsne.localdomain:10001/
Enter username for jdbc:hive2://pivhdsne.localdomain:10001/: jsmith@corp.gepivotal.com
Enter password for jdbc:hive2://pivhdsne.localdomain:10001/: ********
Connected to: Hive (version 0.12.0-gphd-3.0.0.0)
Driver: Hive (version 0.12.0-gphd-3.0.0.0)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://pivhdsne.localdomain:10001/> show tables;
+---------------------------+
|         tab_name          |
+---------------------------+
| date_dim_hive             |
| email_addresses_dim_hive  |
+---------------------------+
2 rows selected (2.09 seconds)
0: jdbc:hive2://pivhdsne.localdomain:10001/> use retail_demo;
No rows affected (0.089 seconds)
0: jdbc:hive2://pivhdsne.localdomain:10001/> show tables;
+-----------------------+
|       tab_name        |
+-----------------------+
| order_lineitems_hive  |
| products_dim_hive     |
+-----------------------+
2 rows selected (0.186 seconds)
0: jdbc:hive2://pivhdsne.localdomain:10001/> select count(*) from order_lineitems_hive;
+----------+
|   _c0    |
+----------+
| 1024158  |
+----------+
1 row selected (28.165 seconds)
0: jdbc:hive2://pivhdsne.localdomain:10001/> !list
1 active connection:
 #0  open     jdbc:hive2://pivhdsne.localdomain:10001/
0: jdbc:hive2://pivhdsne.localdomain:10001/> !closeall
Closing: org.apache.hive.jdbc.HiveConnection
beeline> !list
No active connections
beeline> !connect jdbc:hive2://pivhdsne.localdomain:10001/
scan complete in 2ms
Connecting to jdbc:hive2://pivhdsne.localdomain:10001/
Enter username for jdbc:hive2://pivhdsne.localdomain:10001/: dmiller@corp.gepivotal.com
Enter password for jdbc:hive2://pivhdsne.localdomain:10001/: ********
Connected to: Hive (version 0.12.0-gphd-3.0.0.0)
Driver: Hive (version 0.12.0-gphd-3.0.0.0)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://pivhdsne.localdomain:10001/> show tables;
+---------------------------+
|         tab_name          |
+---------------------------+
| date_dim_hive             |
| email_addresses_dim_hive  |
+---------------------------+
2 rows selected (1.499 seconds)
0: jdbc:hive2://pivhdsne.localdomain:10001/> !list
1 active connection:
 #0  open     jdbc:hive2://pivhdsne.localdomain:10001/
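To repeat this validation from a script, the sketch below reads the Thrift port from hive-site.xml and runs a non-interactive beeline login. It assumes your Beeline build accepts the -u, -n, -p and -e options (check `beeline --help`). The names hive_prop, hs2_login_test and HS2_PASSWORD are hypothetical, and hive_prop is a line-oriented awk scan rather than a real XML parser, so it only handles files laid out like the snippets in this article.

```shell
# Hypothetical helper: print the <value> of a named property in a
# hive-site.xml file. Assumes <name> and <value> each sit on one line.
hive_prop() {
  local file="$1" name="$2"
  awk -v want="$name" '
    /<name>/ {
      n = $0
      sub(/.*<name>[ \t]*/, "", n)
      sub(/[ \t]*<\/name>.*/, "", n)
      found = (n == want)          # remember whether this is our property
    }
    found && /<value>/ {
      v = $0
      sub(/.*<value>[ \t]*/, "", v)
      sub(/[ \t]*<\/value>.*/, "", v)
      print v
      exit
    }
    /<\/property>/ { found = 0 }   # stop matching at the end of the block
  ' "$file"
}

# Hypothetical wrapper: log in as an AD user without the interactive
# prompts and run one statement.
#   -u JDBC URL, -n user name, -p password, -e statement to execute.
# The password comes from HS2_PASSWORD; it is still visible in the
# process list while beeline runs.
hs2_login_test() {
  local conf="$1" host="$2" user="$3" port
  port="$(hive_prop "$conf" hive.server2.thrift.port)"
  beeline -u "jdbc:hive2://$host:${port:-10000}/" -n "$user" \
    -p "$HS2_PASSWORD" -e 'show tables;'
}
```

With HS2_PASSWORD exported, `hs2_login_test /etc/gphd/hive/conf/hive-site.xml pivhdsne.localdomain jsmith@corp.gepivotal.com` would connect to port 10001 with the configuration shown in the Prerequisites; if the port property is absent, the sketch falls back to the default 10000.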

Important notes

Configurations related to Active Directory authentication are set in hive-site.xml. This page lists all possible settings relevant to authentication and security for HiveServer2. In our testing, we observed that setting the following two parameters together results in AD authentication failure:
  • hive.server2.authentication.ldap.baseDN
  • hive.server2.authentication.ldap.Domain

In Hive 0.12, the error message returned in this case is confusing:

Error: Invalid URL: jdbc:hive2://<HOST>:<PORT>/ (state=08S01,code=0)

Thus, we recommend using only the following settings while configuring HiveServer2 with AD authentication:
  • hive.server2.authentication
  • hive.server2.authentication.ldap.url
  • hive.server2.authentication.ldap.Domain (optional)

If you set hive.server2.authentication.ldap.Domain in hive-site.xml, you can simply use your AD username in beeline to connect and authenticate to HiveServer2. Otherwise, you need to specify the fully qualified user name in the form username@domain. Our example above illustrates the latter case.
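Our reading of the Hive 0.12 source (the LdapAuthenticationProviderImpl class) suggests why baseDN breaks AD logins: the provider appends @<Domain> to the user name when Domain is set and, when baseDN is set, binds as uid=<user>,<baseDN>, a DN that AD user accounts normally do not have. The function below is a hypothetical restatement of that logic for illustration only, not Hive's code; verify it against the source of your Hive version, since later releases changed how the bind identity is built.

```shell
# Hypothetical restatement of how Hive 0.12 appears to build the LDAP
# bind identity. Arguments: the user name typed in beeline, the
# hive.server2.authentication.ldap.Domain value, and the
# hive.server2.authentication.ldap.baseDN value ("" means unset).
hive_bind_principal() {
  local user="$1" domain="$2" base_dn="$3"
  # Domain set: append it to whatever the user typed.
  if [ -n "$domain" ]; then
    user="$user@$domain"
  fi
  # baseDN set: wrap the name as a uid= DN, which AD accounts lack.
  if [ -n "$base_dn" ]; then
    printf 'uid=%s,%s\n' "$user" "$base_dn"
  else
    printf '%s\n' "$user"
  fi
}
```

With neither property set, the name typed in beeline (jsmith@corp.gepivotal.com in our example) is used unchanged as the bind identity, which AD accepts as a user principal name.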
Multi-user permission denied error when executing SQL statements. After successfully authenticating against AD, you are logged into the beeline interactive shell. In subsequent sessions, you might encounter the error reported in this JIRA: https://issues.apache.org/jira/browse/HIVE-6602. A simple workaround is to manually chmod hdfs:///tmp/hive-{hive.username} to 777.
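A minimal sketch of that workaround, assuming the Hive 0.12 default scratch directory layout /tmp/hive-<user> on HDFS (the default value of hive.exec.scratchdir in that release) and that it runs as the directory owner or the HDFS superuser. The function names are hypothetical.

```shell
# Hypothetical helper: HDFS scratch directory for a given user, assuming
# hive.exec.scratchdir has not been overridden in hive-site.xml.
hive_scratch_dir() {
  printf '/tmp/hive-%s\n' "$1"
}

# Hypothetical wrapper for the workaround: open up the scratch directory
# permissions so other users' sessions can write to it.
open_hive_scratch_dir() {
  hdfs dfs -chmod 777 "$(hive_scratch_dir "$1")"
}
```

If your cluster sets hive.exec.scratchdir explicitly, chmod that path instead.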