In secure environment, HBase fails with error ConnectionLoss for /hbase
search cancel

In secure environment, HBase fails with error ConnectionLoss for /hbase

book

Article ID: 294806

calendar_today

Updated On:

Products

Services Suite

Issue/Introduction

Symptoms:

HBase-master will fail to start in a secured cluster if it will not be able to connect to zookeeper, and will report an error stating ConnectionLoss for znode defined by zookeeper.znode.parent parameter in hbase-site.xml. If zookeeper.znode.parent is not specified, default /hbase is considered.

2014-04-17 08:29:02,752 INFO org.apache.zookeeper.ZooKeeper: Client environment:user.name=hbase
 2014-04-17 08:29:02,752 INFO org.apache.zookeeper.ZooKeeper: Client environment:user.home=/home/hbase
 2014-04-17 08:29:02,752 INFO org.apache.zookeeper.ZooKeeper: Client environment:user.dir=/
 2014-04-17 08:29:02,754 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=sjc1ibdhdm3.research.ge.com:2181,sjc1ibdhdm2
 .research.ge.com:2181,sjc1ibdhdm4.research.ge.com:2181 sessionTimeout=180000 watcher=master:60000
 2014-04-17 08:29:02,858 INFO org.apache.zookeeper.Login: successfully logged in.
 2014-04-17 08:29:02,860 INFO org.apache.zookeeper.Login: TGT refresh thread started.
 2014-04-17 08:29:02,864 INFO org.apache.zookeeper.client.ZooKeeperSaslClient: Client will use GSSAPI as SASL mechanism.
 2014-04-17 08:29:02,876 INFO org.apache.zookeeper.Login: TGT valid starting at: Thu Apr 17 08:29:02 PDT 2014
 2014-04-17 08:29:02,876 INFO org.apache.zookeeper.Login: TGT expires: Fri Apr 18 08:29:02 PDT 2014
 2014-04-17 08:29:02,876 INFO org.apache.zookeeper.Login: TGT refresh sleeping until: Fri Apr 18 04:32:04 PDT 2014
 2014-04-17 08:29:02,923 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: The identifier of this process is [email protected].
 com
 2014-04-17 08:29:02,947 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server sjc1ibdhdm2.research.ge.com/3.39.89.37:2181. Will a
 ttempt to SASL-authenticate using Login Context section 'Client'
 2014-04-17 08:29:02,950 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to sjc1ibdhdm2.research.ge.com/3.39.89.37:2181, initiating session
 2014-04-17 08:29:03,109 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server sjc1ibdhdm2.research.ge.com/3.39.89.37:2181, sessionid = 0x2456dd2f02d0005, negotiated timeout = 40000
 2014-04-17 08:29:03,126 INFO org.apache.zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x2456dd2f02d0005, likely server has closed socket, closing socket connection and attempting reconnect
 2014-04-17 08:29:03,241 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase
 2014-04-17 08:29:03,241 INFO org.apache.hadoop.hbase.util.RetryCounter: Sleeping 2000ms before retry #1..

Error message indicating connection loss to the znode might indicate failure while connecting to zookeeper and there can be various reason for this. If you had a working non-secured cluster and are running into issues after securing it, the primary reason can be missing configuration / envrionmental issues.

In this case, zookeeper logs were reporting the below exceptions when hbase-master was started. It indicated that there are issues with the AES encryption. JDK 7 was installed on the server, however /usr/bin/java was pointing to alternatives which was pointing to JDK 6, because of which zookeeper was starting with JDK 6 resulting in kerberos authentication failures for hbase. It primarily indicates that zookeeper was not allowing connections to clients if AES-256 was used.

 2014-04-17 11:23:40,685 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:Learner@107] - Revalidating client: 0x14570e7551d0000
 2014-04-17 11:23:40,687 - INFO [QuorumPeer[myid=1]/0.0.0.0:2181:ZooKeeperServer@595] - Established session 0x14570e7551d0000 with negotiated timeou
 t 40000 for client /3.39.89.38:45482
 2014-04-17 11:23:40,692 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@940] - Client failed to SASL authenticate: javax.security
 .sasl.SaslException: GSS initiate failed [Caused by GSSException: Failure unspecified at GSS-API level (Mechanism level: Encryption type AES256 CTS
 mode with HMAC SHA1-96 is not supported/enabled)]
 2014-04-17 11:23:40,692 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@946] - Closing client connection due to SASL authentication failure.
 2014-04-17 11:23:40,693 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@997] - Closed socket connection for client /3.39.89.38:45482 which had sessionid 0x14570e7551d0000
 2014-04-17 11:23:40,693 - ERROR [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@178] - Unexpected Exception:
 java.nio.channels.CancelledKeyException
 at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:55)
at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:59)
 at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:151)
 at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1072)
  ..
 at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:234)
at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
 at java.lang.Thread.run(Thread.java:662)
 2014-04-17 11:23:40,694 - WARNåÊ [NIOServerCxn.Factory:0.0.0.0/0.0

Environment


Resolution

Zookeeper sources it environment using /usr/lib/gphd/zookeeper/bin/zkEnv.sh script, which sets up JAVA_HOME by sourcing /etc/gphd/zookeeper/conf/java.env. So, in order to avoid any conflicts with java envrionment variable, one must set JAVA_HOME in java.env file.

[gpadmin@sjc1ibdhdm3 conf]$ cat java.env
export JVMFLAGS="-Djava.security.auth.login.config=/etc/gphd/zookeeper/conf/jaas.conf"
export JAVA_HOME="/usr/java/default"åÊ