HBase RegionServers fail to come up during crash recovery with an Immutable Configuration error.
2015-07-02 06:05:02,273 ERROR org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Failed open of region=-ROOT-,,0.70236052, starting to roll back the global memstore size.
java.io.IOException: Cannot get log reader
    at org.apache.hadoop.hbase.regionserver.wal.HLog.getReader(HLog.java:721)
    at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEdits(HRegion.java:3179)
    at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEditsIfAny(HRegion.java:3128)
    at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:631)
    at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:547)
    at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4399)
    at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4347)
    at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:330)
    at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:101)
    at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:175)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:722)
Caused by: java.lang.UnsupportedOperationException: Immutable Configuration
    at org.apache.hadoop.hbase.regionserver.CompoundConfiguration.setClass(CompoundConfiguration.java:445)
    at org.apache.hadoop.ipc.RPC.setProtocolEngine(RPC.java:193)
    at org.apache.hadoop.hdfs.NameNodeProxies.createNNProxyWithClientProtocol(NameNodeProxies.java:249)
    at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:168)
    at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:129)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:418)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:385)
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:123)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2277)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:314)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:194)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1747)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1773)
    at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.<init>(SequenceFileLogReader.java:55)
    at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.init(SequenceFileLogReader.java:177)
    at org.apache.hadoop.hbase.regionserver.wal.HLog.getReader(HLog.java:715)
    ... 12 more
During recovery, the HBase RegionServer passes its CompoundConfiguration conf object down to the Hadoop HDFS client. org.apache.hadoop.ipc.RPC.setProtocolEngine then attempts to modify that configuration through setClass, which CompoundConfiguration overrides to block all modifications. As a result, the Immutable Configuration exception is thrown.
191   public static void setProtocolEngine(Configuration conf,
192       Class<?> protocol, Class<?> engine) {
193     conf.setClass(ENGINE_PROP+"."+protocol.getName(), engine, RpcEngine.class);
194   }
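For reference, the call above fails because HBase's CompoundConfiguration overrides the configuration setters to reject any modification. A simplified sketch of that behaviour (paraphrased from the stack trace above, not the exact HBase source) is shown below:

// Simplified sketch of CompoundConfiguration's behaviour; the real class
// overrides all of its set*() methods in the same way.
public class CompoundConfiguration extends org.apache.hadoop.conf.Configuration {
  @Override
  public void setClass(String name, Class<?> theClass, Class<?> xface) {
    // Any write attempt, such as the one made by RPC.setProtocolEngine(), fails here.
    throw new UnsupportedOperationException("Immutable Configuration");
  }
}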
The above failure occurs only when HDFS HA is not enabled. In the HA case, the CompoundConfiguration object is first copied into a new Configuration object. The copy is mutable, and it is this mutable configuration object that is passed down to the HDFS client and on to org.apache.hadoop.ipc.RPC.setProtocolEngine.
132     // HA case
133     FailoverProxyProvider failoverProxyProvider = NameNodeProxies
134         .createFailoverProxyProvider(conf, failoverProxyProviderClass, xface,
135             nameNodeUri);
136     Conf config = new Conf(conf);
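To make the difference between the two code paths concrete, the following self-contained sketch (the class name HaCopyDemo and the property key rpc.engine.demo are illustrative and not part of HBase or HDFS) shows that a write-protected Configuration fails exactly like the non-HA path, while a copy of it is an ordinary, mutable Configuration:

import org.apache.hadoop.conf.Configuration;

public class HaCopyDemo {
  public static void main(String[] args) {
    // Stand-in for the immutable CompoundConfiguration that HBase passes down.
    Configuration immutable = new Configuration() {
      @Override
      public void setClass(String name, Class<?> theClass, Class<?> xface) {
        throw new UnsupportedOperationException("Immutable Configuration");
      }
    };

    // Non-HA path: the HDFS client mutates the object it was handed and fails.
    try {
      immutable.setClass("rpc.engine.demo", Integer.class, Number.class);
    } catch (UnsupportedOperationException e) {
      System.out.println("Non-HA path fails: " + e.getMessage());
    }

    // HA path: the copy constructor produces a plain, mutable Configuration,
    // so the same write succeeds (mirroring 'Conf config = new Conf(conf)' above).
    Configuration copy = new Configuration(immutable);
    copy.setClass("rpc.engine.demo", Integer.class, Number.class);
    System.out.println("HA path succeeds: " + copy.get("rpc.engine.demo"));
  }
}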
With that in mind, a proven workaround in this case is to enable HDFS HA for the HBase RegionServer and HBase Master services only. This tricks HBase into thinking HA is enabled, even though there is only a single NameNode in the environment, and allows the HBase RegionServers to get out of recovery mode successfully. Upon successful recovery, these HA-related configuration settings can be removed again. To apply the workaround, follow the steps below:
2. Edit the /etc/gphd/hadoop/conf/hdfs-site.xml file:

<property>
  <name>dfs.nameservices</name>
  <value>${nameservices}</value>
</property>
<property>
  <name>dfs.ha.namenodes.${nameservices}</name>
  <value>${namenode1id},${namenode2id}</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.${nameservices}.${namenode1id}</name>
  <value>${namenode}:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.${nameservices}.${namenode2id}</name>
  <value>${standbynamenode}:8020</value>
</property>
<property>
  <name>dfs.namenode.http-address.${nameservices}.${namenode1id}</name>
  <value>${namenode}:50070</value>
</property>
<property>
  <name>dfs.namenode.http-address.${nameservices}.${namenode2id}</name>
  <value>${standbynamenode}:50070</value>
</property>
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://${journalnode}/${nameservices}</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.${nameservices}</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>
    sshfence
    shell(/bin/true)
  </value>
</property>
<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/home/hdfs/.ssh/id_rsa</value>
</property>
<property>
  <name>dfs.journalnode.edits.dir</name>
  <value>${journalpath}</value>
</property>
<!-- Namenode Auto HA related properties -->
<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>
<!-- END Namenode Auto HA related properties -->

3. Edit the /etc/gphd/hadoop/conf/core-site.xml file:
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://${nameservices}</value>
  <description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The uri's scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class. The uri's authority is used to determine the host, port, etc. for a filesystem.</description>
</property>
<property>
  <name>ha.zookeeper.quorum</name>
  <value>${zookeeper-server}:${zookeeper.client.port}</value>
</property>

4. Edit the /etc/gphd/hadoop/conf/yarn-site.xml file:
<property>
  <name>mapreduce.job.hdfs-servers</name>
  <value>hdfs://${nameservices}</value>
</property>

5. Edit the /etc/gphd/hbase/conf/hbase-site.xml file:
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://${nameservices}/apps/hbase/data</value>
  <description>The directory shared by region servers and into which HBase persists. The URL should be 'fully-qualified' to include the filesystem scheme. For example, to specify the HDFS directory '/hbase' where the HDFS instance's namenode is running at namenode.example.org on port 9000, set this value to: hdfs://namenode.example.org:9000/hbase. By default HBase writes into /tmp. Change this configuration else all data will be lost on machine restart.</description>
</property>

6. Distribute the configuration changes to all HBase Master and RegionServer nodes.
Upgrade to PHD 3.0, which includes HBase version 0.98.4, or permanently enable HDFS HA to stop this issue from occurring in the future.