HBase RegionServers fail to come up during crash recovery with an immutable configuration error.
2015-07-02 06:05:02,273 ERROR org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Failed open of region=-ROOT-,,0.70236052, starting to roll back the global memstore size.
java.io.IOException: Cannot get log reader
at org.apache.hadoop.hbase.regionserver.wal.HLog.getReader(HLog.java:721)
at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEdits(HRegion.java:3179)
at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEditsIfAny(HRegion.java:3128)
at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:631)
at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:547)
at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4399)
at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4347)
at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:330)
at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:101)
at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:175)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)
Caused by: java.lang.UnsupportedOperationException: Immutable Configuration
at org.apache.hadoop.hbase.regionserver.CompoundConfiguration.setClass(CompoundConfiguration.java:445)
at org.apache.hadoop.ipc.RPC.setProtocolEngine(RPC.java:193)
at org.apache.hadoop.hdfs.NameNodeProxies.createNNProxyWithClientProtocol(NameNodeProxies.java:249)
at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:168)
at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:129)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:418)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:385)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:123)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2277)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:314)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:194)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1747)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1773)
at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.<init>(SequenceFileLogReader.java:55)
at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.init(SequenceFileLogReader.java:177)
at org.apache.hadoop.hbase.regionserver.wal.HLog.getReader(HLog.java:715)
... 12 more
During crash recovery, the HBase RegionServer passes its CompoundConfiguration conf object to the Hadoop HDFS client. org.apache.hadoop.ipc.RPC.setProtocolEngine then tries to modify that configuration through setClass(), which CompoundConfiguration overrides to forbid writes, so the call fails with UnsupportedOperationException: Immutable Configuration. The offending code in org.apache.hadoop.ipc.RPC:
191   public static void setProtocolEngine(Configuration conf,
192                                 Class<?> protocol, Class<?> engine) {
193     conf.setClass(ENGINE_PROP+"."+protocol.getName(), engine, RpcEngine.class);
194   }
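For illustration only, here is a minimal, hypothetical sketch of the failure mode. The ReadOnlyConfiguration class below is a stand-in for HBase's CompoundConfiguration (which overrides every mutator, not just setClass); passing such a read-only configuration to code that calls set*() produces exactly the exception seen in the stack trace:

import org.apache.hadoop.conf.Configuration;

// Stand-in for HBase's CompoundConfiguration: reads behave normally,
// but mutation is rejected.
class ReadOnlyConfiguration extends Configuration {
    ReadOnlyConfiguration(Configuration base) {
        super(base);                                  // copy the settings in
    }
    @Override
    public void setClass(String name, Class<?> theClass, Class<?> xface) {
        throw new UnsupportedOperationException("Immutable Configuration");
    }
}

public class ImmutableConfDemo {
    public static void main(String[] args) {
        Configuration conf = new ReadOnlyConfiguration(new Configuration());
        // Mirrors the call made by RPC.setProtocolEngine() at RPC.java:193;
        // this throws UnsupportedOperationException: Immutable Configuration.
        conf.setClass("rpc.engine.org.apache.hadoop.hdfs.protocol.ClientProtocol",
                Object.class, Object.class);
    }
}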
This failure only occurs when HDFS HA is not enabled. In the HA case, the CompoundConfiguration is first copied into a new Configuration object, so a mutable configuration is what gets passed down to the HDFS client, and org.apache.hadoop.ipc.RPC.setProtocolEngine succeeds. The relevant excerpt from NameNodeProxies.createProxy:
132     // HA case
133     FailoverProxyProvider failoverProxyProvider = NameNodeProxies
134         .createFailoverProxyProvider(conf, failoverProxyProviderClass, xface,
135             nameNodeUri);
136     Conf config = new Conf(conf);
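As an illustrative sketch of why the HA path avoids the error (reusing the hypothetical ReadOnlyConfiguration stand-in above, and loosely modelling the copy made on line 136 as Configuration's copy constructor), the copy is an ordinary mutable object, so the same setClass() call now succeeds:

import org.apache.hadoop.conf.Configuration;

public class MutableCopyDemo {
    public static void main(String[] args) {
        // A read-only configuration, as in the previous sketch.
        Configuration readOnly = new ReadOnlyConfiguration(new Configuration());

        // The HA code path copies the settings into a fresh Configuration
        // before the HDFS client starts modifying them.
        Configuration mutableCopy = new Configuration(readOnly);

        // The call that failed on the read-only object succeeds here.
        mutableCopy.setClass("rpc.engine.org.apache.hadoop.hdfs.protocol.ClientProtocol",
                Object.class, Object.class);
        System.out.println("setClass succeeded on the mutable copy");
    }
}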
With that in mind, a proven workaround is to enable the HDFS HA configuration for the HBase RegionServer and HBase Master services only. This tricks HBase into taking the HA code path even though there is a single NameNode in the environment, and allows the RegionServers to get out of recovery mode. Once recovery completes, the HA-related settings can be removed again. Apply the workaround with the steps below:
2. Edit the /etc/gphd/hadoop/conf/hdfs-site.xml file:
<property>
<name>dfs.nameservices</name>
<value>${nameservices}</value>
</property>
<property>
<name>dfs.ha.namenodes.${nameservices}</name>
<value>${namenode1id},${namenode2id}</value>
</property>
<property>
<name>dfs.namenode.rpc-address.${nameservices}.${namenode1id}</name>
<value>${namenode}:8020</value>
</property>
<property>
<name>dfs.namenode.rpc-address.${nameservices}.${namenode2id}</name>
<value>${standbynamenode}:8020</value>
</property>
<property>
<name>dfs.namenode.http-address.${nameservices}.${namenode1id}</name>
<value>${namenode}:50070</value>
</property>
<property>
<name>dfs.namenode.http-address.${nameservices}.${namenode2id}</name>
<value>${standbynamenode}:50070</value>
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://${journalnode}/${nameservices}</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.${nameservices}</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>
sshfence
shell(/bin/true)
</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/home/hdfs/.ssh/id_rsa</value>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>${journalpath}</value>
</property>
<!-- Namenode Auto HA related properties -->
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<!-- END Namenode Auto HA related properties -->
3. Edit the /etc/gphd/hadoop/conf/core-site.xml file:
<property>
<name>fs.defaultFS</name>
<value>hdfs://${nameservices}</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri's authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>${zookeeper-server}:${zookeeper.client.port}</value>
</property>
4. Edit the /etc/gphd/hadoop/conf/yarn-site.xml file:
<property>
<name>mapreduce.job.hdfs-servers</name>
<value>hdfs://${nameservices}</value>
</property>
5. Edit the /etc/gphd/hbase/conf/hbase-site.xml file:
<property>
<name>hbase.rootdir</name>
<value>hdfs://${nameservices}/apps/hbase/data</value>
<description>The directory shared by region servers and into
which HBase persists. The URL should be 'fully-qualified'
to include the filesystem scheme. For example, to specify the
HDFS directory '/hbase' where the HDFS instance's namenode is
running at namenode.example.org on port 9000, set this value to:
hdfs://namenode.example.org:9000/hbase. By default HBase writes
into /tmp. Change this configuration else all data will be lost
on machine restart.
</description>
</property>
6. Distribute the configuration changes to all HBase Master and RegionServer nodes.

To prevent this issue from occurring in the future, upgrade to PHD 3.0, which includes HBase 0.98.4, or permanently enable HDFS HA.