HBase region server fails with "error telling master we are up"

Article ID: 294998

Updated On:

Products

Services Suite

Issue/Introduction

Symptoms:

The HBase region server logs the following exception:

2014-10-21 12:16:56,538 WARN  [regionserver60020] regionserver.HRegionServer: error telling master we are up
com.google.protobuf.ServiceException: org.apache.hadoop.net.ConnectTimeoutException: 20000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending local=/3.3.84.44:60635 remote=hbaseMaster.domain.com/192.168.255.40:60000]
        at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1667)
        at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1708)
        at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$BlockingStub.regionServerStartup(RegionServerStatusProtos.java:5402)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:1924)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:790)
        at java.lang.Thread.run(Thread.java:744)
Caused by: org.apache.hadoop.net.ConnectTimeoutException: 20000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending local=/3.3.84.44:60635 remote=hbaseMaster.domain.com/192.168.255.40:60000]
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:532)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493)
        at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupConnection(RpcClient.java:573)
        at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupIOstreams(RpcClient.java:858)
        at org.apache.hadoop.hbase.ipc.RpcClient.getConnection(RpcClient.java:1532)
        at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1421)
        at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1650)
        ... 5 more

TCP sessions from the region server to the master are stuck in SYN_SENT:

[gpadmin@regionserver hbase]$ netstat -an | egrep 6000
tcp        0      1 3.3.84.44:21569             192.168.255.40:60000        SYN_SENT
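
The same check can be scripted. The following is a minimal sketch, assuming `netstat -an` output in the column layout shown above; the function name `stuck_connections` is hypothetical, and the default port 60000 is taken from the log excerpt rather than from any HBase API.

```python
def stuck_connections(netstat_output, remote_port=60000):
    """Return (local, remote) pairs for TCP sockets in SYN_SENT to remote_port."""
    stuck = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected columns: proto, recv-q, send-q, local address, remote address, state
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, remote, state = fields[3], fields[4], fields[5]
        # Compare only the port part of the remote address (text after the last colon).
        if state == "SYN_SENT" and remote.rsplit(":", 1)[-1] == str(remote_port):
            stuck.append((local, remote))
    return stuck


# Sample line copied from the netstat output in this article.
SAMPLE = "tcp        0      1 3.3.84.44:21569             192.168.255.40:60000        SYN_SENT"

if __name__ == "__main__":
    for local, remote in stuck_connections(SAMPLE):
        print(f"no reply from {remote} (source address {local})")
```

A non-empty result means the region server sent a SYN to the master port and never received a reply, which matches the connect timeout in the log.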

Environment


Cause

The error message shows that the HBase region server connects from a local address on the 3.x subnet, while the remote master server is on the 192.x subnet:

ch : java.nio.channels.SocketChannel[connection-pending local=/3.3.84.44:60635 remote=hbaseMaster.domain.com/192.168.255.40:60000]

This happens when the servers have multiple network interfaces. By default, the region server identifies its primary hostname and performs a DNS lookup to determine which interface it should bind to. Based on the following network routing table, the region server determines that it needs to bind to the bond0 interface, even though the master server communicates on bond1: the master address 192.168.255.40 falls inside 192.168.254.0 with netmask 255.255.254.0, which is routed over bond1.

[gpadmin@regionserver hbase]$ netstat -rn
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
3.3.84.0        0.0.0.0         255.255.254.0   U         0 0          0 bond0
192.168.254.0   0.0.0.0         255.255.254.0   U         0 0          0 bond1
169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 bond0
169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 bond1
0.0.0.0         3.3.84.1        0.0.0.0         UG        0 0          0 bond0
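
The subnet reasoning can be verified with a short script. This is a minimal sketch using Python's standard `ipaddress` module; the `ROUTES` list is copied by hand from the routing table above, and the helper name `interface_for` is hypothetical. It only matches directly connected subnets and does not model the kernel's full route selection.

```python
import ipaddress

# Directly connected routes copied from the `netstat -rn` output:
# (destination, genmask, interface).
ROUTES = [
    ("3.3.84.0", "255.255.254.0", "bond0"),
    ("192.168.254.0", "255.255.254.0", "bond1"),
]


def interface_for(address, routes=ROUTES):
    """Return the interface whose directly connected subnet contains address, or None."""
    ip = ipaddress.ip_address(address)
    for destination, genmask, iface in routes:
        network = ipaddress.ip_network(f"{destination}/{genmask}")
        if ip in network:
            return iface
    return None


if __name__ == "__main__":
    # Source address the region server bound to, taken from the log.
    print("3.3.84.44 is on", interface_for("3.3.84.44"))
    # Address of the HBase master, taken from the log.
    print("192.168.255.40 is on", interface_for("192.168.255.40"))
```

The source address maps to bond0 while the master address maps to bond1, which is the mismatch described above.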

Resolution

Set the hbase.regionserver.dns.interface parameter in /etc/gphd/hbase/conf/hbase-site.xml to bond1. This forces the region server to use bond1 when it starts, so that it communicates with the HBase master on the correct network interface. Because the setting is applied at startup, restart the region server after changing it.

<property>
    <name>hbase.regionserver.dns.interface</name>
    <value>bond1</value>
</property>
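
To confirm that the property is present after editing the file, something like the following sketch may help. It assumes the usual Hadoop-style layout in which `<property>` elements sit inside a `<configuration>` root; the helper name `get_hbase_property` and the inline `SAMPLE` document are illustrative only, and in practice the text would be read from hbase-site.xml.

```python
import xml.etree.ElementTree as ET


def get_hbase_property(xml_text, name):
    """Return the value of property `name` from Hadoop-style configuration XML, or None."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None


# Illustrative stand-in for the contents of hbase-site.xml.
SAMPLE = """<configuration>
  <property>
    <name>hbase.regionserver.dns.interface</name>
    <value>bond1</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    print(get_hbase_property(SAMPLE, "hbase.regionserver.dns.interface"))
```

A result of None means the property is not set in the parsed file, in which case the region server falls back to its default interface selection.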