The HBase region server reports the following exception:
2014-10-21 12:16:56,538 WARN [regionserver60020] regionserver.HRegionServer: error telling master we are up
com.google.protobuf.ServiceException: org.apache.hadoop.net.ConnectTimeoutException: 20000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending local=/3.3.84.44:60635 remote=hbaseMaster.domain.com/192.168.255.40:60000]
    at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1667)
    at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1708)
    at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$BlockingStub.regionServerStartup(RegionServerStatusProtos.java:5402)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:1924)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:790)
    at java.lang.Thread.run(Thread.java:744)
Caused by: org.apache.hadoop.net.ConnectTimeoutException: 20000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending local=/3.3.84.44:60635 remote=hbaseMaster.domain.com/192.168.255.40:60000]
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:532)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493)
    at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupConnection(RpcClient.java:573)
    at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupIOstreams(RpcClient.java:858)
    at org.apache.hadoop.hbase.ipc.RpcClient.getConnection(RpcClient.java:1532)
    at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1421)
    at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1650)
    ... 5 more
TCP sessions from the region server to the master are stuck in SYN_SENT:
[gpadmin@regionserver hbase]$ netstat -an | egrep 6000
tcp        0      1 3.3.84.44:21569         192.168.255.40:60000        SYN_SENT
From the error message, note that the HBase region server's local address is on the 3.x subnet, while the remote master's address is on the 192.x subnet:
local=/3.3.84.44:60635 remote=hbaseMaster.domain.com/192.168.255.40:60000]
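The routing table shows that the 192.168.254.0/23 network, which contains the master's address, is directly attached to bond1, while the region server is using its bond0 address (3.3.84.44) as the local address:
[gpadmin@regionserver hbase]$ netstat -rn
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
3.3.84.0        0.0.0.0         255.255.254.0   U         0 0          0 bond0
192.168.254.0   0.0.0.0         255.255.254.0   U         0 0          0 bond1
169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 bond0
169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 bond1
0.0.0.0         3.3.84.1        0.0.0.0         UG        0 0          0 bond0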
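The subnet arithmetic behind the routing table above can be checked with a short sketch. It assumes a simplified longest-prefix match over the routes copied from the netstat -rn output; the names ROUTES and pick_interface are hypothetical helpers for illustration, not part of HBase or any system tool, and real kernel route selection takes more factors into account.

```python
import ipaddress

# Routes copied from the netstat -rn output: (destination, genmask, interface).
ROUTES = [
    ("3.3.84.0", "255.255.254.0", "bond0"),
    ("192.168.254.0", "255.255.254.0", "bond1"),
    ("169.254.0.0", "255.255.0.0", "bond0"),
    ("169.254.0.0", "255.255.0.0", "bond1"),
    ("0.0.0.0", "0.0.0.0", "bond0"),  # default route via gateway 3.3.84.1
]

def pick_interface(address, routes=ROUTES):
    """Return the interface of the most specific route that contains address.

    Simplified longest-prefix match; on a tie the first listed route wins.
    """
    ip = ipaddress.ip_address(address)
    best = None
    for dest, mask, iface in routes:
        net = ipaddress.ip_network(f"{dest}/{mask}")
        if ip in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, iface)
    return best[1] if best else None

print(pick_interface("192.168.255.40"))  # master address -> bond1
print(pick_interface("3.3.84.44"))       # region server local address -> bond0
```

Under these assumptions the master's address belongs to the network on bond1, while the local address seen in the exception belongs to bond0, which matches the mismatch described above.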
Set the hbase.regionserver.dns.interface parameter in /etc/gphd/hbase/conf/hbase-site.xml to bond1 so that the region server uses bond1 on startup. It will then communicate with the HBase master over the correct network interface.
<property>
  <name>hbase.regionserver.dns.interface</name>
  <value>bond1</value>
</property>
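To confirm that the property was saved as intended, the configuration can be read back. The following is a minimal sketch that assumes the usual Hadoop-style layout of a configuration root element containing property entries; SAMPLE stands in for the real file contents and get_property is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Sample content standing in for /etc/gphd/hbase/conf/hbase-site.xml;
# a real file would normally hold many more <property> elements.
SAMPLE = """<configuration>
  <property>
    <name>hbase.regionserver.dns.interface</name>
    <value>bond1</value>
  </property>
</configuration>"""

def get_property(xml_text, name):
    """Return the value of the named property, or None if it is not set."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None

print(get_property(SAMPLE, "hbase.regionserver.dns.interface"))  # bond1
```

To check the real file, the same function could be given the text read from the hbase-site.xml path named above.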