Environment
PHD x.x.x
Summary
A redundant solution for data locality on the nodes is needed. Rack awareness must be implemented to meet this need.
Procedure
To implement rack awareness you need two files. Create both files, copy them to the Hadoop conf location /etc/gphd/hadoop/conf/ on each node in the cluster, and set the appropriate permissions (the topology script must be executable).
1. Topology script - To determine the rack location of the nodes
Below is the topology script: rack_topology.sh
#!/bin/bash
# Adjust/Add the property "net.topology.script.file.name"
# to core-site.xml with the "absolute" path to this
# file. ENSURE the file is "executable".
# Supply appropriate rack prefix
RACK_PREFIX=default
# To test, supply a hostname as script input:
if [ $# -gt 0 ]; then
  CTL_FILE=${CTL_FILE:-"rack_topology.data"}
  # Default must match the directory the data file was copied to
  HADOOP_CONF=${HADOOP_CONF:-"/etc/gphd/hadoop/conf"}
  # Without a data file, fall back to the default rack
  if [ ! -f "${HADOOP_CONF}/${CTL_FILE}" ]; then
    echo -n "/$RACK_PREFIX/rack "
    exit 0
  fi
  # Emit one rack path per argument
  while [ $# -gt 0 ] ; do
    nodeArg=$1
    result=""
    # Scan the data file; the last matching entry wins
    while read -r line ; do
      ar=( $line )
      if [ "${ar[0]}" = "$nodeArg" ] ; then
        result="${ar[1]}"
      fi
    done < "${HADOOP_CONF}/${CTL_FILE}"
    shift
    if [ -z "$result" ] ; then
      echo -n "/$RACK_PREFIX/rack "
    else
      echo -n "/$RACK_PREFIX/rack_$result "
    fi
  done
else
  echo -n "/$RACK_PREFIX/rack "
fi
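Before restarting the cluster, it can help to run the topology script by hand. Hadoop may pass several addresses in a single call and expects one rack path back for each of them. The sketch below is not part of the original procedure; the function name check_topology_script is hypothetical. It checks that the script is executable and that it returns exactly one rack path, starting with "/", per address.

```shell
#!/bin/bash
# check_topology_script SCRIPT ADDR...   (hypothetical helper, not shipped with PHD)
# Runs SCRIPT with the given addresses and prints "address -> rack" pairs.
# Fails if SCRIPT is not executable, if the number of rack paths differs
# from the number of addresses, or if a rack path does not start with "/".
check_topology_script() {
  local script=$1
  shift
  local addrs=("$@")
  if [ ! -x "$script" ]; then
    echo "not executable: $script" >&2
    return 1
  fi
  # Split the script output on whitespace: one array element per rack path.
  local racks
  racks=( $("$script" "${addrs[@]}") )
  if [ "${#racks[@]}" -ne "${#addrs[@]}" ]; then
    echo "expected ${#addrs[@]} rack paths, got ${#racks[@]}" >&2
    return 1
  fi
  local i
  for i in "${!addrs[@]}"; do
    case "${racks[$i]}" in
      /*) echo "${addrs[$i]} -> ${racks[$i]}" ;;
      *)  echo "rack path must start with /: ${racks[$i]}" >&2; return 1 ;;
    esac
  done
}
```

For example, after sourcing the function: check_topology_script /etc/gphd/hadoop/conf/rack_topology.sh 10.109.123.12 10.109.123.14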
2. Topology data file - Includes rack aware node information
Sample Topology data file: rack_topology.data
# This file should be:
# - Placed in the /etc/gphd/hadoop/conf directory
# Add one node per line. Format: <host ip> <rack_location>
10.109.123.12 01
10.109.123.13 01
10.109.123.14 02
10.109.123.15 01
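A malformed line in the data file fails quietly: the topology script simply does not match it, so the node lands in the default rack. As a minimal sketch (the function name validate_topology_data is hypothetical), the check below skips comments and blank lines, requires exactly two fields per entry, and reports addresses listed more than once. In the lookup loop of the script, a later entry for the same address overrides an earlier one.

```shell
#!/bin/bash
# validate_topology_data FILE   (hypothetical helper)
# Prints one message per problem line and returns non-zero if any is found.
validate_topology_data() {
  awk '
    /^[ \t]*(#|$)/ { next }      # skip comment lines and blank lines
    NF != 2 {
      printf "line %d: expected 2 fields, got %d\n", NR, NF
      bad = 1
      next
    }
    seen[$1]++ {                 # true from the second occurrence onward
      printf "line %d: duplicate entry for %s\n", NR, $1
      bad = 1
    }
    END { exit (bad ? 1 : 0) }
  ' "$1"
}
```

For example: validate_topology_data /etc/gphd/hadoop/conf/rack_topology.data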
Update core-site.xml on all the nodes with the property below:
<property>
<name>net.topology.script.file.name</name>
<value>/etc/gphd/hadoop/conf/rack_topology.sh</value>
</property>
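The value of this property must match the script's file name exactly, and the file must be executable. The sketch below reads the configured path out of core-site.xml and checks that it points at an executable file. The function names are hypothetical, and the awk parsing assumes the <name> and <value> elements are laid out as in the property above; it is not a general XML parser.

```shell
#!/bin/bash
# topology_script_path CORE_SITE_XML   (hypothetical helper)
# Prints the configured value of net.topology.script.file.name, if any.
topology_script_path() {
  awk '
    /<name>net\.topology\.script\.file\.name<\/name>/ { found = 1 }
    found && /<value>/ {
      sub(/.*<value>/, "")       # drop everything up to the opening tag
      sub(/<\/value>.*/, "")     # drop the closing tag and anything after it
      print
      exit
    }
  ' "$1"
}

# check_topology_property CORE_SITE_XML   (hypothetical helper)
# Prints the script path on success; returns non-zero if the property is
# missing or the path is not an executable file on this node.
check_topology_property() {
  local path
  path=$(topology_script_path "$1")
  if [ -z "$path" ]; then
    echo "net.topology.script.file.name is not set in $1" >&2
    return 1
  fi
  if [ ! -x "$path" ]; then
    echo "$path is missing or not executable" >&2
    return 1
  fi
  echo "$path"
}
```

For example, on each node: check_topology_property /etc/gphd/hadoop/conf/core-site.xml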
Verification
After restarting the cluster, the namenode logs should contain entries like the ones shown below:
2015-03-03 15:21:38,729 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default/rack_01/10.109.123.12:50010
2015-03-03 15:21:38,739 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default/rack_01/10.109.123.13:50010
2015-03-03 15:21:38,801 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default/rack_02/10.109.123.14:50010
2015-03-03 15:21:38,819 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default/rack_01/10.109.123.15:50010
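On a larger cluster it is easier to compare rack counts than to read every log line. As a rough sketch (the function name nodes_per_rack is hypothetical), the filter below takes the "Adding a new node" entries from a namenode log, strips the node address from the end of each path, and counts distinct nodes per rack.

```shell
#!/bin/bash
# nodes_per_rack [LOGFILE...]   (hypothetical helper)
# Reads namenode log lines from the given files, or from stdin if none are
# given, and prints "<rack path> <number of distinct nodes>" sorted by rack.
nodes_per_rack() {
  awk '
    /NetworkTopology: Adding a new node: / {
      node = $NF                    # e.g. /default/rack_01/10.109.123.12:50010
      rack = node
      sub(/\/[^\/]+$/, "", rack)    # drop the trailing /<ip>:<port> component
      if (!seen[node]++) count[rack]++
    }
    END { for (r in count) print r, count[r] }
  ' "$@" | sort
}
```

Pass it the path to your namenode log, for example: nodes_per_rack namenode.log. For the four entries above it should report three nodes in /default/rack_01 and one in /default/rack_02. On a running cluster, hdfs dfsadmin -printTopology reports the rack assignment directly.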