Implement Rack Awareness on PHD

Article ID: 294822

Products

Services Suite


Resolution

To implement rack awareness, you need two files: a topology script and a topology data file. Create both files, copy them to the Hadoop configuration directory (/etc/gphd/hadoop/conf/) on each node in the cluster, and set appropriate permissions. The topology script must be executable.
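
The copy step can be scripted once both files exist in the current directory. The following is a minimal sketch, not a supported tool: the function names, the DRY_RUN switch, and the 755 permission are assumptions, and the host names are whatever you pass as arguments. By default it only prints the scp and ssh commands it would run.

```shell
#!/bin/bash
# Sketch only: host names come from the arguments; the permission is an assumption.
CONF_DIR=/etc/gphd/hadoop/conf

# Print each command by default; set DRY_RUN=0 to actually execute it.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "$@"
  else
    "$@"
  fi
}

# Usage: distribute_topology_files <host> [<host> ...]
distribute_topology_files() {
  local host
  for host in "$@"; do
    # Copy both files into the Hadoop conf directory on the node
    run scp rack_topology.sh rack_topology.data "$host:$CONF_DIR/"
    # Make the topology script executable on the node
    run ssh "$host" chmod 755 "$CONF_DIR/rack_topology.sh"
  done
}
```

Review the printed commands first, then rerun with DRY_RUN=0 to apply them.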

1. Topology script - To determine the rack location of the nodes

Below is the topology script: rack_topology.sh

#!/bin/bash

# Set the property "net.topology.script.file.name" in core-site.xml
# to the absolute path of this file. ENSURE the file is executable.

# Supply appropriate rack prefix
RACK_PREFIX=default

# To test, supply one or more node addresses as script input.
if [ $# -gt 0 ]; then

  CTL_FILE=${CTL_FILE:-"rack_topology.data"}
  HADOOP_CONF=${HADOOP_CONF:-"/etc/gphd/hadoop/conf"}

  # Without a data file, report the default rack once per argument
  if [ ! -f "${HADOOP_CONF}/${CTL_FILE}" ]; then
    for _ in "$@"; do
      echo -n "/$RACK_PREFIX/rack "
    done
    exit 0
  fi

  while [ $# -gt 0 ] ; do
    nodeArg=$1
    result=""
    # Look up the node in the data file; the last matching line wins
    while read -r line ; do
      ar=( $line )
      if [ "${ar[0]}" = "$nodeArg" ] ; then
        result="${ar[1]}"
      fi
    done < "${HADOOP_CONF}/${CTL_FILE}"
    shift
    if [ -z "$result" ] ; then
      echo -n "/$RACK_PREFIX/rack "
    else
      echo -n "/$RACK_PREFIX/rack_$result "
    fi
  done

else
  echo -n "/$RACK_PREFIX/rack "
fi
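
The script's own comment suggests testing it by passing a hostname. Hadoop can pass several addresses in one call and expects one rack path back per address, so a manual check is more useful if it counts the results. The helper below is a hypothetical sketch (the name check_topology_script is not part of PHD or Hadoop) that runs any topology script and compares the number of rack paths printed with the number of arguments.

```shell
#!/bin/bash
# Hypothetical helper: run a topology script by hand and confirm that it
# prints exactly one rack path per argument.
# Usage: check_topology_script <script path> <node> [<node> ...]
check_topology_script() {
  local script=$1
  shift
  if [ ! -x "$script" ]; then
    echo "not executable: $script" >&2
    return 1
  fi
  local expected=$#
  local output
  output=$("$script" "$@") || return 1
  # Unquoted on purpose: split the output on whitespace to count rack paths
  local racks=( $output )
  echo "$output"
  if [ "${#racks[@]}" -ne "$expected" ]; then
    echo "expected $expected rack paths, got ${#racks[@]}" >&2
    return 1
  fi
}
```

For example, check_topology_script /etc/gphd/hadoop/conf/rack_topology.sh 10.109.123.12 10.109.123.14 prints the rack paths returned for the two addresses and fails if fewer or more than two paths come back.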

 

2. Topology data file - Maps each node to its rack location

Sample topology data file: rack_topology.data

# This file should be:
# - Placed in the /etc/gphd/hadoop/conf directory
# Add one node per line. Format: <host ip> <rack_location>

10.109.123.12 01
10.109.123.13 01
10.109.123.14 02
10.109.123.15 01
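
A malformed line in the data file silently sends a node to the default rack, so it can be worth checking the file before distributing it. The sketch below assumes the two-column format shown above; the function name validate_topology_data and the duplicate check are illustrative additions, not part of the original procedure.

```shell
#!/bin/bash
# Hypothetical check: every non-comment, non-blank line must have exactly
# two fields (<host ip> <rack_location>), and no host may appear twice.
# Usage: validate_topology_data <data file>
validate_topology_data() {
  local data_file=$1
  awk '
    /^[ \t]*#/ || NF == 0 { next }   # skip comments and blank lines
    NF != 2    { printf "line %d: expected 2 fields, got %d\n", NR, NF; bad = 1 }
    seen[$1]++ { printf "line %d: duplicate entry for %s\n", NR, $1; bad = 1 }
    END { exit bad }
  ' "$data_file"
}
```

The function prints one message per problem line and returns a non-zero status if any problem was found.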

Add the following property to core-site.xml on all nodes. The value must be the absolute path of the topology script and must match the script's file name:

<property>
<name>net.topology.script.file.name</name>
<value>/etc/gphd/hadoop/conf/rack_topology.sh</value>
</property>
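
A wrong path or a missing execute bit is an easy mistake at this step. The following sketch reads the property from a core-site.xml file and checks that the script it names is executable. Both function names are hypothetical, and the awk-based parsing is a deliberate simplification that assumes the <name> and <value> elements sit on adjacent lines, as in the property shown above.

```shell
#!/bin/bash
# Print the value of net.topology.script.file.name from a core-site.xml file.
# Assumes <value> is on the line directly after the matching <name> line.
topology_script_from_conf() {
  local conf_file=$1
  awk '
    found { gsub(/^[ \t]*<value>|<\/value>[ \t]*$/, ""); print; exit }
    /<name>net\.topology\.script\.file\.name<\/name>/ { found = 1 }
  ' "$conf_file"
}

# Fail if the property is unset or does not point to an executable file.
# Usage: check_topology_property <path to core-site.xml>
check_topology_property() {
  local script
  script=$(topology_script_from_conf "$1")
  if [ -z "$script" ]; then
    echo "net.topology.script.file.name is not set in $1" >&2
    return 1
  fi
  if [ ! -x "$script" ]; then
    echo "missing or not executable: $script" >&2
    return 1
  fi
  echo "$script"
}
```

On a node with the Hadoop client installed, hdfs getconf -confKey net.topology.script.file.name reports the value that Hadoop itself resolves, which is a useful cross-check against the file-based result.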

Verification

After you restart the cluster, the NameNode log should contain entries similar to the following, with each DataNode registered under its rack:

2015-03-03 15:21:38,729 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default/rack_01/10.109.123.12:50010
2015-03-03 15:21:38,739 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default/rack_01/10.109.123.13:50010
2015-03-03 15:21:38,801 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default/rack_02/10.109.123.14:50010
2015-03-03 15:21:38,819 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default/rack_01/10.109.123.15:50010
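
On a larger cluster it is easier to compare a per-rack count with the data file than to read the log line by line. The sketch below summarizes log lines of the form shown above, read from standard input. The function name racks_from_namenode_log is an assumption, and the location of the NameNode log depends on the installation.

```shell
#!/bin/bash
# Count DataNodes per rack from NameNode log lines read on stdin.
# Relies on the "Adding a new node: /<prefix>/<rack>/<ip>:<port>" format.
racks_from_namenode_log() {
  grep 'NetworkTopology: Adding a new node:' |
    awk '{
           path = $NF                  # e.g. /default/rack_01/10.109.123.12:50010
           sub("/[^/]*$", "", path)    # drop the trailing /<ip>:<port>
           count[path]++
         }
         END { for (r in count) print r, count[r] }' |
    sort
}
```

Feed it the NameNode log, for example racks_from_namenode_log < namenode.log, where namenode.log stands for the actual log file. The command hdfs dfsadmin -printTopology shows the same mapping as the running NameNode currently sees it.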

 


Additional Information

Environment

PHD x.x.x

Summary

The cluster needs a redundant solution for data locality across its nodes. Implementing rack awareness, which lets HDFS place block replicas on more than one rack, addresses this need.
