Hadoop setup on CentOS 7

Posted on Wed 04 November 2015 by Pavlo Khmel


Three servers with a 64-bit CentOS 7 minimal installation:

192.168.0.100 master.example.com
192.168.0.101 slave01.example.com
192.168.0.102 slave02.example.com

Changes on all nodes.

Create the file setup_nodes_for_hadoop.sh:

#!/bin/bash
# Create the HDFS storage root referenced later in hdfs-site.xml
mkdir -p /hdfs
chmod 777 /hdfs
# Set the timezone and keep the cluster clocks in sync
timedatectl set-timezone Europe/Oslo
yum -y install ntp rsync
systemctl start ntpd
systemctl enable ntpd
# Disable SELinux and the firewall (fine for a lab, not for production)
sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config
systemctl disable firewalld
systemctl stop firewalld
setenforce 0
# Install the Oracle JDK uploaded to /root beforehand
rpm -ivh /root/jdk-8u65-linux-x64.rpm

Upload the files to all nodes:

scp setup_nodes_for_hadoop.sh root@master.example.com:/root/
scp setup_nodes_for_hadoop.sh root@slave01.example.com:/root/
scp setup_nodes_for_hadoop.sh root@slave02.example.com:/root/
scp jdk-8u65-linux-x64.rpm root@master.example.com:/root/
scp jdk-8u65-linux-x64.rpm root@slave01.example.com:/root/
scp jdk-8u65-linux-x64.rpm root@slave02.example.com:/root/
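Equivalently, a loop collapses the repeated copies:

for h in master slave01 slave02; do
  scp setup_nodes_for_hadoop.sh jdk-8u65-linux-x64.rpm root@$h.example.com:/root/
done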

Run the script on each node:

ssh root@master.example.com 'bash setup_nodes_for_hadoop.sh'
ssh root@slave01.example.com 'bash setup_nodes_for_hadoop.sh'
ssh root@slave02.example.com 'bash setup_nodes_for_hadoop.sh'
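A quick sanity check (a sketch, assuming root SSH access from the same workstation) confirms SELinux is now permissive and the JDK is installed on every node:

for h in master slave01 slave02; do
  echo "== $h =="
  ssh root@$h.example.com 'getenforce; java -version'
done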

Set the hostnames, one per node (see the loop sketch below):

hostnamectl set-hostname master
hostnamectl set-hostname slave01
hostnamectl set-hostname slave02
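Each command above runs on its respective node. A sketch that does it in one shot from the workstation with root SSH access:

for h in master slave01 slave02; do
  ssh root@$h.example.com "hostnamectl set-hostname $h"
done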

Changes on master before rsync

As root user:

cat <<EOT >> /etc/hosts
192.168.0.100 master.example.com
192.168.0.101 slave01.example.com
192.168.0.102 slave02.example.com
EOT
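Optionally verify that the names resolve from master:

for h in master slave01 slave02; do ping -c 1 $h.example.com; done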

Create the hadoop user, generate an RSA key pair, and download the Hadoop binaries.

useradd hadoop
passwd hadoop
su - hadoop
ssh-keygen
curl -O http://www.eu.apache.org/dist/hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz
tar xzf hadoop-2.7.1.tar.gz
mv hadoop-2.7.1 hadoop

Continue as the hadoop user. Set the environment:

cat <<'EOT' >> .bashrc
export HADOOP_HOME=/home/hadoop/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
EOT
source .bashrc

Update JAVA_HOME in hadoop/etc/hadoop/hadoop-env.sh:

sed -i 's/export JAVA_HOME=.*/export JAVA_HOME=\/usr\/java\/jdk1.8.0_65\//g' hadoop/etc/hadoop/hadoop-env.sh
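With JAVA_HOME set in hadoop-env.sh and the PATH updated from .bashrc, the hadoop command should now work; the first line of output should read "Hadoop 2.7.1":

hadoop version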

Overwrite core-site.xml

cat <<'EOT' > hadoop/etc/hadoop/core-site.xml 
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master.example.com:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/tmp</value>
  </property>
</configuration>
EOT

Overwrite hdfs-site.xml

cat <<'EOT' > hadoop/etc/hadoop/hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/hdfs/datanode</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.namenode.http-address</name>
    <value>master.example.com:50070</value>
  </property>
</configuration>
EOT
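The NameNode and DataNode directories live under /hdfs, which the setup script made world-writable, so the daemons can create the subdirectories themselves on first start. To create them with explicit ownership instead, a sketch as root:

for h in master slave01 slave02; do
  ssh root@$h.example.com 'mkdir -p /hdfs/namenode /hdfs/datanode && chown -R hadoop:hadoop /hdfs'
done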

Create mapred-site.xml (the distribution ships only mapred-site.xml.template)

cat <<'EOT' > hadoop/etc/hadoop/mapred-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>mapreduce.jobtracker.address</name>
    <value>master.example.com:54311</value>
  </property>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
EOT

Overwrite yarn-site.xml

cat <<'EOT' > hadoop/etc/hadoop/yarn-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>master.example.com:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>master.example.com:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>master.example.com:8088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>master.example.com:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>master.example.com:8033</value>
  </property>
</configuration>
EOT

Overwrite slaves (the master is listed too, so it also runs DataNode and NodeManager daemons)

cat <<'EOT' > hadoop/etc/hadoop/slaves
master.example.com
slave01.example.com
slave02.example.com
EOT

Rsync files across nodes

As root user:

rsync -e ssh -a /etc/passwd /etc/shadow /etc/group slave01.example.com:/etc/
rsync -e ssh -a /etc/passwd /etc/shadow /etc/group slave02.example.com:/etc/
rsync -e ssh -a /home/hadoop slave01.example.com:/home/
rsync -e ssh -a /home/hadoop slave02.example.com:/home/
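The slaves must also be able to resolve the cluster hostnames. If no DNS serves these names, copy /etc/hosts to the slaves as well:

rsync -e ssh -a /etc/hosts slave01.example.com:/etc/
rsync -e ssh -a /etc/hosts slave02.example.com:/etc/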

SSH from master to itself and to every slave to distribute the public key and populate known_hosts. Passwordless SSH to every node is required by start-dfs.sh and start-yarn.sh.

As hadoop user:

su - hadoop
[hadoop@master ~]$ ssh-copy-id master.example.com
[hadoop@master ~]$ ssh-copy-id slave01.example.com
[hadoop@master ~]$ ssh-copy-id slave02.example.com
[hadoop@master ~]$ ssh master.example.com
[hadoop@master ~]$ ssh slave01.example.com
[hadoop@master ~]$ ssh slave02.example.com

Format the filesystem and start the Hadoop services

From master:

hdfs namenode -format

The output should include a line reporting that the storage directory /hdfs/namenode has been successfully formatted.

Start HDFS and YARN:

start-dfs.sh
start-yarn.sh
jps

On the master, jps should show NameNode, SecondaryNameNode, DataNode, ResourceManager and NodeManager (the master is listed in slaves, so it runs worker daemons too); the slaves show DataNode and NodeManager.

This warning can safely be ignored:

 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Available links:

  • http://master.example.com:8088 – Resource Manager
  • http://master.example.com:50070 – NameNode
  • http://master.example.com:50075 – DataNode

Run test job 1 (pi estimation):

hadoop jar hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar pi 30 100

The job should finish by printing an estimated value of Pi close to 3.14.

Run test job 2 (wordcount):

vi text.txt
hdfs dfs -mkdir /test
hdfs dfs -copyFromLocal text.txt /test
hadoop jar hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar wordcount /test/text.txt /output01
hdfs dfs -ls /output01
hdfs dfs -cat /output01/part-r-00000
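The output is one tab-separated word and count per line. For a hypothetical text.txt containing "hello hadoop hello hdfs", part-r-00000 would read:

hadoop	1
hdfs	1
hello	2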

Stop Hadoop:

stop-dfs.sh
stop-yarn.sh
jps