Hadoop setup
Three servers with a 64-bit CentOS 7 minimal installation:
192.168.0.100 master.example.com
192.168.0.101 slave01.example.com
192.168.0.102 slave02.example.com
Changes on all nodes.
File setup_nodes_for_hadoop.sh
#!/bin/bash
mkdir -p /hdfs
chmod 777 /hdfs
timedatectl set-timezone Europe/Oslo
yum -y install ntp rsync
systemctl start ntpd
systemctl enable ntpd
sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config
systemctl disable firewalld
systemctl stop firewalld
setenforce 0
rpm -ivh /root/jdk-8u65-linux-x64.rpm
Upload the files:
scp setup_nodes_for_hadoop.sh root@master.example.com:/root/
scp setup_nodes_for_hadoop.sh root@slave01.example.com:/root/
scp setup_nodes_for_hadoop.sh root@slave02.example.com:/root/
scp jdk-8u65-linux-x64.rpm root@master.example.com:/root/
scp jdk-8u65-linux-x64.rpm root@slave01.example.com:/root/
scp jdk-8u65-linux-x64.rpm root@slave02.example.com:/root/
Run script:
ssh root@master.example.com 'bash setup_nodes_for_hadoop.sh'
ssh root@slave01.example.com 'bash setup_nodes_for_hadoop.sh'
ssh root@slave02.example.com 'bash setup_nodes_for_hadoop.sh'
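The upload and run steps above can also be collapsed into one loop over the three hosts; this is just an optional convenience sketch using the same files and hostnames:
for host in master.example.com slave01.example.com slave02.example.com; do
  scp setup_nodes_for_hadoop.sh jdk-8u65-linux-x64.rpm root@${host}:/root/
  ssh root@${host} 'bash /root/setup_nodes_for_hadoop.sh'
done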
Set the hostnames (run the matching command on the corresponding node):
hostnamectl set-hostname master
hostnamectl set-hostname slave01
hostnamectl set-hostname slave02
Changes on master before rsync
As root user:
cat <<EOT >> /etc/hosts
192.168.0.100 master.example.com
192.168.0.101 slave01.example.com
192.168.0.102 slave02.example.com
EOT
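A quick optional check that the names now resolve on the master:
getent hosts master.example.com slave01.example.com slave02.example.com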
Create the hadoop user, generate an RSA key pair, and download the Hadoop binaries.
useradd hadoop
passwd hadoop
su - hadoop
ssh-keygen
curl -O http://www.eu.apache.org/dist/hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz
tar xzf hadoop-2.7.1.tar.gz
mv hadoop-2.7.1 hadoop
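If the eu mirror no longer carries the 2.7.1 release, the same tarball should be available from the Apache archive, which keeps old releases:
curl -O https://archive.apache.org/dist/hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz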
Continue as the hadoop user. Set the environment:
cat <<'EOT' >> .bashrc
export HADOOP_HOME=/home/hadoop/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
EOT
source .bashrc
Update JAVA_HOME in hadoop/etc/hadoop/hadoop-env.sh
sed -i 's/export JAVA_HOME=.*/export JAVA_HOME=\/usr\/java\/jdk1.8.0_65\//g' hadoop/etc/hadoop/hadoop-env.sh
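A quick check that PATH and JAVA_HOME are picked up; this should print the Hadoop 2.7.1 version banner:
hadoop version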
Overwrite core-site.xml
cat <<'EOT' > hadoop/etc/hadoop/core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master.example.com:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/tmp</value>
</property>
</configuration>
EOT
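hadoop.tmp.dir is created on demand, but it can be pre-created now; since /home/hadoop is rsynced to the slaves later, the directory ends up on all nodes:
mkdir -p /home/hadoop/tmp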
Overwrite hdfs-site.xml
cat <<'EOT' > hadoop/etc/hadoop/hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/hdfs/datanode</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/hdfs/namenode</value>
</property>
<property>
<name>dfs.namenode.http-address</name>
<value>master.example.com:50070</value>
</property>
</configuration>
EOT
Overwrite mapred-site.xml
cat <<'EOT' > hadoop/etc/hadoop/mapred-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>mapreduce.jobtracker.address</name>
<value>master.example.com:54311</value>
</property>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
EOT
Overwrite yarn-site.xml
cat <<'EOT' > hadoop/etc/hadoop/yarn-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master.example.com:8030</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>master.example.com:8032</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>master.example.com:8088</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master.example.com:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>master.example.com:8033</value>
</property>
</configuration>
EOT
Overwrite slaves (the master is also listed, so it runs DataNode and NodeManager as well)
cat <<'EOT' > hadoop/etc/hadoop/slaves
master.example.com
slave01.example.com
slave02.example.com
EOT
Rsync files across nodes
As root user:
rsync -e ssh -a /etc/hosts /etc/passwd /etc/shadow /etc/group slave01.example.com:/etc/
rsync -e ssh -a /etc/hosts /etc/passwd /etc/shadow /etc/group slave02.example.com:/etc/
rsync -e ssh -a /home/hadoop slave02.example.com:/home/
rsync -e ssh -a /home/hadoop slave01.example.com:/home/
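An optional sanity check (as root) that the hadoop account now exists with the same UID on the slaves:
ssh slave01.example.com id hadoop
ssh slave02.example.com id hadoop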
SSH from the master to every node (including the master itself) to distribute the hadoop user's public key and populate known_hosts.
As hadoop user:
su - hadoop
[hadoop@master ~]$ ssh-copy-id master.example.com
[hadoop@master ~]$ ssh-copy-id slave01.example.com
[hadoop@master ~]$ ssh-copy-id slave02.example.com
[hadoop@master ~]$ ssh master.example.com
[hadoop@master ~]$ ssh slave01.example.com
[hadoop@master ~]$ ssh slave02.example.com
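With passwordless SSH in place, the dfs.namenode.name.dir and dfs.datanode.data.dir directories from hdfs-site.xml can optionally be pre-created on every node; the daemons would otherwise create them on first start, since /hdfs is world-writable:
for host in master.example.com slave01.example.com slave02.example.com; do
  ssh ${host} 'mkdir -p /hdfs/namenode /hdfs/datanode'
done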
Format the HDFS filesystem and start the Hadoop services
From the master, as the hadoop user:
hdfs namenode -format
Start HDFS and YARN:
start-dfs.sh
start-yarn.sh
jps
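On the master, jps should show roughly the daemons below (the master is also in the slaves file, so it runs the worker daemons too); the slaves show only DataNode and NodeManager:
NameNode
SecondaryNameNode
DataNode
ResourceManager
NodeManager
Jps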
The following warning can be safely ignored:
WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Available links:
- http://master.example.com:8088 – Resource Manager
- http://master.example.com:50070 – NameNode
- http://master.example.com:50075 – DataNode
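An optional check that all three DataNodes registered with the NameNode (the report should list three live datanodes):
hdfs dfsadmin -report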
Run test job 1 (pi estimation):
hadoop jar hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar pi 30 100
Run test job 2 (wordcount). First create a small sample text file:
vi text.txt
hdfs dfs -mkdir /test
hdfs dfs -copyFromLocal text.txt /test
hadoop jar hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar wordcount /test/text.txt /output01
hdfs dfs -ls /output01
hdfs dfs -cat /output01/part-r-00000
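The output format is one word per line followed by a tab and its count; the exact content depends on what was put in text.txt, for example:
hello	2
world	1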
Stop Hadoop:
stop-dfs.sh
stop-yarn.sh
jps