Install Hadoop on multiple nodes using Ubuntu 15.10

Preface

We gave a brief introduction to Hadoop in the previous tutorial; today we will learn how to install Hadoop on multiple nodes. In this demonstration scenario we will be using Ubuntu 15.10 Desktop, and we will create two slave (data) nodes along with one name node. Make sure you have shared SSH public keys with the data nodes and assigned the appropriate IP addresses, hostnames and other Hadoop services (mentioned in this tutorial) required to run a multi-node Hadoop cluster.

Prerequisites

We will be using Ubuntu 15.10 for one master node and two slave/data nodes. The hostname of the namenode will be masternode; the datanodes will have the hostnames slave1 and slave2 respectively.

masternode IP address: 192.51.10.10

slave1 IP address: 192.51.10.11

slave2 IP address: 192.51.10.12

Configuration

The installation process is similar to the previous tutorial except for a few changes. First of all, let us configure the master node.

Define the hostname of the NameNode

# vim /etc/hostname

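The file should contain just the hostname of the machine; on the master node it will simply be:

masternode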

Define hosts in /etc/hosts file

# vim /etc/hosts

Sample output

127.0.0.1       localhost
192.51.10.10    masternode
192.51.10.11    slave1
192.51.10.12    slave2

Configure Hadoop Services

# cd /usr/local/hadoop/etc/hadoop/

Edit hdfs-site.xml

# vim /usr/local/hadoop/etc/hadoop/hdfs-site.xml

The file will look like below; change the replication value to 3.

<configuration>
<property>
 <name>dfs.replication</name>
 <value>3</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:///usr/local/hadoop/hadoopdata/hdfs/namenode</value>
</property>
</configuration>

Make sure that the namenode directory exists under /usr/local/hadoop and is owned by the hadoop user:

# mkdir -p /usr/local/hadoop/hadoopdata/hdfs/namenode
# sudo chown -R hadoop:hadoop /usr/local/hadoop/

Similarly, edit yarn-site.xml. It will look like below; make sure you have assigned the hostname of the masternode appropriately.

# vim yarn-site.xml

Sample output

<configuration>
<property>
 <name>yarn.nodemanager.aux-services</name>
 <value>mapreduce_shuffle</value>
</property>
<property>
 <name>yarn.resourcemanager.scheduler.address</name>
 <value>masternode:8030</value>
</property>
<property>
 <name>yarn.resourcemanager.address</name>
 <value>masternode:8032</value>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address</name>
  <value>masternode:8088</value>
</property>
<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>masternode:8031</value>
</property>
<property>
  <name>yarn.resourcemanager.admin.address</name>
  <value>masternode:8033</value>
</property>
</configuration>

Make sure core-site.xml has the appropriate hostname.

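For reference, a minimal core-site.xml for this setup points the default filesystem at the master node. The sketch below is an assumption: the property name (fs.defaultFS) and port (9000) should match whatever you configured in the previous tutorial.

<configuration>
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://masternode:9000</value>
</property>
</configuration>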

Create a file named slaves under the /usr/local/hadoop/etc/hadoop directory and add the hostnames of the datanodes.

# vim /usr/local/hadoop/etc/hadoop/slaves

Add the following entries:

slave1
slave2

Similarly, create a file named masters under the same directory hierarchy.

# vim /usr/local/hadoop/etc/hadoop/masters

Enter the following:

masternode

We have a working master node at this stage; now let us create the two slave nodes. We created two clone virtual machines using VirtualBox: the first clone is slave1 and the second clone is slave2. Since these machines are clones of the masternode, all of the Hadoop configuration files (.xml) are already in ready-to-use form.


Similarly, create another clone for the slave2 datanode.

Change the IP address of slave1 to 192.51.10.11.

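If you prefer not to use the NetworkManager GUI, a static address can also be set in /etc/network/interfaces (Ubuntu 15.10 still supports ifupdown). The sketch below is an assumption; the interface name eth0, netmask and gateway must match your own network:

# vim /etc/network/interfaces

auto eth0
iface eth0 inet static
    address 192.51.10.11
    netmask 255.255.255.0
    gateway 192.51.10.1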

Change the hostname to slave1 and reboot the system. Repeat the process for the other VirtualBox clone, which will be used as slave2, and assign it the IP address 192.51.10.12.

Now we have one NameNode (masternode) with IP address 192.51.10.10 and two datanodes (slave1, slave2).

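Before wiring up SSH, you can quickly confirm that the /etc/hosts entries resolve correctly from the masternode:

# ping -c 1 slave1
# ping -c 1 slave2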

Now switch back to the master node and share the SSH RSA keys with slave1 and slave2, so that SSH passwords are no longer needed.

# ssh-keygen -t rsa
# cat .ssh/id_rsa.pub | ssh hadoop@192.51.10.11 'cat >> .ssh/authorized_keys'
# ssh hadoop@192.51.10.11 "chmod 755 .ssh; chmod 640 .ssh/authorized_keys"

# cat .ssh/id_rsa.pub | ssh hadoop@192.51.10.12 'cat >> .ssh/authorized_keys'
# ssh hadoop@192.51.10.12 "chmod 755 .ssh; chmod 640 .ssh/authorized_keys"
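To verify that passwordless login works, run a harmless command on each slave from the masternode; if no password prompt appears, the keys are in place:

# ssh hadoop@slave1 hostname
# ssh hadoop@slave2 hostname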

Reboot all three systems to make sure all things are going smooth.

Edit the hdfs-site.xml file on the slave1 and slave2 data nodes and make sure it contains the following entries:

<configuration>
<property>
  <name>dfs.data.dir</name>
  <value>file:///usr/local/hadoop/hadoopdata/hdfs/datanode</value>
</property>
</configuration>

Create the /usr/local/hadoop/hadoopdata/hdfs/datanode directory on both data nodes:

# mkdir -p /usr/local/hadoop/hadoopdata/hdfs/datanode
# chown -R hadoop:hadoop /usr/local/hadoop/

Go to the masternode and start the node services.

# cd /usr/local/hadoop/sbin && ls


Run all node services:

# ./start-all.sh


The startup output shows that both datanodes (slave1, slave2) are being started properly.

Run the jps command on the masternode.

# jps

Sample output

8499 SecondaryNameNode
8922 Jps
8650 ResourceManager
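Note that the NameNode process itself is missing from the output above. If that happens on your masternode as well, the most common cause is that the HDFS namenode directory was never formatted. Assuming the directory layout configured earlier, format it once as the hadoop user and restart the services (warning: formatting erases any existing HDFS data):

# /usr/local/hadoop/bin/hdfs namenode -format
# ./stop-all.sh && ./start-all.sh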

Switch to slave1 and run the jps command again.

# ssh hadoop@slave1

# jps

Sample output; the datanode is working.

4373 DataNode
4499 NodeManager
4671 Jps

Similarly, on slave2 the datanode is working perfectly.


The multi-node Hadoop cluster installation process is complete at this stage.

Open a browser and go to:

http://192.51.10.10:8088/cluster/nodes (change the IP address to match your scenario)
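This opens the YARN ResourceManager view of the cluster nodes. With default Hadoop 2.x ports, the HDFS NameNode web UI should also be reachable at http://192.51.10.10:50070 (again, substitute your own IP address).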


That's it! Have fun!