Saturday 5 March 2016

Hadoop Multi Node Cluster Installation

A multi node installation involves one master and many slaves. To build this setup we start by installing a single node cluster on each machine, test each one, and then merge them with the required settings so that one node becomes the master and the other nodes become slaves. Any problems are much easier to track down this way, thanks to the reduced complexity of a single node setup on each machine.

Step 1: Configuring a Single Node Cluster on Each Node

Download and configure a single node cluster on each node of this cluster by following http://rachana706.blogspot.in/2015/05/first-step-with-apache-hadoop.html

Step 2: Networking

It is easier to put all the machines on the same network with regard to hardware and software configuration. Here we take three machines as an example: connect them via a single hub or switch and configure their network interfaces to use a common subnet such as 192.168.1.x/24. We then need to add the IP address of each node (master and slaves) to the file /etc/hosts on every machine.

$sudo nano /etc/hosts
Add the following lines to this file:
192.168.1.100 master
192.168.1.101 slave1
192.168.1.102 slave2
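
As a quick sanity check, you can verify that the new hostnames resolve by pinging each node from the master (plain ping against the names just added to /etc/hosts):

$ping -c 3 slave1
$ping -c 3 slave2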

Step 3: SSH Access

The master node must be able to connect to itself and to all slaves in order to start services and perform cluster management tasks. For this we need to add the master's public SSH key to the authorized_keys file of every slave.

hduser@master:$ ssh-copy-id -i $HOME/.ssh/id_rsa.pub hduser@slave1
hduser@master:$ ssh-copy-id -i $HOME/.ssh/id_rsa.pub hduser@slave2
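
These commands assume that an RSA key pair was already generated during the single node setup. If $HOME/.ssh/id_rsa.pub does not exist yet, create it first with an empty passphrase, for example:

hduser@master:$ ssh-keygen -t rsa -P ""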

Step 4: Test SSH connection

We can test our configuration by opening an SSH session from the master node to itself and to each slave node.

hduser@master:$ ssh master
hduser@master:$ ssh slave1
hduser@master:$ ssh slave2

Step 5: Configuring Files for All Nodes

The following configuration files must be edited on every node in the cluster (master and slaves).
  • Open conf/core-site.xml and modify the file.
    $nano conf/core-site.xml
    Add between <configuration></configuration> tags:
    <property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
    </property> 
  • Open conf/yarn-site.xml and add the following properties to this file.
    $nano conf/yarn-site.xml
    Add between <configuration></configuration> tags:
    <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>master:8025</value>
    </property>
    
    <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>master:8030</value>
    </property>
    
    <property>
    <name>yarn.resourcemanager.address</name>
    <value>master:8050</value>
    </property>
     
  • Open conf/hdfs-site.xml and change the replication value from 1 to 3 (see the example after this list).
    $nano conf/hdfs-site.xml
    Replace 1 with 3 between the <property></property> tags.
    
  • Open conf/mapred-site.xml
    $nano conf/mapred-site.xml
    Add between <configuration></configuration> tags:
    <property>
    <name>mapred.job.tracker</name>
    <value>master:54311</value>
    </property> 
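
For reference, the replication change in conf/hdfs-site.xml described above leaves the property looking like this (dfs.replication is the standard property name; set the value to the number of DataNodes available in your cluster):

<property>
<name>dfs.replication</name>
<value>3</value>
</property>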

Step 6: Configuring Master Node Files

The following files are configured on the master node only.

$cd /usr/local/hadoop/etc/hadoop
  • Open conf/masters and add the name of the master node to it.
  • Open conf/slaves and add the names of all the slave nodes to this file (see the example after this list).
  • Open conf/hdfs-site.xml and remove the property related to the DataNode directory.
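
With the hostnames defined in /etc/hosts above, the two files would contain:

conf/masters:
master

conf/slaves:
slave1
slave2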

Step 7: Configuring Slave Node Files

On each slave node, open conf/hdfs-site.xml and remove the property related to the NameNode directory.

Step 8: Formatting the Hadoop Distributed File System

After reopening the terminal as the Hadoop user, format the new distributed file system from the master node:
$hdfs namenode -format
  

Step 9: Starting Hadoop cluster

Run the following on the master node:

$start-dfs.sh
$start-yarn.sh

or simply:

$start-all.sh
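
Assuming the default Hadoop 2.x web UI ports, you can also verify that the daemons are up from a browser: the NameNode UI at http://master:50070 and the ResourceManager UI at http://master:8088.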

Step 10: Checking running services in the Hadoop cluster

$jps
Services for the master node:

18221 ResourceManager
20582 Jps
17706 SecondaryNameNode
17085 NameNode

Services for a slave node:
4916 DataNode
5053 NodeManager