A multi-node installation involves one master and several slaves. To build this setup, we first install a single-node cluster on each machine and test it, then merge them with the required settings so that one node becomes the master and the other nodes become slaves. Because a single-node setup is less complex, it is much easier to track down any problems on each machine before combining them.
Step 1: Configuring a Single-Node Cluster on Each Node
Download and configure a single-node cluster on each node of this cluster by following http://rachana706.blogspot.in/2015/05/first-step-with-apache-hadoop.html
Step 2: Networking
It is easiest to put all the machines on the same network, with similar hardware and software configuration. Here we take three machines as an example: connect them via a single hub or switch and configure the network interfaces to use a common network such as 192.168.1.x/24. We need to add the IP address of each node (master node and slaves) to the file /etc/hosts.
$nano /etc/hosts
Add the following lines to this file:
192.168.1.100 master
192.168.1.101 slave1
192.168.1.102 slave2
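The entries above can also be added from a script. The following is a minimal sketch, assuming the same three addresses and host names as above; the function name add_host_entry and the HOSTS_FILE variable are made up for this example and are not part of Hadoop. It appends an entry only when the host name is not already listed, so running it twice does no harm.

```shell
#!/bin/sh
# Sketch: append "IP name" to a hosts file unless the name is already listed.
# add_host_entry is a hypothetical helper name chosen for this example.
add_host_entry() {
    file=$1; ip=$2; name=$3
    # Look for the name as a whole field on any non-comment line.
    if awk -v n="$name" '
        !/^[[:space:]]*#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit !found }
    ' "$file"; then
        echo "$name already present in $file"
    else
        printf '%s %s\n' "$ip" "$name" >> "$file"
    fi
}

# Example (editing /etc/hosts itself normally requires root):
# HOSTS_FILE=/etc/hosts
# add_host_entry "$HOSTS_FILE" 192.168.1.100 master
# add_host_entry "$HOSTS_FILE" 192.168.1.101 slave1
# add_host_entry "$HOSTS_FILE" 192.168.1.102 slave2
```

The function only defines behaviour; the commented calls show how it might be used on each node.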
Step 3: SSH Access
The master node must be able to connect to itself and to all slaves in order to start services and perform cluster management tasks. For this, we need to add the master's SSH public key to the authorized_keys file of every slave.
hduser@master:$ssh-copy-id -i $HOME/.ssh/id_rsa.pub hduser@slave1
hduser@master:$ssh-copy-id -i $HOME/.ssh/id_rsa.pub hduser@slave2
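With more than a couple of slaves, the same step can be written as a loop. This is a sketch only, assuming the hduser account and the slave host names used above; SLAVES, KEY and DRY_RUN are illustrative variables introduced here, not Hadoop or OpenSSH settings. If no key pair exists yet, it creates one with an empty passphrase before copying.

```shell
#!/bin/sh
# Sketch: copy the master's public key to each slave listed in SLAVES.
# SLAVES, KEY and DRY_RUN are illustrative variables, not Hadoop settings.
SLAVES="${SLAVES:-slave1 slave2}"
KEY="${KEY:-$HOME/.ssh/id_rsa}"

# With DRY_RUN=1 the commands are only printed, not executed.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

copy_key_to_slaves() {
    # Create an RSA key pair with an empty passphrase if none exists yet.
    if [ ! -f "$KEY.pub" ]; then
        run ssh-keygen -t rsa -P "" -f "$KEY"
    fi
    for host in $SLAVES; do
        run ssh-copy-id -i "$KEY.pub" "hduser@$host"
    done
}

# Run on the master as hduser:
# copy_key_to_slaves
```

Setting DRY_RUN=1 first is a cheap way to review the commands before they touch any machine.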
Step 4: Test SSH connection
We can test our configuration by opening an SSH session from the master node to itself and to each slave node.
hduser@master:$ssh master
hduser@master:$ssh slave1
hduser@master:$ssh slave2
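The same check can be run without opening interactive sessions. The sketch below assumes the host names from /etc/hosts above; check_ssh is a hypothetical helper name. It relies on the standard OpenSSH options BatchMode=yes, which makes ssh fail instead of asking for a password, and ConnectTimeout, which limits how long it waits for an unreachable host.

```shell
#!/bin/sh
# Sketch: verify that key-based login works for every node given as argument.
# check_ssh is a hypothetical helper name chosen for this example.
check_ssh() {
    failed=0
    for host in "$@"; do
        # Run the no-op command "true" remotely; success means login works.
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true; then
            echo "OK   $host"
        else
            echo "FAIL $host"
            failed=1
        fi
    done
    return "$failed"
}

# Run on the master as hduser:
# check_ssh master slave1 slave2
```

A FAIL line usually means the key was not copied to that node or the host name does not resolve.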
Step 6: Configuring Files For All Nodes
- Open conf/core-site.xml and modify the file.
$nano conf/core-site.xml
Add between <configuration></configuration> tags:
<property>
<name>fs.default.name</name>
<value>hdfs://master:9000</value>
</property>
- Open conf/yarn-site.xml and add the following properties to this file.
$nano conf/yarn-site.xml
Add between <configuration></configuration> tags:
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master:8025</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master:8030</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>master:8050</value>
</property>
- Open conf/hdfs-site.xml and change the value of the replication property from 1 to 3.
$nano conf/hdfs-site.xml
Replace 1 with 3 in the dfs.replication property, between its <property></property> tags.
- Open conf/mapred-site.xml
$nano conf/mapred-site.xml
Add between <configuration></configuration> tags:
<property>
<name>mapred.job.tracker</name>
<value>master:54311</value>
</property>
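After editing the four files, it can help to read the values back and confirm that every node points at the same master. The following is a rough sketch, not a full XML parser: get_prop is a hypothetical helper, and it assumes each <name> and <value> tag sits on its own line, as in the snippets above.

```shell
#!/bin/sh
# Sketch: print the <value> that follows a given <name> in a Hadoop XML config file.
# get_prop is a hypothetical helper; it assumes one tag per line.
get_prop() {
    file=$1; prop=$2
    awk -v p="$prop" '
        # Remember that the wanted <name> line was seen.
        index($0, "<name>" p "</name>") { hit = 1; next }
        # The first <value> line after it holds the setting.
        hit && /<value>/ {
            sub(/.*<value>/, ""); sub(/<\/value>.*/, "")
            print; exit
        }
    ' "$file"
}

# Example, from the Hadoop configuration directory:
# get_prop conf/core-site.xml fs.default.name
# get_prop conf/hdfs-site.xml dfs.replication
# get_prop conf/mapred-site.xml mapred.job.tracker
```

On a working installation, hdfs getconf -confKey fs.default.name reports the value Hadoop itself resolves, which is a useful cross-check.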
Step 7: Configuring Master Node Files
$cd /usr/local/Hadoop/etc/Hadoop
- Open conf/masters and add the name of the master node to it.
- Open conf/slaves and add the names of all slaves to this file.
- Open conf/hdfs-site.xml and remove the property related to the DataNode directory.
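The masters and slaves files are plain lists with one host name per line, so they can be generated instead of edited by hand. This is a minimal sketch under that assumption; write_node_lists is a hypothetical helper, and the host names are the ones defined in /etc/hosts earlier. It overwrites any existing lists.

```shell
#!/bin/sh
# Sketch: write the masters and slaves files, one host name per line.
# write_node_lists is a hypothetical helper: the first argument is the
# configuration directory, the second the master, the rest the slaves.
write_node_lists() {
    conf_dir=$1; master=$2
    shift 2
    printf '%s\n' "$master" > "$conf_dir/masters"
    printf '%s\n' "$@" > "$conf_dir/slaves"
}

# Run on the master, overwriting any existing lists:
# write_node_lists conf master slave1 slave2
```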
Step 8: Configuring Slave Node Files
- Open conf/hdfs-site.xml and remove the property related to the NameNode directory.
Step 9: Format the New Hadoop Distributed File System
Restart the terminal as the Hadoop user, then format the file system:
$hdfs namenode -format
Step 10: Starting the Hadoop Cluster
$start-all.sh
or
$start-yarn.sh
$start-dfs.sh
Step 11: Checking Running Services in the Hadoop Cluster
$jps
Services for the master node:
18221 ResourceManager
20582 Jps
17706 SecondaryNameNode
17085 NameNode
Services for Slave node:
4916 DataNode
5053 NodeManager
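Comparing the jps listing against the expected services can be automated. The sketch below assumes the daemon names shown above; missing_daemons is a hypothetical helper that reads jps output on standard input and reports any expected name that is absent. Whole-line matching keeps NameNode from being satisfied by SecondaryNameNode.

```shell
#!/bin/sh
# Sketch: read jps output on stdin and report which expected daemons are missing.
# missing_daemons is a hypothetical helper; its arguments are the names to look for.
missing_daemons() {
    # The second column of jps output is the process (class) name.
    running=$(awk '{ print $2 }')
    status=0
    for d in "$@"; do
        # -x matches the whole line, -q suppresses output.
        if ! printf '%s\n' "$running" | grep -qx "$d"; then
            echo "missing: $d"
            status=1
        fi
    done
    return "$status"
}

# On the master:
# jps | missing_daemons NameNode SecondaryNameNode ResourceManager
# On each slave:
# jps | missing_daemons DataNode NodeManager
```

No output and a zero exit status mean all listed services are running. Running hdfs dfsadmin -report on the master additionally shows which DataNodes have registered with the NameNode.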