
Sunday, March 23, 2014

Hadoop Installation (RPM)

Hi Folks,

Today we are going for the RPM installation of Hadoop. It is pretty easy, just like my last Hadoop installation, so let's try it out.

Requirement
  • Java JDK (download from here)
  • hadoop-0.20.204.0-1.i386.rpm  (Download from here)
Installation

1. Install Java and set JAVA_HOME in /etc/profile with export JAVA_HOME=/usr
sudo ./jdk-6u26-linux-x64-rpm.bin
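If you want the setting to survive reboots, a minimal sketch of adding it to /etc/profile (assuming JAVA_HOME=/usr as above) looks like this:

echo 'export JAVA_HOME=/usr' | sudo tee -a /etc/profile   # append to the system-wide profile
source /etc/profile                                       # reload it in the current shell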
2. Hadoop RPM installation
sudo rpm -i hadoop-0.20.204.0-1.i386.rpm
3. Setting up Single Node cluster
sudo /usr/sbin/hadoop-setup-single-node.sh 
The script asks several questions while setting up Hadoop, such as creating directories and other configuration details; answer them with y.
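If you would rather not answer each prompt by hand, one option (this assumes the script reads its confirmations from stdin) is to pipe the answers in:

yes y | sudo /usr/sbin/hadoop-setup-single-node.sh   # auto-answers every prompt with y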

For a multi-node setup, run the command below instead.

3. Setting up a multi-node cluster
sudo /usr/sbin/hadoop-setup-conf.sh \
  --namenode-host=hdfs://${namenode}:9000/ \
  --jobtracker-host=${jobtracker}:9001 \
  --conf-dir=/etc/hadoop \
  --hdfs-dir=/var/lib/hadoop/hdfs \
  --namenode-dir=/var/lib/hadoop/hdfs/namenode \
  --mapred-dir=/var/lib/hadoop/mapred \
  --mapreduce-user=mapred \
  --datanode-dir=/var/lib/hadoop/hdfs/data \
  --log-dir=/var/log/hadoop \
  --auto
Here ${namenode} and ${jobtracker} are the hostnames of the nodes where you want to run those services. You have to run this command on every node.
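For illustration, with a hypothetical host master01 running both the NameNode and the JobTracker, you would set the variables before running the setup command on each node:

export namenode=master01    # hypothetical hostname, substitute your own
export jobtracker=master01  # hypothetical hostname, substitute your own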

4. After installation, you have to format the NameNode:
sudo /usr/sbin/hadoop-setup-hdfs.sh
5. To start the services, do the following:
  • For a single node
for service in /etc/init.d/hadoop-*; do sudo $service start; done
  • For a multi-node cluster
    • On the master node
    sudo  /etc/init.d/hadoop-namenode start
    sudo  /etc/init.d/hadoop-jobtracker start 
    sudo  /etc/init.d/hadoop-secondarynamenode start 
    • On the slave nodes
sudo  /etc/init.d/hadoop-datanode start
sudo  /etc/init.d/hadoop-tasktracker start 
6. You can create a user account for yourself on HDFS with the command below:
sudo /usr/sbin/hadoop-create-user.sh -u $USER
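To confirm the account and its HDFS home directory were created, a quick check (assuming the usual /user/<name> layout) is:

hadoop fs -ls /user/$USER   # should list your new, empty home directory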

Now you can run the word count program as given in the previous post. Please try it out and let me know if you face any issues.
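For reference, a word count run looks roughly like the sketch below; the path to the examples jar is an assumption, so adjust it to wherever your installation placed it:

hadoop fs -mkdir input                                                        # create an input directory under your HDFS home
hadoop fs -put /etc/hadoop/*.xml input                                        # copy some sample files into it
hadoop jar /usr/share/hadoop/hadoop-examples-*.jar wordcount input output    # jar location is an assumption
hadoop fs -cat 'output/part-*'                                                # print the word counts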

Thanks

Tuesday, March 18, 2014

Hadoop Installations (Tarball)

Hi Folks,

Hadoop can be installed in many ways: RPM, automated installers, tarball, Yum, etc. In these posts we will go through the installation types one by one.

Let's start with the tarball installation.

Requirement 

  • We only require Java installed on the node.
  • JAVA_HOME should be set.
  • iptables should be off (see the check commands just after this list).
  • SELinux should be disabled.
  • Ports should be open (9000, 9001, 50010, 50020, 50030, 50060, 50070, 50075, 50090).
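A quick sketch of how you might verify these prerequisites on an RHEL/CentOS-style box (service and command names can differ on other distros):

java -version                  # JDK installed and on the PATH
echo $JAVA_HOME                # JAVA_HOME is set
sudo service iptables status   # firewall state; stop it with: sudo service iptables stop
getenforce                     # should print Disabled (or at least Permissive)
sudo setenforce 0              # temporarily switch SELinux to permissive if needed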
Installation

Download the tarball from the official Apache archive:

wget http://archive.apache.org/dist/hadoop/core/hadoop-1.0.4/hadoop-1.0.4.tar.gz

Untar the tarball:

tar -xzvf hadoop-1.0.4.tar.gz

Set up the environment variables in the user's .profile:

export JAVA_HOME=<path to JDK installation>
export HADOOP_HOME=/home/hadoop/project/hadoop-1.0.4
export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$PATH

Also update JAVA_HOME inside $HADOOP_HOME/conf/hadoop-env.sh.
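For example, the line inside hadoop-env.sh would look like this (the JDK path is illustrative; point it at your own install location):

export JAVA_HOME=/usr/java/jdk1.6.0_26   # illustrative path, use your actual JDK directory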

Configuration

Edit the following files to set the required parameters; these are the minimal configurations for each file.

$HADOOP_HOME/conf/core-site.xml
  • <configuration>
         <property>
             <name>fs.default.name</name>
             <value>hdfs://master:9000</value>
         </property>
    </configuration>
$HADOOP_HOME/conf/hdfs-site.xml
  • <configuration>
         <property>
             <name>dfs.replication</name>
             <value>1</value>
         </property>
    </configuration>
$HADOOP_HOME/conf/mapred-site.xml
  • <configuration>
         <property>
             <name>mapred.job.tracker</name>
             <value>master:9001</value>
         </property>
    </configuration>

Update the slaves file at $HADOOP_HOME/conf/slaves, adding an entry for every slave node.
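A minimal slaves file simply lists one worker hostname per line, for example (hostnames are hypothetical):

slave1
slave2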

We have to repeat this setup on every node of the cluster. Once that is done, we can format the NameNode and then start the services.

Suppose master is the main node that will act as the Hadoop NameNode; below are the steps to perform on that node.

$HADOOP_HOME/bin/hadoop namenode -format

This will format HDFS, and now we are ready to start the services on all the nodes.

For the master node

$HADOOP_HOME/bin/hadoop-daemon.sh start namenode
$HADOOP_HOME/bin/hadoop-daemon.sh start jobtracker
$HADOOP_HOME/bin/hadoop-daemon.sh start secondarynamenode

For the slave nodes

$HADOOP_HOME/bin/hadoop-daemon.sh start datanode
$HADOOP_HOME/bin/hadoop-daemon.sh start tasktracker

Now we can check the services at the URLs below:

Namenode:- http://master:50070/
Jobtracker:- http://master:50030/
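You can also confirm the daemons on each node with jps, which ships with the JDK:

jps   # expect NameNode, SecondaryNameNode and JobTracker on the master; DataNode and TaskTracker on the slaves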


This is the simplest and easiest tarball installation of Hadoop. Please comment if you face any issues during installation.

Sunday, March 2, 2014

Hadoop Resources - Books

Hello Guys,

I have been thinking about how I can share Hadoop material like books, white papers, and PDFs. A few days back I was looking for a Hadoop book online and could not find it; I spent 2-3 hours searching for that book.

After wasting that time, I thought: why not put everything I have here so that others can get it easily? So here I am listing the books you can easily get.