
Sunday, March 23, 2014

Hadoop Installation (RPM type)

Hi Folks,

Today we are going to do an RPM installation of Hadoop. It is just as easy as my last Hadoop installation was, so let's try it out.

Requirement
  • Java JDK (download from here)
  • hadoop-0.20.204.0-1.i386.rpm  (Download from here)
Installation

1. Install Java and set JAVA_HOME in /etc/profile with export JAVA_HOME=/usr
sudo ./jdk-6u26-linux-x64-rpm.bin.sh
2. Hadoop RPM installation
sudo rpm -i hadoop-0.20.204.0-1.i386.rpm
3. Setting up Single Node cluster
sudo /usr/sbin/hadoop-setup-single-node.sh 
The setup script asks several questions about directory creation and configuration; answer y to each of them.

For a multinode setup you need to run the commands below instead.

3. Setting up a Multinode Cluster
sudo /usr/sbin/hadoop-setup-conf.sh \
  --namenode-host=hdfs://${namenode}:9000/ \
  --jobtracker-host=${jobtracker}:9001 \
  --conf-dir=/etc/hadoop \
  --hdfs-dir=/var/lib/hadoop/hdfs \
  --namenode-dir=/var/lib/hadoop/hdfs/namenode \
  --mapred-dir=/var/lib/hadoop/mapred \
  --mapreduce-user=mapred \
  --datanode-dir=/var/lib/hadoop/hdfs/data \
  --log-dir=/var/log/hadoop \
  --auto
where ${namenode} and ${jobtracker} are the hostnames of the nodes where you want to run those services. You have to run this command on every node.
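The step above can also be driven from one machine with a small wrapper script. This is only a sketch: the hostnames (nn01, jt01, dn01, dn02) are placeholders of mine, and it assumes passwordless ssh to a sudo-capable user on every node.

```shell
# Dry-run sketch of pushing the multinode setup to every node.
# Hostnames are placeholders (assumption); substitute your own.
namenode=nn01
jobtracker=jt01
nodes="nn01 jt01 dn01 dn02"

# Build the hadoop-setup-conf.sh invocation for a given NN/JT pair.
setup_cmd() {
  printf 'sudo /usr/sbin/hadoop-setup-conf.sh --namenode-host=hdfs://%s:9000/ --jobtracker-host=%s:9001 --conf-dir=/etc/hadoop --auto' "$1" "$2"
}

for node in $nodes; do
  if [ "${APPLY:-0}" = 1 ]; then
    ssh "$node" "$(setup_cmd "$namenode" "$jobtracker")"   # really run it
  else
    echo "$node: $(setup_cmd "$namenode" "$jobtracker")"   # just print it
  fi
done
```

Run it plain first to inspect the printed commands, then again with APPLY=1 when they look right.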

4. After the installation you have to format the namenode
sudo /usr/sbin/hadoop-setup-hdfs.sh
5. You can start the services as below
  • For single Node
for service in /etc/init.d/hadoop-*; do sudo $service start; done
  • For Multinode
    • on Master Node
    sudo /etc/init.d/hadoop-namenode start
    sudo /etc/init.d/hadoop-jobtracker start
    sudo /etc/init.d/hadoop-secondarynamenode start
    • on Slave Node
    sudo /etc/init.d/hadoop-datanode start
    sudo /etc/init.d/hadoop-tasktracker start
6. You can create a user account for yourself on HDFS with the command below
sudo /usr/sbin/hadoop-create-user.sh -u $USER

Now you can run the word count program as given in the previous post. Please try it out and let me know if you face any issues.
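For reference, the stock WordCount example can be kicked off as sketched below. The examples jar path is what the 0.20.204 RPM typically lays down, but that location is an assumption, so verify it on your own machine.

```shell
# Locate the examples jar shipped by the RPM (the path is an assumption;
# check /usr/share/hadoop on your system).
EXAMPLES_JAR=$(ls /usr/share/hadoop/hadoop-examples-*.jar 2>/dev/null | head -n 1)

if [ -n "$EXAMPLES_JAR" ]; then
  # Put a tiny input file on HDFS, run wordcount, and print the result.
  hadoop fs -mkdir wcin
  printf 'hello hadoop hello world\n' | hadoop fs -put - wcin/words.txt
  hadoop jar "$EXAMPLES_JAR" wordcount wcin wcout
  hadoop fs -cat 'wcout/part-*'
else
  echo "examples jar not found; check your Hadoop install path"
fi
```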

Thanks

Tuesday, March 18, 2014

Hadoop Cluster Designing

Hi Folks ,

I remember when I was trying to design my first cluster with several nodes, I did not have much of an idea about what things to take care of: what the disk size should be, the RAM size, and many other such
questions in my mind.

I tried to find the basic configuration, as well as configurations specific to I/O-intensive and memory-intensive clusters. I read many blogs and books to get an idea about cluster designing and the kinds of loads on clusters. After searching a lot, I came across a few assumptions for cluster designing.

Today I would like to share some assumptions I have found and created for cluster designing.

Things to Remember
  • Cluster Sizing and Hardware
    • Prefer a large number of nodes over a large number of disks per node
    • Multiple racks give multiple failure domains
    • Good commodity hardware
    • Always have a pilot cluster before implementing in production
    • Always look at the load type, e.g. memory- or CPU-intensive
    • Start from basic requirements like 2-4 TB per node (1U with 6 disks or 2U with 12 disks)
  • Networking
    • Always have proper networking between nodes
    • 1GbE between the nodes in a rack
    • 10GbE between the racks in the cluster
    • Keep clusters isolated from each other for security
  • Monitoring
    • Always have something for monitoring, like Ganglia, for the different metrics
    • Use an alerting system such as Nagios to keep yourself updated when anything goes wrong
    • You can also use Ambari or Cloudera Manager from the respective vendors.
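To make the "start from basic requirements" point concrete, here is a back-of-the-envelope sizing calculation in shell. The 3x HDFS replication, the roughly 25% extra for temporary/scratch space, and the 4 TB of usable disk per node are all my assumptions for illustration, not fixed rules.

```shell
# Rough cluster sizing sketch: how many nodes for a given amount of raw data?
# Assumptions (tune these): 3x HDFS replication, ~25% extra for temp/scratch
# space, 4 TB of usable disk per node (e.g. a 1U box with 6 disks).
nodes_needed() {
  raw_tb=$1; repl=3; node_tb=4
  # total = raw * replication * 1.25 overhead, rounded up (integer math)
  total=$(( (raw_tb * repl * 5 + 3) / 4 ))
  # round up to whole nodes
  echo $(( (total + node_tb - 1) / node_tb ))
}

nodes_needed 10   # 10 TB of raw data -> prints 10
```

So 10 TB of raw data already calls for about ten 4 TB nodes once replication and overhead are counted, which is why "more nodes, fewer disks per node" matters early.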


Hope you got some idea about Hadoop cluster designing. Now let's move on to the types of Hadoop installation.

  • Standalone Installation
    • one node cluster running everything on one machine.
    • No daemon process is running.
  • Pseudo Installation
    • one node cluster running everything on one machine
    • NN, DN, JT, TT all running in separate JVMs
    • There is only a slight difference between pseudo-distributed and standalone installation.
  • Distributed Installation
    • As the name says, a cluster with multiple nodes.
    • Every daemon process runs on a different node: DN & TT run on the slave nodes, while NN & JT run on the same node or on different nodes.
    • We generally use this kind of cluster for POC kind of stuff.
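The daemon placement described above can be summed up in a tiny helper. This is just an illustrative sketch; the role names master/slave/pseudo are labels I chose here, not names any Hadoop tool understands.

```shell
# Sketch: which 0.20-era daemons run where, per the installation types above.
daemons_for() {
  case "$1" in
    master) echo "namenode jobtracker secondarynamenode" ;;
    slave)  echo "datanode tasktracker" ;;
    pseudo) echo "namenode datanode jobtracker tasktracker" ;;  # all on one box
    *)      echo "unknown role: $1" >&2; return 1 ;;
  esac
}

daemons_for master   # prints: namenode jobtracker secondarynamenode
daemons_for slave    # prints: datanode tasktracker
```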

Sunday, March 2, 2014

Hadoop Resources - Books

Hello Guys,

I have been thinking about how I can share Hadoop material like books, white papers, and PDFs. A few days back I was looking for a Hadoop book online and was not able to find it; I invested 2-3 hours trying to find that book.

After wasting my time, I thought why not put up all the things I have so that others can easily get them from here. So here I am listing the books, which you can get easily.