Tuesday, March 18, 2014

Hadoop Cluster Designing

Hi folks,

I remember when I was trying to design my first cluster with several nodes. I did not have much idea about what we need to take care of, what the disk size or RAM size should be, and there were many more questions in my mind.

I tried to find the basic configuration as well as configurations specific to IO-intensive and memory-intensive clusters. I read many blogs and books to get an idea about cluster design and the kinds of load a cluster has to handle. After searching a lot, I came across a few assumptions for cluster design.

Today I would like to share some of the assumptions I have found and worked out for cluster design.

Things to Remember
  • Cluster Sizing and Hardware
    • Prefer a large number of nodes over a large number of disks per node
    • Multiple racks give multiple failure domains
    • Use good commodity hardware
    • Always run a pilot cluster before rolling anything out to production
    • Always check the load type, for example memory-intensive or CPU-intensive
    • Start from basic requirements such as 2-4 TB disks (1U with 6 disks or 2U with 12 disks)
  • Networking
    • Always have proper networking between the nodes
    • 1 GbE between the nodes in a rack
    • 10 GbE between the racks in the cluster
    • Keep the cluster isolated from other clusters for security
  • Monitoring
    • Always have a monitoring tool, such as Ganglia, to track the different metrics
    • Use an alerting system, such as Nagios, to keep yourself updated when anything goes wrong
    • We can also use management tools from the different vendors, such as Ambari and Cloudera Manager
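
To make the sizing points above a bit more concrete, here is a rough Python sketch that estimates how many worker nodes are needed for a given amount of data. It is only a back-of-the-envelope calculation, not an official formula. The replication factor of 3 is the HDFS default; the 25% allowance for temporary job output, the function name and the 100 TB figure are my own assumptions for illustration.

```python
import math


def estimate_worker_nodes(raw_data_tb, replication=3, temp_overhead=0.25,
                          disks_per_node=6, disk_size_tb=2.0):
    """Estimate the number of worker nodes needed to store raw_data_tb.

    replication   -- HDFS replication factor (3 is the HDFS default)
    temp_overhead -- assumed extra fraction reserved for intermediate job output
    """
    # Every block is stored `replication` times across the cluster.
    replicated_tb = raw_data_tb * replication
    # Leave headroom for temporary data (an assumption, tune for your load).
    required_tb = replicated_tb * (1 + temp_overhead)
    # Raw disk capacity of one node.
    node_capacity_tb = disks_per_node * disk_size_tb
    return math.ceil(required_tb / node_capacity_tb)


if __name__ == "__main__":
    # 1U nodes with 6 x 2 TB disks versus 2U nodes with 12 x 4 TB disks.
    print(estimate_worker_nodes(100))
    print(estimate_worker_nodes(100, disks_per_node=12, disk_size_tb=4.0))
```

For the same 100 TB, the smaller 1U nodes give a cluster with many more machines than the dense 2U nodes, so losing one node takes away a smaller share of the data and of the compute capacity.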


Hope you got some idea about Hadoop cluster design. Now we will move on to the types of Hadoop installation.

  • Standalone Installation
    • A one-node cluster running everything on one machine.
    • No daemon processes are running.
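  • Pseudo-Distributed Installation
    • A one-node cluster running everything on one machine.
    • NN, DN, JT and TT (NameNode, DataNode, JobTracker, TaskTracker) all run in separate JVMs.
    • This differs only slightly from the standalone installation: here the daemons actually run, each in its own JVM.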
  • Distributed Installation
    • As the name says, a cluster with multiple nodes.
    • The daemon processes run on different nodes: DN and TT run on the slave nodes, while NN and JT run on the same master node or on separate ones.
    • We have generally used this kind of cluster for POC work; production clusters run in this mode as well.
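
The three installation types can be summed up in a small Python sketch that maps each host to the daemons it would run. This is only an illustration of the layout described above, not a Hadoop API: the function name and the host names are hypothetical, and in distributed mode I assume NN and JT share the first host, although they may also sit on separate nodes.

```python
def daemon_layout(mode, hosts):
    """Return {host: [daemon names]} for a Hadoop 1.x style installation."""
    if not hosts:
        raise ValueError("at least one host is required")
    if mode == "standalone":
        # Everything runs in a single JVM, so no daemons are started.
        return {hosts[0]: []}
    if mode == "pseudo":
        # All four daemons on one machine, each in its own JVM.
        return {hosts[0]: ["NameNode", "DataNode", "JobTracker", "TaskTracker"]}
    if mode == "distributed":
        if len(hosts) < 2:
            raise ValueError("distributed mode needs a master and at least one slave")
        # Assumption: the first host is the master and carries both NN and JT.
        master, *slaves = hosts
        layout = {master: ["NameNode", "JobTracker"]}
        for slave in slaves:
            layout[slave] = ["DataNode", "TaskTracker"]
        return layout
    raise ValueError("unknown mode: " + mode)


if __name__ == "__main__":
    # Hypothetical host names, for illustration only.
    for host, daemons in daemon_layout("distributed", ["master1", "slave1", "slave2"]).items():
        print(host, daemons)
```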
