Hi folks,
I remember when I was trying to design my first multi-node cluster. I did not have much idea of what needed to be taken care of, or what the disk size and RAM size should be; there were many questions in my mind.
I tried to find a basic configuration, as well as configurations specific to I/O-intensive and memory-intensive clusters. I read many blogs and books to get an idea of cluster design and the kinds of load a cluster carries. After a lot of searching, I came across a few assumptions for cluster design.
Today I would like to share some of the assumptions I have found and worked out for designing a cluster.
Things to Remember
- Cluster sizing and hardware
  - Prefer a large number of nodes over a large number of disks per node
  - Multiple racks give multiple failure domains
  - Use good commodity hardware
  - Always run a pilot cluster before implementing anything in production
  - Always check the load type, for example memory-intensive or CPU-intensive
  - Start from basic requirements such as 2-4 TB (1U with 6 disks or 2U with 12 disks)
- Networking
  - Always have proper networking between the nodes
  - 1GbE between the nodes in a rack
  - 10GbE between the racks in the cluster
  - Keep the cluster isolated from other clusters for security
- Monitoring
  - Always have something in place for monitoring, such as Ganglia, to track different metrics
  - Use an alerting system such as Nagios to keep yourself updated when something goes wrong
  - We can also use Ambari or Cloudera Manager, which come from different vendors
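To make the sizing points above a bit more concrete, here is a rough back-of-the-envelope sketch in Python. It is not an official formula. It assumes the HDFS default replication factor of 3, reads the 2-4 TB figure as a per-disk size, and reserves 25% of each node's raw disk for temporary and intermediate data. The function name, its parameters and the 25% figure are my own assumptions for illustration.

```python
import math

def estimate_worker_nodes(raw_data_tb, disks_per_node, disk_size_tb,
                          replication=3, temp_overhead=0.25):
    """Estimate how many worker nodes are needed to store raw_data_tb of data."""
    # Every block is stored `replication` times across the cluster.
    replicated_tb = raw_data_tb * replication
    # Assumption: a fraction of each node's raw disk is kept free for temp data.
    usable_per_node_tb = disks_per_node * disk_size_tb * (1 - temp_overhead)
    return math.ceil(replicated_tb / usable_per_node_tb)

# 100 TB of data on 1U nodes with 6 x 2 TB disks
print(estimate_worker_nodes(100, disks_per_node=6, disk_size_tb=2))   # 34
# The same data on 2U nodes with 12 x 4 TB disks
print(estimate_worker_nodes(100, disks_per_node=12, disk_size_tb=4))  # 9
```

With 34 smaller nodes, losing one node takes out roughly 3% of the cluster, while with 9 bigger nodes it takes out about 11%. That is one way to see why many nodes are preferred over many disks per node.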
Hope you got some idea about Hadoop cluster design. Now we will move on to the types of Hadoop installation.
- Standalone installation
  - A one-node cluster running everything on one machine
  - No daemon processes are running
- Pseudo-distributed installation
  - A one-node cluster running everything on one machine
  - NN, DN, JT and TT all run in separate JVMs
  - There is only a slight difference between the pseudo and standalone installations
- Distributed installation
  - As the name says, a cluster with multiple nodes
  - The daemon processes run on different nodes: DN and TT run on the slave nodes, while NN and JT run on the same node or on different nodes
  - We generally use this kind of cluster for POC work
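The difference between the three modes is mostly about which daemons run and where. The small Python sketch below writes that down as a lookup table, using the NN, DN, JT and TT abbreviations from the list above. The dictionary layout, the role names and the `daemon_count` helper are only illustrative assumptions and not part of any Hadoop configuration. It also ignores helper daemons that are not mentioned above.

```python
# Which daemons run where in each installation mode.
# Role names ("single", "master", "slave") are hypothetical labels for this sketch.
LAYOUT = {
    "standalone": {"single": []},                      # no daemons at all
    "pseudo": {"single": ["NN", "DN", "JT", "TT"]},    # one JVM per daemon, one machine
    "distributed": {"master": ["NN", "JT"], "slave": ["DN", "TT"]},
}

def daemon_count(mode, slaves=0):
    """Count the daemon processes running across the whole cluster."""
    roles = LAYOUT[mode]
    if mode == "distributed":
        # Master daemons run once; every slave node runs its own DN and TT.
        return len(roles["master"]) + slaves * len(roles["slave"])
    return len(roles["single"])

print(daemon_count("standalone"))              # 0
print(daemon_count("pseudo"))                  # 4
print(daemon_count("distributed", slaves=10))  # 22
```

The count for the distributed case is the same whether NN and JT share one master node or sit on two different nodes; only their placement changes.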