Big Data Board: Hadoop Cluster Disaster Recovery Solution 2/2

Sunday, September 14, 2014

Hi folks, in our last post we discussed synchronous data replication across clusters, which is expensive in terms of network usage and performance. Today we will talk about asynchronous data replication, which is less expensive than the synchronous approach.

So let's start: how do we go about asynchronous data replication, what kind of design do we need to set it up, and how do we make it work?

The picture above shows the design of cross-cluster data replication. Let's focus on how it works between two clusters.

When a client writes an HDFS file, it requests a new block once the file has been created. The primary cluster's Active NameNode allocates the block and selects a list of DataNodes for the client to write to. For a file that needs only asynchronous data replication, the Active NameNode does not select any remote DataNode from the mirror cluster for the pipeline.
As usual, upon successful block allocation, the client writes the block data to the first DataNode in the pipeline and also passes along the list of remaining DataNodes.
As usual, the first DataNode forwards the data to the next DataNode in the pipeline, and so on until the last. This time, however, the pipeline does not span to the mirror cluster.
Asynchronously, the mirror cluster's Active NameNode schedules replication of the data blocks that are not on any of its local DataNodes. As part of its heartbeats, it sends a MIRROR_REPLICATION_REQUEST containing a batch of blocks to replicate, with target DataNodes selected from the mirror cluster. The mirror cluster does not need to be aware of the real block locations in the primary cluster.
In handling the MIRROR_REPLICATION_REQUEST, the primary cluster's Active NameNode selects the block locations and schedules the replication command to the corresponding source DataNode in the primary cluster.
A DataNode is selected to replicate the data block from among the primary cluster DataNodes that hold the block.
Through the replication pipeline, the mirror cluster DataNode that received the block can then replicate it to other DataNodes of the mirror cluster.
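
The request and response flow described above can be sketched as a small in-memory simulation. This is only an illustration under stated assumptions, not Hadoop code: the Cluster class, the function names, the DataNode names, the batch size, and the round-robin target choice are all hypothetical, and known_blocks stands in for the block IDs the mirror NameNode is assumed to know from its namespace.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy model of one cluster: block id -> DataNodes holding a replica."""
    name: str
    datanodes: list
    block_locations: dict = field(default_factory=dict)


def build_mirror_request(mirror, known_blocks, batch_size=2):
    """Mirror Active NameNode: batch blocks that no local DataNode holds.

    Each entry pairs a block with a target DataNode from the mirror cluster.
    Primary-side block locations are never consulted here.
    """
    missing = [b for b in known_blocks if not mirror.block_locations.get(b)]
    batch = missing[:batch_size]
    return [
        {"block": b, "target": mirror.datanodes[i % len(mirror.datanodes)]}
        for i, b in enumerate(batch)
    ]


def handle_mirror_request(primary, mirror, request):
    """Primary Active NameNode: pick a source DataNode for each block and
    schedule the copy (hypothetical: the copy is applied immediately)."""
    commands = []
    for entry in request:
        holders = primary.block_locations.get(entry["block"], [])
        if not holders:
            continue  # block unknown on the primary, skip it
        source = holders[0]
        commands.append((source, entry["target"], entry["block"]))
        mirror.block_locations.setdefault(entry["block"], []).append(entry["target"])
    return commands


primary = Cluster("primary", ["p-dn1", "p-dn2", "p-dn3"])
mirror = Cluster("mirror", ["m-dn1", "m-dn2"])

# The write pipeline stayed inside the primary cluster, so only primary
# DataNodes hold replicas at this point.
primary.block_locations = {
    "blk_1": ["p-dn1", "p-dn2", "p-dn3"],
    "blk_2": ["p-dn2", "p-dn3", "p-dn1"],
    "blk_3": ["p-dn3", "p-dn1", "p-dn2"],
}
known_blocks = sorted(primary.block_locations)

request = build_mirror_request(mirror, known_blocks)
commands = handle_mirror_request(primary, mirror, request)
for source, target, block in commands:
    print(f"{block}: {source} -> {target}")
```

Calling build_mirror_request again after this round picks up only the blocks that are still missing on the mirror side, which is how the batches drain over successive heartbeats in this toy model.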

Asynchronous Namespace Journaling

Synchronous journaling to remote clusters adds latency and hurts performance. When performance is critical, the admin can configure asynchronous edit log journaling.

As usual, the primary cluster's Active NameNode writes the edit logs to the Shared Journal of the primary cluster.
As usual, the primary cluster's Standby NameNode tails the edit logs from the Shared Journal of the primary cluster.

The mirror cluster's Active NameNode tails the edit logs from the Shared Journal of the primary cluster and applies them to its in-memory namespace.
After applying the edit logs to its namespace, the mirror cluster's Active NameNode also writes them to its local Shared Journal.
As usual, the mirror cluster's Standby NameNode tails the edit logs from the Shared Journal of the mirror cluster.
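
The tail, apply, and re-journal sequence above can be sketched as follows. This is a minimal model under assumptions, not the HDFS implementation: SharedJournal, MirrorActiveNameNode, tail_once, the transaction numbering, and the two-operation edit format ("create" and "delete") are all hypothetical names chosen for illustration.

```python
class SharedJournal:
    """Toy append-only edit log; a transaction id is the list position + 1."""

    def __init__(self):
        self.edits = []

    def write(self, op):
        self.edits.append(op)
        return len(self.edits)  # txid of the edit just written

    def read_from(self, txid):
        """Return (txid, op) pairs for every edit after the given txid."""
        return list(enumerate(self.edits[txid:], start=txid + 1))


class MirrorActiveNameNode:
    """Hypothetical mirror-side NameNode that tails the primary journal."""

    def __init__(self, primary_journal, local_journal):
        self.primary_journal = primary_journal
        self.local_journal = local_journal
        self.namespace = set()
        self.last_applied_txid = 0

    def tail_once(self):
        """Pull new edits from the primary journal, apply each one to the
        in-memory namespace, then write it to the local Shared Journal."""
        new_edits = self.primary_journal.read_from(self.last_applied_txid)
        for txid, (action, path) in new_edits:
            if action == "create":
                self.namespace.add(path)
            elif action == "delete":
                self.namespace.discard(path)
            self.local_journal.write((action, path))
            self.last_applied_txid = txid


primary_journal = SharedJournal()
mirror_journal = SharedJournal()
mirror_nn = MirrorActiveNameNode(primary_journal, mirror_journal)

primary_journal.write(("create", "/data/a"))
primary_journal.write(("create", "/data/b"))
mirror_nn.tail_once()

# This edit reaches the primary journal but has not been tailed yet.
primary_journal.write(("delete", "/data/a"))
lag = len(primary_journal.edits) - mirror_nn.last_applied_txid
print("edits not yet on the mirror:", lag)
```

The lag value is the number of edits the mirror would be missing if the primary cluster failed at that moment, which is the trade-off accepted in exchange for keeping the remote journal off the write path.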

Points to remember

Better performance and lower latency than synchronous data replication.
Risk of data loss if the primary cluster goes down while asynchronous replication is still in progress.
Suitable when performance is more critical than the data.
