
Sunday, September 14, 2014

Single disk issue in Hadoop Cluster.

Hi Folks, recently I performed a simple test on our Hadoop cluster. We have a pretty large cluster holding a lot of data; each datanode has around 24 TB of HDD (12 x 2 TB disks). Let me describe the general issue we faced and how we resolved it.

ISSUE:- One or two disks on a node fill up to around 90% while the other disks of the same node stay between 50-60%, which triggered continuous alerts. It became a real pain for us because it happened frequently, owing to the large size of the cluster.

Resolution:- We tried some tests and finally managed to cope with this situation. Here is what we did and how we resolved it, step by step.

1. I created a text file of 142 MB with 10,000,000 records and copied it into HDFS (one way to generate such a file is sketched after the verification command below).

[hdfs@ricks-01 13:21:01 ~]$ hadoop fs -cat /user/hdfs/file | wc -l
10000000
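
The post doesn't show how the test file was generated; a minimal sketch that produces 10,000,000 numbered records (the resulting size will differ somewhat from the 142 MB mentioned above) could be:

# generate 10,000,000 numbered records locally, then copy the file into HDFS
seq 1 10000000 > file
hadoop fs -put file /user/hdfs/file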

2. Set its replication factor to 1 so that only one replica of each block exists on the cluster.

[hdfs@ricks-01 12:13:07 ~]$ hadoop fs -setrep 1 /user/hdfs/file
Replication 1 set: /user/hdfs/file

3. Now run fsck to find the locations of its blocks on the datanodes.

[hdfs@ricks-01 12:14:32 ~]$ hadoop fsck /user/hdfs/file -files -blocks -locations
/user/hdfs/file 148888897 bytes, 2 block(s):  OK
0. BP-89257919-1406754396842:blk_1073745304_4480 len=134217728 repl=1 [17.170.204.86:1004]
1. BP-89257919-1406754396842:blk_1073745305_4481 len=14671169 repl=1 [17.170.204.86:1004]

4. Resolve the IP address reported by fsck; the block turned out to live on the ricks-04 node.

[hdfs@ricks-01 12:14:38 ~]$ host 17.170.204.86
86.204.170.17.in-addr.arpa domain name pointer ricks-04.

5. Now log in to that node, find the block, and move it to another disk at the same relative path (a quick disk-usage check is sketched after the commands below).

[hdfs@ricks-04 12:15:55 ~]$ find /ngs*/app/hdfs/hadoop/ -name blk_1073745305
/disk2/hdfs/dfs/dn/current/BP-89257919-1406754396842/current/finalized/blk_1073745305

[hdfs@ricks-04 12:16:03 ~]$ mv /disk2/hdfs/dfs/dn/current/BP-89257919-1406754396842/current/finalized/blk_1073745305*  /disk5/hdfs/dfs/dn/current/BP-89257919-1406754396842/current/finalized/
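
Before choosing the target disk, it can help to check per-disk usage so the block lands on a less utilized disk; assuming the data disks are mounted as /disk1, /disk2, and so on:

# show usage of each data disk to pick the least-full target
df -h /disk*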

6. Search for the block again to confirm that no other copy is left behind and that the block and its .meta file were moved to the new location, disk5.

[hdfs@ricks-04 12:17:48 ~]$ find /disk*/hdfs/ -name blk_1073745305*
/disk5/hdfs/dfs/dn/current/BP-89257919-1406754396842/current/finalized/blk_1073745305
/disk5/hdfs/dfs/dn/current/BP-89257919-1406754396842/current/finalized/blk_1073745305_4481.meta

7. Run fsck again to confirm the file is still healthy; the block's new location gets registered with the NameNode via the DataNode's block report.

[hdfs@ricks-01 13:02:21 ~]$ hadoop fsck /user/hdfs/file -files -blocks -locations
/user/hdfs/file 148888897 bytes, 2 block(s):  OK
0. BP-89257919-1406754396842:blk_1073745304_4480 len=134217728 repl=1 [17.170.204.86:1004]
1. BP-89257919-1406754396842:blk_1073745305_4481 len=14671169 repl=1 [17.170.204.86:1004]

8. Run the HDFS read again and check the record count of the file. If it returns the right count, the block has been registered with the NameNode; if not, wait for some time and try again.

[hdfs@ricks-01 13:21:01 ~]$ hadoop fs -cat /user/hdfs/file | wc -l
10000000


Points to remember
  1. Be extra careful while moving a block from one place to another; you may want to take a backup before moving it (see the sketch below this list).
  2. Make sure there are no jobs running on the node at that point of time; you can stop the TaskTracker (TT) before doing this.
  3. You can restart the datanode service after performing this procedure.
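
For point 1, a minimal backup sketch before the move (the backup directory is illustrative; adjust the paths to your layout):

# copy the block file and its .meta file aside before moving them
mkdir -p /root/block_backup
cp -p /disk2/hdfs/dfs/dn/current/BP-89257919-1406754396842/current/finalized/blk_1073745305* /root/block_backup/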

Hadoop Cluster Disaster Recovery Solution 2/2

Hi Folks, in our last blog we discussed synchronous data replication across clusters, which is pretty expensive in terms of network and performance. Today we will talk about asynchronous data replication, which is less expensive than the synchronous approach.

So let's start: how can we do asynchronous data replication, what kind of design is required to set it up, and how does it work?


The picture above shows the design of cross-cluster data replication; let's focus on how it works between the two clusters.

  1. When a client is writing an HDFS file, after the file is created it starts to request a new block, and the primary cluster's Active NameNode allocates a new block and selects a list of DataNodes for the client to write to. For a file that needs only asynchronous data replication, no remote DataNode from the mirror cluster is selected for the pipeline at the Active NameNode.
  2. As usual, upon a successful block allocation, the client writes the block data to the first DataNode in the pipeline, passing along the list of remaining DataNodes.
  3. As usual, the first DataNode continues the write to the following DataNodes in the pipeline until the last one, but this time the pipeline doesn't span to the mirror cluster.
  4. Asynchronously, the mirror cluster's Active NameNode schedules replication of the data blocks that are not yet on any of its local DataNodes. As part of heartbeats it sends a MIRROR_REPLICATION_REQUEST containing a batch of blocks to replicate, with target DataNodes selected from the mirror cluster. The mirror cluster doesn't need to be aware of the real block locations in the primary cluster (a practical approximation is sketched after this list).
  5. As a result of handling the MIRROR_REPLICATION_REQUEST, the primary cluster's Active NameNode takes care of selecting the block locations and schedules the replication command to the corresponding source DataNodes in the primary cluster.
  6. A DataNode is selected to replicate each data block from one of the DataNodes in the primary cluster that hold the block.
  7. Through the replication pipeline, the local DataNode can then replicate the block to the other DataNodes of the mirror cluster.
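
Note that MIRROR_REPLICATION_REQUEST belongs to a proposed cross-cluster design rather than stock HDFS. In practice, a common way to approximate asynchronous replication between two clusters is a periodically scheduled DistCp job; the cluster addresses below are illustrative:

# incremental copy from the primary cluster to the mirror cluster, e.g. run from cron;
# -update copies only files that are missing or changed on the target
hadoop distcp -update hdfs://primary-nn:8020/data hdfs://mirror-nn:8020/data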

Asynchronous Namespace Journaling 


Synchronous journaling to a remote cluster means more latency and a performance impact. When performance is critical, the admin can configure asynchronous edit log journaling instead.





  1. As usual, the primary cluster Active NameNode writes the edit logs to the Shared Journal of the primary cluster.
  2. As usual, the primary cluster Standby NameNode tails the edit logs from the Shared Journal of the primary cluster.
  3. The mirror cluster Active NameNode tails the edit logs from the Shared Journal of the primary cluster and applies them to its namespace in memory.
  4. After applying the edit logs to its namespace, the mirror cluster Active NameNode also writes the edit logs to its local Shared Journal.
  5. As usual, the mirror cluster Standby NameNode tails the edit logs from the Shared Journal of the mirror cluster (a sample Shared Journal configuration follows below).
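
The Shared Journal above is typically a Quorum Journal Manager (QJM) ensemble. The standard hdfs-site.xml entry that points a NameNode at its shared edits directory looks like the snippet below (the JournalNode hosts and cluster id are illustrative; the cross-cluster tailing itself is part of the proposal, not a stock setting):

<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://jn1:8485;jn2:8485;jn3:8485/primarycluster</value>
</property>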
Points to remember
  1. Better performance and lower latency than synchronous data replication.
  2. There is a chance of data loss if the primary cluster goes down while data is still being replicated asynchronously.
  3. Suitable when performance is more critical than having an always-consistent copy of the data.

Hadoop Cluster Disaster Recovery Solution 1/2

Hi Folks, whenever we think about cluster setup and design, we always think about DR: how can we save our data from a cluster crash? Today we discuss the disaster recovery plan for a Hadoop cluster, what steps we can take, and how far we can go to save our data.

Types of cluster design across data centers

1. Synchronous data replication between clusters
2. Asynchronous data replication between clusters

Let's talk about synchronous data writing between clusters. Here is the pictorial view of the data center design.




  1. When a client is writing an HDFS file, after the file is created it starts to request a new block, and the Active NameNode of the primary cluster allocates a new block and selects a list of DataNodes for the client to write to. By using the new mirror block placement policy, the Active NameNode can guarantee that one or more remote DataNodes from the mirror cluster are selected at the end of the pipeline.
  2. The primary cluster's Active NameNode knows the available DataNodes of the mirror cluster via heartbeats from the mirror cluster's Active NameNode carrying the MIRROR_DATANODE_AVAILABLE command, so the latest reported DataNodes are considered for the mirror-cluster part of the pipeline, which is appended to the primary cluster pipeline.
  3. As usual, upon a successful block allocation, the client writes the block data to the first DataNode in the pipeline, passing along the list of remaining DataNodes.
  4. As usual, the first DataNode continues the write to the following DataNodes in the pipeline.
  5. The last local DataNode in the pipeline continues the write to the remote DataNode that follows it.
  6. If more than one remote DataNode is selected, each remote DataNode continues to write to the following DataNode that is local to it. Users even have the flexibility to configure the mirror cluster replication; mirror nodes are selected based on the configured replication (a hypothetical config sketch follows this list).
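
Stock HDFS ships no setting for a mirror-side replication factor, so the property below is purely hypothetical; it only illustrates how such a knob would look in the usual hdfs-site.xml style:

<!-- hypothetical property, not a stock HDFS setting -->
<property>
  <name>dfs.mirror.replication</name>
  <value>2</value>
</property>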


Synchronous Namespace Journaling 



  1. As usual, the primary cluster Active NameNode writes the edit logs to the Shared Journal of the primary cluster.
  2. The primary cluster Active NameNode also writes the edit logs to the mirror cluster Active NameNode by using a new JournalManager.
  3. As usual, the primary cluster Standby NameNode tails the edit logs from the Shared Journal of the primary cluster.
  4. The mirror cluster Active NameNode writes the edit logs to the Shared Journal of the mirror cluster after applying the edit logs received from the primary cluster.
  5. As usual, the mirror cluster Standby NameNode tails the edit logs from the Shared Journal of the mirror cluster.

Points to Remember

  1. Synchronous data writing is good when the data is very critical and we can't afford to lose consistency at any point in time.
  2. It actually increases the latency of HDFS writes, which impacts the performance of the Hadoop cluster.
  3. It requires more network bandwidth and stability to cope with synchronous replication.



Tuesday, April 22, 2014

Linux & Hadoop Unique Commands

Hi Folks,

Today I am going to show you some important commands that you can use for different purposes.

1. Data read and written by a particular process, given its PID
cat /proc/$pid/io | grep -wE "read_bytes|write_bytes" | awk -F':' '{print $1 " " $2/(1024*1024) " MB"}'
2. Delete a large number of files (here, all *.gc files)
find . -name "*.gc" -print0 | xargs -0 rm
3. Generate random test data, e.g. one 50 MB file written as 10 blocks of 5 MB
dd if=/dev/urandom of=a.log bs=5M count=10
4. Replace spaces in file names with underscores
IFS=$'\n';for f in `find .`; do file=$(echo "$f" | tr '[:blank:]' '_'); [ -e "$f" ] && [ ! -e "$file" ] && mv "$f" "$file"; done;unset IFS;
5. Lines present in fileA but missing from fileB
awk 'BEGIN { while ( getline < "fileB" ) { arr[$0]++ } } { if (!( $0 in arr ) ) { print } }' fileA
6. Print the hostnames of datanodes from the command line (useful when you have a large number of nodes)
for a in `hadoop dfsadmin -report | grep -i name | awk -F ':' '{print $2}'`; do host $a | awk '{print $5}' | sed 's/.$//g'; done
7. DFS % used of Hadoop datanodes
hadoop dfsadmin -report | grep -A6 Name | tr '\n' ' ' | tr '-' '\n' | awk '{print substr($2,0,13)" "$29}'
8. Print an XML file (format: [hdfs|core|mapred]-site.xml) from pattern A to pattern B
sed -n "/A/,/B/p" $file
9. Convert XML (format: [hdfs|core|mapred]-site.xml) to YAML-style key: value pairs
grep -e "<name>" -e "<value>" hdfs-site.xml | sed 's/<name>//g;s/<value>//g;s/<\/value>//g;s/<\/name>/:/g' | perl -p -e 's/:\n/:/'
10. Get the value of a particular parameter from an XML file (format: [hdfs|core|mapred]-site.xml)
awk -F"[<>]" '/mapred.local.dir/ {getline;print $3;exit}' mapred-site.xml

Hope these are helpful to you :)