Big Data Board: Hadoop commands

Hi Folks,

Today I am going to show you some handy Linux and Hadoop commands that you can use for everyday admin tasks.

1. Data read and written by a particular process (give it the PID of the process)

grep -wE "read_bytes|write_bytes" "/proc/$pid/io" | awk -F':' '{print $1 " " $2/(1024*1024) " MB"}'
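If you check processes often, the same two counters can go into a small function. This is a sketch: `proc_io_mb` is a made-up name, and it assumes a Linux kernel with per-process I/O accounting, so that `/proc/<pid>/io` exists and is readable by you (your own processes, or root). Keep in mind that `read_bytes`/`write_bytes` try to count bytes that actually reached the storage layer, while `rchar`/`wchar` in the same file count everything passed through read()/write() calls, including page-cache hits and tty I/O.

```shell
# proc_io_mb: hypothetical helper, prints read_bytes/write_bytes of a PID in MB.
# Assumes Linux with /proc/<pid>/io readable by the current user.
proc_io_mb() {
  local pid="$1"
  if [ ! -r "/proc/$pid/io" ]; then
    echo "cannot read /proc/$pid/io" >&2
    return 1
  fi
  # Lines look like "read_bytes: 12345"; match the two counters exactly
  # and convert bytes to MB (1024*1024 bytes).
  awk -F': ' '$1 == "read_bytes" || $1 == "write_bytes" {
    printf "%s %.2f MB\n", $1, $2 / (1024 * 1024)
  }' "/proc/$pid/io"
}

# Example: inspect the current shell itself.
proc_io_mb "$$"
```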

2. Delete a large number of files matching a pattern (avoids the "Argument list too long" error you get with rm *.gc)

find . -name "*.gc" -print0 | xargs -0 rm
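To try this without risking real data, here is a small sketch that works in a throwaway directory from `mktemp -d`; the `gc_N.gc` and `keep.log` names are made up for the demo. The `-print0` / `xargs -0` pair passes names separated by NUL bytes, so spaces or newlines in file names are safe, and `xargs` splits the list into as many `rm` calls as needed. GNU and BSD `find` also accept `-delete`, which removes the matches without running `rm` at all.

```shell
# Scratch directory so nothing real gets deleted.
demo_dir=$(mktemp -d)

# Create 5000 empty *.gc files plus one file that must survive.
for i in $(seq 1 5000); do : > "$demo_dir/gc_$i.gc"; done
: > "$demo_dir/keep.log"

# NUL-separated names are safe for any file name; xargs batches the rm calls.
find "$demo_dir" -name "*.gc" -print0 | xargs -0 rm

# Only keep.log should be left.
remaining=$(find "$demo_dir" -type f | wc -l)
echo "files left: $remaining"
```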

3. Generate random test data, e.g. one 50 MB file written as 10 blocks of 5 MB

dd if=/dev/urandom of=a.log bs=5M count=10
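If you really want several separate files, say 5 files of 10 MB each, a loop around `dd` does it. A sketch, assuming GNU `dd` (the `bs=1M` suffix; BSD `dd` spells it `1m`); `num_files`, `size_mb`, the `random_N.log` names and the scratch directory are just for illustration.

```shell
# Hypothetical parameters: how many files and how big each one is (in MB).
num_files=5
size_mb=10
out_dir=$(mktemp -d)

for i in $(seq 1 "$num_files"); do
  # size_mb blocks of 1 MiB each, read from the kernel random number generator.
  dd if=/dev/urandom of="$out_dir/random_$i.log" bs=1M count="$size_mb" 2>/dev/null
done

ls -l "$out_dir"
```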

4. Replace spaces in file names with underscores

find . -depth -name '*[[:blank:]]*' -print0 | while IFS= read -r -d '' f; do new="$(dirname "$f")/$(basename "$f" | tr '[:blank:]' '_')"; [ -e "$new" ] || mv -- "$f" "$new"; done
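A safe way to try the rename is a throwaway directory. This sketch (bash, because of `read -d ''`) walks the tree with `find -depth`, so files are renamed before the directories that contain them, and it only changes the last path component, so the parent paths that `find` reported stay valid while the loop runs. Like the one-liner, it skips a rename if the target name already exists. The `my dir/a b.txt` and `top file.txt` names are made up for the demo.

```shell
demo=$(mktemp -d)
mkdir -p "$demo/my dir"
: > "$demo/my dir/a b.txt"
: > "$demo/top file.txt"

# -depth: children before parents; -print0 + read -d '': safe for odd names.
find "$demo" -depth -name '*[[:blank:]]*' -print0 |
  while IFS= read -r -d '' f; do
    # Replace blanks (spaces/tabs) only in the final path component.
    new="$(dirname "$f")/$(basename "$f" | tr '[:blank:]' '_')"
    # Never overwrite an existing file.
    [ -e "$new" ] || mv -- "$f" "$new"
  done

find "$demo" -mindepth 1 | sort
```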

5. Difference between fileA and fileB (print the lines of fileA that do not appear in fileB)

awk 'BEGIN { while ((getline line < "fileB") > 0) seen[line] } !($0 in seen)' fileA
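Here is the idea on two tiny made-up files. The `> 0` test on `getline` matters: `getline` returns 1 for a line, 0 at end of file and -1 if the file cannot be opened, so a bare `while (getline < "fileB")` loops forever when fileB is missing. Passing the second file name with `-v` (the `other` variable is just a name chosen here) avoids hard-coding it inside the awk program.

```shell
work=$(mktemp -d)
printf '%s\n' alpha beta gamma delta > "$work/fileA"
printf '%s\n' beta delta epsilon > "$work/fileB"

# Load every line of fileB into a lookup table, then print the lines of
# fileA that are not in it. "> 0" stops on end of file and on errors.
only_in_a=$(awk -v other="$work/fileB" '
  BEGIN { while ((getline line < other) > 0) seen[line] }
  !($0 in seen)
' "$work/fileA")

echo "$only_in_a"
```

The lines come out in fileA's original order, which is the main advantage over sorting both files and using `comm`.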

6. Print the hostnames of the datanodes from the command line (useful when you have a large number of nodes; on Hadoop 2.x, `hdfs dfsadmin` replaces the deprecated `hadoop dfsadmin`)

for ip in $(hadoop dfsadmin -report | grep '^Name:' | awk -F':' '{print $2}'); do host "$ip" | awk '{print $5}' | sed 's/\.$//'; done
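`host` comes with the BIND utilities and is not installed on every box. A sketch of the same loop using `getent hosts` instead, which goes through the system resolver configuration (so `/etc/hosts` entries are honoured too). `dn_hosts` is a made-up function name; it reads the report on standard input, so you pipe `hadoop dfsadmin -report` (or `hdfs dfsadmin -report`) into it. If an address does not resolve, the IP itself is printed.

```shell
# dn_hosts: hypothetical helper; reads a dfsadmin report on stdin and
# prints one hostname (or the raw IP, if lookup fails) per datanode.
dn_hosts() {
  # "Name: <ip>:<port> ..." -> second ':'-separated field is the IP.
  awk -F':' '/^Name:/ {print $2}' |
    while read -r ip; do
      # getent prints "IP  name [aliases]"; keep the first name only.
      name=$(getent hosts "$ip" 2>/dev/null | awk '{print $2; exit}')
      echo "${name:-$ip}"
    done
}

# Usage on a cluster:
#   hadoop dfsadmin -report | dn_hosts
```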

7. DFS Used% of each Hadoop datanode

hadoop dfsadmin -report | awk '/^Name:/ {node=$2} /^DFS Used%:/ && node != "" {print node, $3}'
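Parsing the report by its labels (remember the address from each `Name:` line, print it next to the following `DFS Used%:` line) does not depend on field positions or on hostnames without hyphens. A sketch that also sorts the busiest nodes first; `dfs_used_by_node` is a made-up name. The cluster-wide `DFS Used%:` summary at the top of the report is skipped because no `Name:` line has been seen yet.

```shell
# dfs_used_by_node: hypothetical helper; reads a dfsadmin report on stdin
# and prints "<datanode address> <DFS Used%>", highest usage first.
dfs_used_by_node() {
  awk '
    /^Name:/                    { node = $2 }
    /^DFS Used%:/ && node != "" { print node, $3 }
  ' | LC_ALL=C sort -k2,2 -rn   # numeric sort reads "80.00" and ignores "%"
}

# Usage on a cluster:
#   hadoop dfsadmin -report | dfs_used_by_node
```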

8. Print the part of an XML config file ([hdfs|core|mapred]-site.xml) from a line matching A to a line matching B

sed -n '/A/,/B/p' "$file"
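For example, to see a single property, start the range at its `<name>` line and end it at the next `</property>`. A sketch against a tiny made-up hdfs-site.xml; note that `/` inside the sed patterns must be escaped, and that a plain `.` in a property name matches any character unless you escape it as `\.`.

```shell
conf=$(mktemp)
cat > "$conf" <<'EOF'
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.blocksize</name>
    <value>134217728</value>
  </property>
</configuration>
EOF

# Print from the dfs.replication <name> line through the closing </property>.
block=$(sed -n '/<name>dfs\.replication<\/name>/,/<\/property>/p' "$conf")
echo "$block"
```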

9. Convert an XML config file ([hdfs|core|mapred]-site.xml) to YAML-style "key: value" lines (assumes each <name> and <value> tag is on its own line)

grep -e "<name>" -e "<value>" hdfs-site.xml | sed 's/^[[:space:]]*//; s/<name>//; s/<\/name>/:/; s/<value>//; s/<\/value>//' | perl -pe 's/:\n/: /'
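The same conversion can be done in one `awk` pass without `perl`. This sketch also wraps each value in single quotes, so values containing `:` or `#` (URIs, for instance) stay valid YAML; a single quote inside a value is doubled, which is how YAML escapes it in single-quoted strings. It still assumes one tag per line, and the sample core-site.xml content is made up.

```shell
conf=$(mktemp)
cat > "$conf" <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode:8020</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
EOF

# Split lines on < and >, so "<name>x</name>" gives $3 == "x".
yaml=$(awk -F'[<>]' -v q="'" '
  /<name>/  { name = $3 }
  /<value>/ { v = $3; gsub(q, q q, v); print name ": " q v q }
' "$conf")
echo "$yaml"
```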

10. Get the value of a particular property from an XML config file ([hdfs|core|mapred]-site.xml)

awk -F'[<>]' '/<name>mapred\.local\.dir<\/name>/ {getline; print $3; exit}' mapred-site.xml
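A slightly more defensive variant as a reusable function: it compares the property name exactly (so `.` is not a regex wildcard and `mapred.local.dir.minspacestart` does not match), and it prints the first `<value>` after that name instead of assuming the value sits on the very next line, which breaks if a comment is in between. `get_conf_value` is a made-up name; it still assumes one tag per line, and the sample file and path are made up.

```shell
# get_conf_value: hypothetical helper. $1 = property name, $2 = XML file.
get_conf_value() {
  awk -F'[<>]' -v prop="$1" '
    /<name>/ { found = ($3 == prop) }     # exact string match on the name
    found && /<value>/ { print $3; exit } # first value after the matching name
  ' "$2"
}

conf=$(mktemp)
cat > "$conf" <<'EOF'
<configuration>
  <property>
    <name>mapred.local.dir</name>
    <!-- local scratch space for MapReduce -->
    <value>/data/1/mapred/local</value>
  </property>
</configuration>
EOF

get_conf_value mapred.local.dir "$conf"
```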

Hope these are helpful to you :)

Tuesday, April 22, 2014

Linux & Hadoop Uniq Commands