Wednesday, March 26, 2014

Common Error in Hadoop - Part 1

Error:
10/01/18 10:52:48 INFO mapred.JobClient: Task Id : attempt_201001181020_0002_m_000014_0, Status : FAILED
  java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)


Reason:
1. The log directory might be full; check the number of userlogs directories.
2. Check the size of the log directories.

Solution:
1. Raise the open-file limit (ulimit) for the user that runs Hadoop by adding
* hard nofile 10000 to /etc/security/limits.conf
2. Clear some space by deleting some of those directories.
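
The two checks listed under Reason can be sketched as a small shell helper. This is only a sketch: count_userlogs is a hypothetical name, and the path in the example comment is an assumption that depends on where your installation writes its logs.

```shell
# Hypothetical helper: count the directories directly under
# a userlogs directory ($1).
count_userlogs() {
  find "$1" -mindepth 1 -maxdepth 1 -type d | wc -l | tr -d ' '
}

# Example calls (the path is an assumption; substitute your own log directory):
#   count_userlogs /var/log/hadoop/userlogs
#   du -sh /var/log/hadoop/userlogs
```

A very large count, or a large size reported by du, suggests the log directory is the problem.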

Error:
The reducer does not start after the maps complete: in pseudo-distributed mode the job reaches map 100% and then hangs.

Reason:
A problem with the /etc/hosts file.

Solution:
1. Check /etc/hosts to see whether an IP address is assigned to the hostname;
if so, remove it and use the loopback address, 127.0.0.1, instead.
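
To see which address the hosts file currently gives the machine, something like the following may help. It is a minimal sketch: lookup_host_ip is a hypothetical name, and it only reads the file, it does not change it.

```shell
# Hypothetical helper: print the address that a hosts file ($1) assigns to
# a host name ($2), using the first non-comment line that lists the name.
lookup_host_ip() {
  awk -v name="$2" '$1 !~ /^#/ {
    for (i = 2; i <= NF; i++) if ($i == name) { print $1; exit }
  }' "$1"
}

# Example call: lookup_host_ip /etc/hosts "$(hostname)"
```

For a pseudo-distributed setup, the fix described above means this should print 127.0.0.1.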

Error:
FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Exception in namenode join
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /home/hadoop/mydata/hdfs/namenode is in an inconsistent state: storage directory does not exist or is not accessible.


Reason:
1. The HDFS directory doesn't exist, or doesn't have the correct ownership or permissions.

Solution:
Create the directory if it does not exist, and correct its ownership and permissions so that the user running HDFS can access it.
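
A rough sketch of that fix in shell. Both function names are hypothetical, and the hadoop:hadoop owner in the example comment is an assumption; use the account that actually runs HDFS on your machine.

```shell
# Hypothetical helper: create a storage directory ($1) if it is missing and
# hand it to the account that runs HDFS ($2, given as user:group).
ensure_storage_dir() {
  mkdir -p "$1" && chown -R "$2" "$1"
}

# Report whether the current user can read, write and enter the directory.
check_storage_dir() {
  if [ -d "$1" ] && [ -r "$1" ] && [ -w "$1" ] && [ -x "$1" ]; then
    echo "ok"
  else
    echo "not accessible"
  fi
}

# Example calls (the owner is an assumption; the path is the one from the error):
#   ensure_storage_dir /home/hadoop/mydata/hdfs/namenode hadoop:hadoop
#   check_storage_dir /home/hadoop/mydata/hdfs/namenode
```

Run check_storage_dir as the HDFS user, since that is the account whose access matters.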

Error: 
Job initialization failed: org.apache.hadoop.fs.FSError: java.io.IOException: No space left on device at

Reason:
1. The disk holding the Jobtracker's log directory was full.

Solution:
Free up some space in the log directory.
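
Before and after cleaning up, the free space can be checked with df. The helper below is a sketch; free_kb is a hypothetical name and the path in the example comment is an assumption.

```shell
# Hypothetical helper: print the free space, in kilobytes, on the filesystem
# that holds the given path ($1). -P keeps each filesystem on one line.
free_kb() {
  df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Example call (the path is an assumption): free_kb /var/log/hadoop
```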

Error:  
Incompatible namespaceIDs in ...: namenode namespaceID = ..., datanode namespaceID = ...

Reason:
Formatting the namenode creates a new namespaceID, so the namenode no longer matches the datanodes, which still hold the original one.

Solution:
Use any one of the following:
1. Delete the data files in the datanode's dfs.data.dir directory (the default is ${hadoop.tmp.dir}/dfs/data).
2. Edit the namespaceID in the dfs.data.dir/current/VERSION file so that it is identical to the namenode's (the error log shows both values).
3. Point dfs.data.dir at a new directory.
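
For the VERSION file option, the two IDs can be compared before editing anything. A minimal sketch, assuming the VERSION file holds a line of the form namespaceID=<number>; the function names are hypothetical, and the dfs.name.dir and dfs.data.dir parts of the example paths stand for whatever those properties are set to on your cluster.

```shell
# Hypothetical helper: print the namespaceID stored in a VERSION file ($1).
read_namespace_id() {
  grep '^namespaceID=' "$1" | cut -d= -f2
}

# Succeeds when two VERSION files ($1 and $2) carry the same namespaceID.
same_namespace_id() {
  [ "$(read_namespace_id "$1")" = "$(read_namespace_id "$2")" ]
}

# Example call, with the two directories replaced by your configured values:
#   same_namespace_id <dfs.name.dir>/current/VERSION <dfs.data.dir>/current/VERSION \
#     && echo "IDs match" || echo "IDs differ"
```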

Error:
When the Hadoop cluster is started with start-all.sh, the slave always fails to start its datanode, and you get the error:
Could only be replicated to 0 nodes, instead of 1

Reason:
The node identification may be duplicated (personally, I think this is the wrong reason). There may be other causes as well; try the solutions below in turn.

Solution:
1. If the problem is port access, make sure the ports are open, for example 9000 (as in hdfs://machine1:9000/), 50030 and 50070. Run the command # iptables -I INPUT -p tcp --dport 9000 -j ACCEPT. If you then see the error hdfs.DFSClient: Exception in createBlockOutputStream java.net.ConnectException: Connection refused, the datanode port cannot be reached; modify iptables on the datanode: # iptables -I INPUT -s machine1 -p tcp -j ACCEPT
2. There may be firewall restrictions that stop the cluster nodes from communicating with each other. Try turning off the firewall: /etc/init.d/iptables stop
3. Finally, there may not be enough disk space; check with df -al

Error:
Running the program fails with:
Error: java.lang.NullPointerException

Reason:
A null pointer exception points to a bug in the Java program itself. Make sure every variable is instantiated before it is used and that there are no mistakes such as array index out of bounds. Inspect the program.
When running the program produces (various) errors, make sure of the following:

Solution:
1. Your program is correct and compiles.
2. In cluster mode, the data to be processed has been written to HDFS and the HDFS path is correct.
3. Specify the entry class name when running the jar (I do not know why it sometimes runs even without it).
The correct form is similar to this:
$ hadoop jar myCount.jar myCount input output
4. The Hadoop datanodes have been started.

Error:
Unrecognized option: -jvm
Could not create the Java virtual machine.

Reason:
The bin/hadoop script in the Hadoop installation directory contains the following piece of shell:

  CLASS='org.apache.hadoop.hdfs.server.datanode.DataNode'
  if [[ $EUID -eq 0 ]]; then
    HADOOP_OPTS="$HADOOP_OPTS -jvm server $HADOOP_DATANODE_OPTS"
  else
    HADOOP_OPTS="$HADOOP_OPTS -server $HADOOP_DATANODE_OPTS"
  fi

Solution:
$EUID is the effective user ID, which is 0 for root, so the -jvm option is only passed when the datanode is started as root. Try not to use the root user to operate Hadoop.

Error:
Terminal error message:
ERROR hdfs.DFSClient: Exception closing file /user/hadoop/musicdata.txt: java.io.IOException: All datanodes 10.210.70.82:50010 are bad. Aborting ...

The jobtracker logs also contain this error information:

Error register getProtocolVersion
java.lang.IllegalArgumentException: Duplicate metricsName: getProtocolVersion

And possibly these warnings:

WARN hdfs.DFSClient: DataStreamer Exception: java.io.IOException: Broken pipe
WARN hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block blk_3136320110992216802_1063 java.io.IOException: Connection reset by peer
WARN hdfs.DFSClient: Error Recovery for block blk_3136320110992216802_1063 bad datanode [0] 10.210.70.82:50010 put: All datanodes 10.210.70.82:50010 are bad. Aborting ...

Solution:
1. Check whether the disk that holds the path set in the dfs.data.dir property is full; if it is, free some space and try hadoop fs -put again.
2. If the disk is not full, check it for bad sectors.

Error:
Running a program with hadoop jar gives the error message:
java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.NullWritable, recieved org.apache.hadoop.io.LongWritable

Or something like this:

Status: FAILED java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.Text

Solution:
To fix this properly you need to learn the basics of Hadoop and the MapReduce model; see the chapter on Hadoop I/O and Chapter VII, MapReduce Types and Formats, in "Hadoop: The Definitive Guide". If you are eager to solve this problem right away, here is a quick fix, but skipping the basics is bound to hurt you in later development.
Make sure the types are consistent:

    ... extends Mapper<k1, v1, k2, v2> ...
    public void map(k1 k, v1 v, OutputCollector<k2, v2> output) ...
    ...
    ... extends Reducer<k2, v2, k3, v3> ...
    public void reduce(k2 k, Iterator<v2> v, OutputCollector<k3, v3> output) ...
    ...
    job.setMapOutputKeyClass(k2.class);
    job.setMapOutputValueClass(v2.class);
    job.setOutputKeyClass(k3.class);
    job.setOutputValueClass(v3.class);
    ...

Note how the k* and v* types correspond: the map output types (k2, v2) must be the reduce input types. I still recommend the two chapters mentioned above for the details and the principles behind them.

Error:
If you hit a datanode error like the following:
ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Cannot lock storage /data1/hadoop_data. The directory is already locked.

Reason:
As the error message says, the directory is locked and cannot be read. Check whether a related process, or a Hadoop process on the slave machine, is still running, using these Linux commands:

    netstat -nap
    ps aux | grep <related PID>

Solution:
If a Hadoop-related process is still running, stop it with the kill command, then run start-all.sh again.

Error:
If you encounter the following jobtracker error:
ERROR: Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.

Solution:
Modify the /etc/hosts file on the datanode nodes.
Brief format of the hosts file:
Each line has three parts: the first is the network IP address, the second is the host name or domain name, and the third is the host alias. The detailed steps are as follows:

1. First check the host name:

$ echo -e "`hostname -i`\t`hostname`\t$stn"

stn = the short name or alias of the host.

It will print something like this:

10.200.187.77             hadoop-datanode          DN

If this shows the IP address you configured, the change succeeded; if the host name shown has a problem, continue editing the hosts file.
If the shuffle error still appears, try what another user suggested: edit the hdfs-site.xml configuration file and add the following:
dfs.http.address
*.*.*.*:50070 Do not change the port; replace the asterisks with the IP address. Hadoop transfers this information over HTTP, and the port stays the same.

Error:
If you encounter the following jobtracker error:
ERROR: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code *

Reason:
Java throws this exception when the streaming subprocess exits with a nonzero code. The number in place of the * is that exit code, and its meaning tells you what went wrong.
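
One way to find out what the code means is to run the same mapper or reducer command by hand on a small sample of the input and look at its exit status and error output. The helper below is only a sketch: run_streaming_command_locally is a hypothetical name, and the script and sample file in the example comment are assumptions.

```shell
# Hypothetical helper: run a streaming command ($1) on a sample input file ($2)
# outside Hadoop and print the exit code the command finished with.
# Standard output is discarded; error messages stay visible on the terminal.
run_streaming_command_locally() {
  sh -c "$1" < "$2" > /dev/null
  echo "exit code: $?"
}

# Example call (script and file names are assumptions):
#   run_streaming_command_locally './mapper.sh' sample_input.txt
```

If the command fails locally with the same code, the problem is in the script or its environment rather than in the cluster.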