Hadoop troubleshooting
Following are some issues that we have run into while running Hadoop, together with some possible solutions; a sketch of the related configuration settings is given after the table.
Symptom | Possible Solutions
---|---
Tasks fail with errors about too many open files or the datanode xceiver limit. | a) Increase the file descriptor limit by setting `ulimit` to 8192; b) increase the upper bound on the number of files each datanode will serve at any one time by setting the xceivers limit to 4096.
A datanode fails to start (check `[hadoop_home]/logs/hadoop-hduser-datanode-xxx.log` for the error). | Format and restart the cluster.
Tasks fail with block-fetch errors. | This can happen in a number of situations and is a bit tricky to debug; usually it means that a machine was not able to fetch a block from HDFS. Cleaning up `/etc/hosts` can help: use hostnames instead of IPs, keep the file in sync across all nodes, and try commenting out `127.0.0.1 localhost`. Restart the cluster after making these changes.
Errors when putting data into the DFS. | The NameNode does not have any available DataNodes, which can be caused by a wide variety of reasons. Solution: erase all temporary data along with the namenode data, reformat the namenode, start everything up, and visit the DFS health page (http://master:50070/dfshealth.jsp).
Tasks fail with memory-related errors. | Possible reason: the memory allocated to the task trackers (the sum of the `mapred.*.child.java.opts` settings in mapred-site.xml) is more than the nodes' actual memory.
Tasks fail during merge operations with an OutOfMemoryError. | Reduce `mapred.job.shuffle.input.buffer.percent` in mapred-site.xml to a value below 0.7, for example 0.5.
Sorting is too slow. | Increase `io.sort.mb` and `io.sort.factor` to enlarge the in-memory sort buffer and the number of files merged at once. Possible values: 200 and 50, respectively.
A job fails because the split metainfo size exceeds the limit. | The job is hitting the default limit on split metainfo size (10000000). Set the `mapreduce.job.split.metainfo.maxsize` property in the jobtracker's mapred-site.xml config file to a higher value.
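For reference, a rough sketch of where the properties mentioned above would typically be set. The values are the ones suggested in the table, so treat them as starting points; the exact property names and file placement follow standard Hadoop 0.20/1.x conventions rather than anything stated in the table itself.

```xml
<!-- hdfs-site.xml: raise the number of files each datanode may serve at once
     (the property name carries Hadoop's historical misspelling "xcievers") -->
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>4096</value>
</property>

<!-- mapred-site.xml: shuffle/merge and sort tuning from the table above -->
<property>
  <name>mapred.job.shuffle.input.buffer.percent</name>
  <value>0.5</value>
</property>
<property>
  <name>io.sort.mb</name>
  <value>200</value>
</property>
<property>
  <name>io.sort.factor</name>
  <value>50</value>
</property>

<!-- mapred-site.xml (jobtracker): raise the split metainfo limit above the
     default 10000000; a value of -1 disables the check entirely -->
<property>
  <name>mapreduce.job.split.metainfo.maxsize</name>
  <value>-1</value>
</property>
```

The `ulimit` change from the first row is an operating-system limit for the user running the datanode, not a Hadoop property, so it is set at the OS level rather than in these files.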
Hadoop depends on slf4j-api-1.4.3. Since no java.util.logging (jul) handler is provided for SLF4J 1.4.3 (a jul bridge is only available from 1.5.2 onwards), incoming jul messages (e.g. from ldspider or silk) are not redirected to the SLF4J API but are simply printed to standard output. See #79 for more details.
Below is a console output example:
```
[INFO] One time execution enabled
[INFO] Import Job freebase.3 started (crawl / daily)
[INFO] Crawling seed: http://rdf.freebase.com/ns/m/0fpjn6x (with levels=2, limit=100000)
Jan 10, 2012 12:25:13 PM com.ontologycentral.ldspider.hooks.links.LinkFilterSelect <init>
INFO: link predicate is [http://rdf.freebase.com/ns/music.artist.genre]
Jan 10, 2012 12:25:13 PM com.ontologycentral.ldspider.Crawler evaluateBreadthFirst
INFO: freebase.com: 1
Jan 10, 2012 12:25:13 PM com.ontologycentral.ldspider.Crawler evaluateBreadthFirst
INFO: Starting threads round 0 with 1 uris
Jan 10, 2012 12:25:14 PM com.ontologycentral.ldspider.http.LookupThread run
INFO: lookup on http://rdf.freebase.com/ns/m/0fpjn6x status 303 LT-0:http://rdf.freebase.com/ns/m/0fpjn6x
Jan 10, 2012 12:25:14 PM com.ontologycentral.ldspider.http.LookupThread run
INFO: lookup on http://rdf.freebase.com/rdf/m/0fpjn6x status 200 LT-0:http://rdf.freebase.com/rdf/m/0fpjn6x
```
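If the SLF4J version on the classpath can be raised to 1.5.2 or later, the jul-to-slf4j bridge can pick these messages up. Below is a minimal sketch of the extra Maven dependency, assuming such an upgrade is possible alongside Hadoop's own slf4j-api; the coordinates and version are shown for illustration only.

```xml
<!-- Sketch only: assumes slf4j-api on the classpath is upgraded to 1.5.2+,
     since no jul bridge exists for the 1.4.3 line used by Hadoop. -->
<dependency>
  <groupId>org.slf4j</groupId>
  <artifactId>jul-to-slf4j</artifactId>
  <version>1.5.2</version>
</dependency>
<!-- The bridge only takes effect after SLF4JBridgeHandler.install()
     is called once at application startup; it then re-publishes
     java.util.logging records (e.g. from ldspider) to the SLF4J API. -->
```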
HDFS filesystem checking utility
HDFS supports the fsck command to check for various inconsistencies. It is designed to report problems with various files, for example missing blocks for a file or under-replicated blocks. Unlike a traditional fsck utility for native file systems, this command does not correct the errors it detects; normally the NameNode automatically corrects most of the recoverable failures. By default fsck ignores open files, but provides an option to select all files during reporting. The HDFS fsck command is not a Hadoop shell command; it can be run as `hadoop fsck`. For command usage, see the fsck documentation. fsck can be run on the whole file system or on a subset of files.
- Another Hadoop troubleshooting page