Hadoop configuration
amatteini edited this page Nov 7, 2012 · 2 revisions
Set the open-files limit (ulimit -n) to 8192; Ubuntu Linux has a default limit of 1024 open files.
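As a quick check that the raised limit is actually in effect for the process that launches Hadoop, the open-files limit can be read with Python's standard `resource` module (Unix only). This is a minimal sketch: the helper name `open_files_limit_ok` and the constant `REQUIRED_NOFILE` are illustrative and not part of the original setup.

```python
import resource

REQUIRED_NOFILE = 8192  # value recommended on this page


def open_files_limit_ok(required=REQUIRED_NOFILE):
    """Return True if the soft open-files limit is at least `required`."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # RLIM_INFINITY means "no limit", which always satisfies the requirement.
    return soft == resource.RLIM_INFINITY or soft >= required


if __name__ == "__main__":
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print("soft limit:", soft, "hard limit:", hard)
    print("sufficient for this configuration:", open_files_limit_ok())
```

The check only reflects the limit of the process that runs it, so it should be run as the same user (and from the same kind of shell or service environment) that starts the Hadoop daemons.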
hdfs-site.xml
<!--
  A Hadoop HDFS datanode has an upper bound on the number of files that it will
  serve at any one time. The upper bound parameter is called xcievers (yes, this
  is misspelled). Be sure to restart HDFS after changing this setting.
  Without this setting, failures look strange: eventually the datanode logs
  complain that the xcievers limit has been exceeded, but in the run-up to that,
  one manifestation is complaints about missing blocks.
-->
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>4096</value>
</property>
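To confirm that a property such as the one above has actually been picked up in a site file, the file can be parsed with Python's standard `xml.etree.ElementTree`. The sketch below is illustrative only: `read_properties` and `SAMPLE` are hypothetical names, and the sample wraps the property in the `<configuration>` root element that Hadoop site files use, which the snippet on this page omits.

```python
import xml.etree.ElementTree as ET

# Inline sample standing in for the contents of hdfs-site.xml.
SAMPLE = """<configuration>
  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>4096</value>
  </property>
</configuration>"""


def read_properties(xml_text):
    """Map each <property> name to its value (both as strings)."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


props = read_properties(SAMPLE)
print(props.get("dfs.datanode.max.xcievers"))  # prints 4096
```

For a real file, the text could be read from disk and passed to `read_properties`; the path depends on the installation and is deliberately not assumed here.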
mapred-site.xml
<!-- The minimum size chunk that map input should be split into -->
<property>
  <name>mapred.min.split.size</name>
  <value>268435456</value> <!-- 256 MB -->
</property>

<!-- Output compression -->
<property>
  <name>mapred.output.compress</name>
  <value>true</value>
</property>
<property>
  <name>mapred.output.compression.type</name>
  <value>BLOCK</value>
</property>
<property>
  <name>mapred.output.compression.codec</name>
  <value>org.apache.hadoop.io.compress.GzipCodec</value>
</property>

<!-- Reuse of a JVM across multiple tasks of the same job -->
<property>
  <name>mapred.job.reuse.jvm.num.tasks</name>
  <value>-1</value>
</property>

<!-- Number of reduce tasks -->
<property>
  <name>mapred.reduce.tasks</name>
  <value>6</value>
</property>

<!-- Heap-size for child jvms -->
<property>
  <name>mapred.map.child.java.opts</name>
  <value>-Xmx1G</value>
</property>
<property>
  <name>mapred.reduce.child.java.opts</name>
  <value>-Xmx1G</value>
</property>

<!-- Number of maps/reduces spawned simultaneously on a TaskTracker. Default value is 2 -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>4</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>2</value>
</property>
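The numbers in the mapred-site.xml settings above interact: the split size is given in bytes, and the slot counts multiplied by the child heap size bound how much heap the task JVMs on one TaskTracker can claim. The following sketch only reproduces that arithmetic; the constant and function names are illustrative, and it counts Java heap only (no JVM overhead, no daemon memory).

```python
MIB = 1024 * 1024
GIB = 1024 * MIB

MIN_SPLIT_BYTES = 268435456   # mapred.min.split.size
MAP_SLOTS = 4                 # mapred.tasktracker.map.tasks.maximum
REDUCE_SLOTS = 2              # mapred.tasktracker.reduce.tasks.maximum
CHILD_HEAP_BYTES = 1 * GIB    # -Xmx1G for both map and reduce children


def max_child_heap_per_node():
    """Upper bound on child-JVM heap when every map and reduce slot is busy."""
    return (MAP_SLOTS + REDUCE_SLOTS) * CHILD_HEAP_BYTES


print(MIN_SPLIT_BYTES // MIB, "MiB")            # prints: 256 MiB
print(max_child_heap_per_node() // GIB, "GiB")  # prints: 6 GiB
```

With these values, a worker node needs room for up to 6 GiB of task heap on top of whatever the DataNode and TaskTracker daemons themselves use.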
The following configuration has been used for the 3690M triples test: