Out of Memory Error in Hadoop

Java | Hadoop

Java Problem Overview


I tried installing Hadoop following this document: http://hadoop.apache.org/common/docs/stable/single_node_setup.html. When I tried executing this command:

bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+' 

I am getting the following exception:

java.lang.OutOfMemoryError: Java heap space

Please suggest a solution so that I can try out the example. The entire exception is listed below. I am new to Hadoop, so I might have done something dumb. Any suggestion will be highly appreciated.

anuj@anuj-VPCEA13EN:~/hadoop$ bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
11/12/11 17:38:22 INFO util.NativeCodeLoader: Loaded the native-hadoop library
11/12/11 17:38:22 INFO mapred.FileInputFormat: Total input paths to process : 7
11/12/11 17:38:22 INFO mapred.JobClient: Running job: job_local_0001
11/12/11 17:38:22 INFO util.ProcessTree: setsid exited with exit code 0
11/12/11 17:38:22 INFO mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@e49dcd
11/12/11 17:38:22 INFO mapred.MapTask: numReduceTasks: 1
11/12/11 17:38:22 INFO mapred.MapTask: io.sort.mb = 100
11/12/11 17:38:22 WARN mapred.LocalJobRunner: job_local_0001
java.lang.OutOfMemoryError: Java heap space
	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:949)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:428)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
11/12/11 17:38:23 INFO mapred.JobClient:  map 0% reduce 0%
11/12/11 17:38:23 INFO mapred.JobClient: Job complete: job_local_0001
11/12/11 17:38:23 INFO mapred.JobClient: Counters: 0
11/12/11 17:38:23 INFO mapred.JobClient: Job Failed: NA
java.io.IOException: Job failed!
	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1257)
	at org.apache.hadoop.examples.Grep.run(Grep.java:69)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
	at org.apache.hadoop.examples.Grep.main(Grep.java:93)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
	at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
	at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:64)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

Java Solutions


Solution 1 - Java

For anyone using RPM or DEB packages, the documentation and common advice are misleading. These packages install the Hadoop configuration files into /etc/hadoop, and those files take priority over other settings.

The /etc/hadoop/hadoop-env.sh file sets the maximum Java heap memory for Hadoop. By default it is:

   export HADOOP_CLIENT_OPTS="-Xmx128m $HADOOP_CLIENT_OPTS"

This Xmx setting is too low. Simply change it to the following and rerun:

   export HADOOP_CLIENT_OPTS="-Xmx2048m $HADOOP_CLIENT_OPTS"

Solution 2 - Java

You can assign more memory by editing the conf/mapred-site.xml file and adding the following property inside its <configuration> element:

  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx1024m</value>
  </property>

This will start the Hadoop child JVMs with more heap space.

Solution 3 - Java

Another possibility is editing hadoop-env.sh, which contains export HADOOP_CLIENT_OPTS="-Xmx128m $HADOOP_CLIENT_OPTS". Changing 128m to 1024m helped in my case (Hadoop 1.0.0.1 on Debian).

Solution 4 - Java

After trying many combinations, I finally concluded that the same error on my environment (Ubuntu 12.04, Hadoop 1.0.4) was due to two issues:

  1. The same hadoop-env.sh change that Zach Garner mentioned above.
  2. Don't forget to execute "ssh localhost" first. Believe it or not, a missing ssh connection can surface as a Java heap space error as well; the usual passwordless-ssh setup is sketched after this list.
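
A minimal sketch of the passwordless-ssh setup from the single-node guide (standard OpenSSH commands; the key type and paths may differ on your system):

    # Generate a key with an empty passphrase and authorize it for localhost logins
    ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
    chmod 600 ~/.ssh/authorized_keys
    # This should now log in without prompting for a password
    ssh localhost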

Solution 5 - Java

You need to make adjustments to mapreduce.{map|reduce}.java.opts and also to mapreduce.{map|reduce}.memory.mb. The -Xmx value in java.opts must fit inside the container size given by the corresponding memory.mb setting.

For example:

  hadoop jar <jarName> <fqcn> \
      -Dmapreduce.map.memory.mb=4096 \
      -Dmapreduce.map.java.opts=-Xmx3686m

Here is a good resource with an answer to this question.
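
As a sketch, the same pair of settings can be supplied for the reduce side as well. The jar name and driver class below are placeholders, and the -D flags are only honored if the driver uses ToolRunner/GenericOptionsParser (as the Hadoop examples do):

    hadoop jar my-job.jar com.example.MyDriver \
        -Dmapreduce.map.memory.mb=4096 \
        -Dmapreduce.map.java.opts=-Xmx3686m \
        -Dmapreduce.reduce.memory.mb=4096 \
        -Dmapreduce.reduce.java.opts=-Xmx3686m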

Solution 6 - Java

You can solve this problem by editing the file /etc/hadoop/hadoop-env.sh.

Hadoop gives the /etc/hadoop config directory precedence over the conf directory, so that is the copy you need to change.

I ran into the same situation.

Solution 7 - Java

We faced the same situation.

Modifying hadoop-env.sh worked for me.

The export HADOOP_HEAPSIZE line is commented out by default; uncomment it and provide a size of your choice.

By default the heap size assigned is 1000 MB.
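
A sketch of the change in hadoop-env.sh (the commented-out lines below mirror what stock hadoop-env.sh files ship with; the exact comment text varies between versions):

    # The maximum amount of heap to use, in MB. Default is 1000.
    # export HADOOP_HEAPSIZE=2000
    # Uncommented and set to a size of your choice, for example:
    export HADOOP_HEAPSIZE=2048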

Solution 8 - Java

Run your job like the one below:

bin/hadoop jar hadoop-examples-*.jar grep -D mapred.child.java.opts=-Xmx1024M input output 'dfs[a-z.]+' 

The heap space is, by default, set to 32 MB or 64 MB. You can increase it in the properties file as Tudor pointed out, or you can change it for this particular job by setting the property on the command line, as shown above.

Solution 9 - Java

I installed Hadoop 1.0.4 from the binary tar and had the out-of-memory problem. I tried Tudor's, Zach Garner's, Nishant Nagwani's and Andris Birkmanis's solutions, but none of them worked for me.

Editing bin/hadoop to ignore $HADOOP_CLIENT_OPTS worked for me:

...
elif [ "$COMMAND" = "jar" ] ; then
    CLASS=org.apache.hadoop.util.RunJar
    # Changed this line to avoid the out of memory error:
    #HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
    # changed to:
    HADOOP_OPTS="$HADOOP_OPTS "
...

I'm assuming that there is a better way to do this but I could not find it.

Solution 10 - Java

I hit the same exception on Ubuntu with Hadoop 1.1.1. The solution was simple: edit the shell variable $HADOOP_CLIENT_OPTS set by some init script. It took a long time to find where that was done, though; the grep below is one way to track it down.
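
One way to find every file that sets HADOOP_CLIENT_OPTS (the paths below are the usual suspects for a package-based or home-directory install; adjust them to your layout):

    grep -R "HADOOP_CLIENT_OPTS" /etc/hadoop /etc/profile.d ~/hadoop/conf 2>/dev/null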

Solution 11 - Java

Make sure mapreduce.child.java.opts provides enough memory to run the MapReduce job. Also ensure that mapreduce.task.io.sort.mb is smaller than the heap set by mapreduce.child.java.opts.

Example:

 mapreduce.child.java.opts=-Xmx2048m

 mapreduce.task.io.sort.mb=100

Otherwise you'll hit the OOM issue even if HADOOP_CLIENT_OPTS in hadoop-env.sh is configured with enough memory.

Solution 12 - Java

Configure the JVM heap size for your map and reduce processes. These sizes need to be less than the container memory configured through the mapreduce.{map|reduce}.memory.mb settings. As a general rule, they should be about 80% of the YARN physical memory settings.

Configure mapreduce.map.java.opts and mapreduce.reduce.java.opts to set the map and reduce heap sizes respectively. For example, -Xmx1638m is roughly 80% of a 2048 MB map container and -Xmx3278m is roughly 80% of a 4096 MB reduce container:

<property>  
   <name>mapreduce.map.java.opts</name>  
   <value>-Xmx1638m</value>
</property>
<property>  
   <name>mapreduce.reduce.java.opts</name>  
   <value>-Xmx3278m</value>
</property>

Solution 13 - Java

Sourcing conf/hadoop-env.sh, so that the variables it exports take effect in the current shell, worked for me:

. conf/hadoop-env.sh
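
After sourcing it, you can verify in the same shell that the variable is now set (a quick sanity check, assuming your hadoop-env.sh exports HADOOP_CLIENT_OPTS):

    . conf/hadoop-env.sh
    echo "$HADOOP_CLIENT_OPTS"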

Solution 14 - Java

On Ubuntu using the DEB install (at least for Hadoop 1.2.1), a /etc/profile.d/hadoop-env.sh symlink to /etc/hadoop/hadoop-env.sh is created, which causes it to be loaded every time you log in. In my experience this is not necessary, as the /usr/bin/hadoop wrapper itself will eventually call it (through /usr/libexec/hadoop-config.sh). On my system I removed the symlink, and I no longer get weird issues when changing the value for -Xmx in HADOOP_CLIENT_OPTS (because every time that hadoop-env.sh script is run, the client options environment variable is updated while still keeping the old value).
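
A sketch of the check and removal described above (paths as installed by the DEB package):

    ls -l /etc/profile.d/hadoop-env.sh    # confirm it is a symlink to /etc/hadoop/hadoop-env.sh
    sudo rm /etc/profile.d/hadoop-env.sh  # remove the symlink, then log in again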

Solution 15 - Java

I ran into a very similar issue last week. The input file I was using had one huge line in it which I could not view; that line was almost 95% of my file size (95% of 1 GB, imagine that!). I would suggest you take a look at your input files first. You might have a malformed input file that is worth looking into. Try increasing the heap space after you have checked the input file.
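
A quick way to spot such a line, assuming GNU coreutils and that your job reads from the input directory shown in the question:

    # Print the length of the longest line in each input file
    wc -L input/*
    # Portable alternative using awk
    awk '{ if (length($0) > max) max = length($0) } END { print max }' input/*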

Solution 16 - Java

If you are using Hadoop on Amazon EMR, a configuration can be added to increase the heap size:

[  {    "Classification": "hadoop-env",    "Properties": {},    "Configurations": [      {        "Classification": "export",        "Properties": {          "HADOOP_HEAPSIZE": "2048"        },        "Configurations": []
      }
    ]
  }
]
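
As a sketch, this JSON (saved here as configurations.json) can be passed when creating the cluster with the AWS CLI; the release label, instance type, and instance count below are placeholders:

    aws emr create-cluster \
        --release-label emr-5.30.0 \
        --applications Name=Hadoop \
        --instance-type m5.xlarge \
        --instance-count 3 \
        --use-default-roles \
        --configurations file://configurations.json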

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Anuj | View Question on Stackoverflow
Solution 1 - Java | Zach Garner | View Answer on Stackoverflow
Solution 2 - Java | Tudor | View Answer on Stackoverflow
Solution 3 - Java | Andris Birkmanis | View Answer on Stackoverflow
Solution 4 - Java | etlolap | View Answer on Stackoverflow
Solution 5 - Java | tworec | View Answer on Stackoverflow
Solution 6 - Java | wufawei | View Answer on Stackoverflow
Solution 7 - Java | Mitra Bhanu | View Answer on Stackoverflow
Solution 8 - Java | Nishant Nagwani | View Answer on Stackoverflow
Solution 9 - Java | Brian C. | View Answer on Stackoverflow
Solution 10 - Java | Odysseus | View Answer on Stackoverflow
Solution 11 - Java | S.K. Venkat | View Answer on Stackoverflow
Solution 12 - Java | Pravat Sutar | View Answer on Stackoverflow
Solution 13 - Java | Satyajit Rai | View Answer on Stackoverflow
Solution 14 - Java | borice | View Answer on Stackoverflow
Solution 15 - Java | Adi Kish | View Answer on Stackoverflow
Solution 16 - Java | Jay Prall | View Answer on Stackoverflow