A user attempts to execute a Hive query containing a large hash (map) join, and it fails with the following error:
org.apache.hadoop.hive.ql.exec.mapjoin.MapJoinMemoryExhaustionException: 2015-08-24 09:22:56 Processing rows: 1300000 Hashtable size: 1299999 Memory usage: 1844338856 percentage: 0.966
	at org.apache.hadoop.hive.ql.exec.mapjoin.MapJoinMemoryExhaustionHandler.checkMemoryStatus(MapJoinMemoryExhaustionHandler.java:91)
	at org.apache.hadoop.hive.ql.exec.HashTableSinkOperator.processOp(HashTableSinkOperator.java:251)
	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
	at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
	at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
	at org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.startForward(MapredLocalTask.java:404)
	at org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.startForward(MapredLocalTask.java:375)
	at org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.executeInProcess(MapredLocalTask.java:341)
	at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.main(ExecDriver.java:744)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
A Google search for this error suggests increasing hive.mapred.local.mem. However, the container logs show that the maximum heap size of the local JVM process launched by the map task is still only about 2 GB, regardless of the value set for hive.mapred.local.mem:
2015-08-24 07:01:50 Starting to launch local task to process map join; maximum memory = 1908932608
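The "maximum memory" value in that log line is the child JVM's reported maximum heap in bytes; a quick conversion confirms it corresponds to a heap of roughly 2 GB (a JVM typically reports slightly less usable heap than the configured -Xmx):

```shell
# Convert the logged byte count to megabytes (integer division).
echo $((1908932608 / 1024 / 1024))   # prints 1820, i.e. just under a 2048 MB heap
```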
When executing a hash join, the Hive map task launches a new JVM via the "hadoop jar" command to run the "ExecDriver" main class:
13868 [main] INFO org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask - Executing: /usr/bin/hadoop jar /u/applic/data/hdfs1/yarn/nm-local-dir/filecache/119/hive-exec-0.14.0.3.0.0.0-249.jar org.apache.hadoop.hive.ql.exec.mr.ExecDriver -localtask -plan file:/u/applic/data/hdfs9/yarn/nm-local-dir/usercache/mucha1/appcache/application_1440185851071_29920/container_1440185851071_29920_01_000002/tmp/mucha1/67d4be51-0b6f-486c-bf83-38fe8ecd4356/hive_2015-08-25_21-31-50_516_3577978971253935151-1/-local-10005/plan.xml -jobconffile file:/u/applic/data/hdfs9/yarn/nm-local-dir/usercache/mucha1/appcache/application_1440185851071_29920/container_1440185851071_29920_01_000002/tmp/mucha1/67d4be51-0b6f-486c-bf83-38fe8ecd4356/hive_2015-08-25_21-31-50_516_3577978971253935151-1/-local-10006/jobconf.xml
2015-08-25 21:31:54,416 INFO [main] mr.MapredLocalTask (MapredLocalTask.java:executeInChildVM(286)) - Executing: /usr/bin/hadoop jar /u/applic/data/hdfs1/yarn/nm-local-dir/filecache/119/hive-exec-0.14.0.3.0.0.0-249.jar org.apache.hadoop.hive.ql.exec.mr.ExecDriver -localtask -plan file:/u/applic/data/hdfs9/yarn/nm-local-dir/usercache/mucha1/appcache/application_1440185851071_29920/container_1440185851071_29920_01_000002/tmp/mucha1/67d4be51-0b6f-486c-bf83-38fe8ecd4356/hive_2015-08-25_21-31-50_516_3577978971253935151-1/-local-10005/plan.xml -jobconffile file:/u/applic/data/hdfs9/yarn/nm-local-dir/usercache/mucha1/appcache/application_1440185851071_29920/container_1440185851071_29920_01_000002/tmp/mucha1/67d4be51-0b6f-486c-bf83-38fe8ecd4356/hive_2015-08-25_21-31-50_516_3577978971253935151-1/-local-10006/jobconf.xml
Even though Hive sets the environment variable HADOOP_HEAPSIZE to the value of hive.mapred.local.mem before launching the child JVM, the "/usr/bin/hadoop" command overrides any HADOOP_HEAPSIZE setting already in the environment.
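The override is easy to reproduce outside of Hadoop. This minimal sketch uses a stand-in file under /tmp rather than the real /etc/hadoop/conf/hadoop-env.sh, and the 4096/2048 values are illustrative:

```shell
# Stand-in for hadoop-env.sh with an unconditional export, as shipped:
cat > /tmp/fake-hadoop-env.sh <<'EOF'
export HADOOP_HEAPSIZE="2048"
EOF

# Hive exports HADOOP_HEAPSIZE from hive.mapred.local.mem (say, 4096 MB)...
export HADOOP_HEAPSIZE="4096"

# ...but the hadoop wrapper script sources hadoop-env.sh, clobbering it:
. /tmp/fake-hadoop-env.sh
echo "$HADOOP_HEAPSIZE"   # prints 2048, not the 4096 that Hive exported
```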
Source from ./ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapredLocalTask.java showing that Hive does set HADOOP_HEAPSIZE when hive.mapred.local.mem is set:
int hadoopMem = conf.getIntVar(HiveConf.ConfVars.HIVEHADOOPMAXMEM);
if (hadoopMem == 0) {
  // remove env var that would default child jvm to use parent's memory
  // as default. child jvm would use default memory for a hadoop client
  variables.remove(HADOOP_MEM_KEY);
} else {
  // user specified the memory for local mode hadoop run
  console.printInfo(" set heap size\t" + hadoopMem + "MB");
  variables.put(HADOOP_MEM_KEY, String.valueOf(hadoopMem));
}

HiveConf.java:482:

HIVEHADOOPMAXMEM("hive.mapred.local.mem", 0),
However, the "/usr/bin/hadoop" command sources /etc/hadoop/conf/hadoop-env.sh, which overwrites any HADOOP_HEAPSIZE value already exported in the environment. As a result, hive.mapred.local.mem has no effect:
# The maximum amount of heap to use, in MB. Default is 1000.
export HADOOP_HEAPSIZE="2048"
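One possible remedy, sketched here under the assumption that you can edit hadoop-env.sh on the affected NodeManager hosts (and that your distribution's management tooling will not overwrite the change), is to make the export respect a value that is already present in the environment, using shell default-expansion. Again a stand-in file under /tmp is used for the demonstration:

```shell
# Stand-in for a patched hadoop-env.sh (illustrative path, not the real file):
cat > /tmp/patched-hadoop-env.sh <<'EOF'
# Only fall back to 2048 MB when HADOOP_HEAPSIZE is not already set.
export HADOOP_HEAPSIZE="${HADOOP_HEAPSIZE:-2048}"
EOF

# With Hive's value exported, sourcing no longer clobbers it:
export HADOOP_HEAPSIZE="4096"
. /tmp/patched-hadoop-env.sh
echo "$HADOOP_HEAPSIZE"   # prints 4096

# Without a caller-supplied value, the 2048 MB default still applies:
unset HADOOP_HEAPSIZE
. /tmp/patched-hadoop-env.sh
echo "$HADOOP_HEAPSIZE"   # prints 2048
```

With this change, the HADOOP_HEAPSIZE that Hive exports from hive.mapred.local.mem survives the sourcing of hadoop-env.sh, so the child ExecDriver JVM is launched with the requested heap.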