Large hive query fails with "Container killed on request"



Article ID: 294894


Updated On:

Products

Services Suite

Issue/Introduction

Symptoms:

In this case, the application master suddenly reports roughly 33,000 "Container killed on request" messages to stdout:

Container killed on request. Exit code is 143

Job failed as tasks failed. failedMaps:0 failedReduces:1
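As background, exit code 143 corresponds to termination by SIGTERM (128 + 15): YARN deliberately killed the container, rather than the task crashing on its own. This convention can be demonstrated with a quick shell sketch:

```shell
# A process killed by a signal exits with status 128 + signal number.
# SIGTERM is signal 15, so a SIGTERM-killed container reports 128 + 15 = 143.
sh -c 'kill -TERM $$'   # the child shell sends SIGTERM to itself
echo $?                 # prints 143
```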

The container logs are misleading, reporting unrelated error conditions such as LeaseExpiredException ("File does not exist"):

org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /tmp/hive-svcckp/hive_2015-04-03_22-17-06_046_7422088098942342020-1/_task_tmp.-ext-10000/base_div_nbr=1/retail_channel_code=1/year_nbr=2013/qtr_nbr=2/visit_date=2013-07-05/_tmp.001367_0: File does n

ot exist. Holder DFSClient_attempt_1426272186088_146151_r_001367_0_-2103711889_1 does not have any open files.

        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2932)

        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:2996)

        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2978)

        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:611)

        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:434)

        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:63013)

        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)


......

Note: The reducer reports errors while processing a row; stack trace below (key/value message truncated).

2015-04-03 21:09:18,484 FATAL [main] ExecReducer: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{....},"value":{...}}

        at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:258)

        at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:462)

        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)

        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)

        at java.security.AccessController.doPrivileged(Native Method)

        ...



        at java.lang.Thread.run(Thread.java:744)

Caused by: org.apache.hadoop.util.Shell$ExitCodeException:

        at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)

        at org.apache.hadoop.util.Shell.run(Shell.java:379)

        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)

        at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.signalContainer(LinuxContainerExecutor.java:311)

        ... 6 more


        

Environment


Cause

The job normally succeeds; in this case, however, the dataset size increased by a factor of three.

As a result, the application master launched over 30,000 containers while still using its default memory allocation of 2 GB.

This caused frequent GC pauses, and the application master was simply unable to keep up with the number of containers.

Resolution

Use the MapReduce parameter "yarn.app.mapreduce.am.resource.mb" to increase the application master (AM) memory to a higher value. You can review this KB article to determine the best value for this parameter.
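As an illustrative sketch, the setting can be applied per-session from the Hive CLI before running the query. The value 4096 is an example only; size it for your workload as described in the KB article referenced above. Raising the AM heap via "yarn.app.mapreduce.am.command-opts" to fit inside the larger container is shown here as a common accompanying step.

```sql
-- Increase the MapReduce ApplicationMaster container size for this Hive session.
-- 4096 MB is an example value; choose a size appropriate to the number of
-- containers the job launches.
SET yarn.app.mapreduce.am.resource.mb=4096;

-- Raise the AM JVM heap so it fits inside the container
-- (commonly around 80% of the container size).
SET yarn.app.mapreduce.am.command-opts=-Xmx3276m;
```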