In this case, the application master suddenly writes roughly 33,000 "Container killed on request" messages to stdout.
Container killed on request. Exit code is 143
Container killed on request. Exit code is 143
Job failed as tasks failed. failedMaps:0 failedReduces:1
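By shell convention, an exit code above 128 means the process was terminated by a signal, so 143 corresponds to 128 + 15 (SIGTERM): the container was killed from outside rather than exiting on its own. The sketch below is one way to tally these messages in a saved copy of the application master output and decode the exit codes. The helper names (summarize_exit_codes, describe_exit_code) and the sample text are illustrative assumptions, not part of Hadoop.

```python
import re
import signal
from collections import Counter

# Matches the message format shown in the application master output above.
EXIT_CODE_RE = re.compile(r"Container killed on request\. Exit code is (\d+)")


def summarize_exit_codes(log_text):
    """Count 'Container killed on request' messages per exit code."""
    return Counter(int(code) for code in EXIT_CODE_RE.findall(log_text))


def describe_exit_code(code):
    """Map a shell-style exit code above 128 to the signal name behind it."""
    if code > 128:
        try:
            return signal.Signals(code - 128).name
        except ValueError:
            return "unknown signal"
    return "not a signal exit"


# Hypothetical sample, copied from the messages quoted in this note.
sample = (
    "Container killed on request. Exit code is 143\n"
    "Container killed on request. Exit code is 143\n"
    "Job failed as tasks failed. failedMaps:0 failedReduces:1\n"
)

counts = summarize_exit_codes(sample)
for code, occurrences in counts.items():
    print(code, occurrences, describe_exit_code(code))
```

A count dominated by a single signal-style exit code suggests the containers were terminated by the framework, which is a reason to treat the errors in the individual container logs as side effects rather than the root cause.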
The container logs are misleading: they report seemingly unrelated error conditions, such as "File does not exist".
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /tmp/hive-svcckp/hive_2015-04-03_22-17-06_046_7422088098942342020-1/_task_tmp.-ext-10000/base_div_nbr=1/retail_channel_code=1/year_nbr=2013/qtr_nbr=2/visit_date=2013-07-05/_tmp.001367_0: File does not exist. Holder DFSClient_attempt_1426272186088_146151_r_001367_0_-2103711889_1 does not have any open files.
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2932)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:2996)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2978)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:611)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:434)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:63013)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
    ......
Note: The reducer also reports an error while processing a row. The stack trace follows (the key/value content of the message is truncated).
2015-04-03 21:09:18,484 FATAL [main] ExecReducer: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{....},"value":{...}}
    at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:258)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:462)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
    at java.security.AccessController.doPrivileged(Native Method)
    :
    at java.lang.Thread.run(Thread.java:744)
Caused by: org.apache.hadoop.util.Shell$ExitCodeException:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
    at org.apache.hadoop.util.Shell.run(Shell.java:379)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
    at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.signalContainer(LinuxContainerExecutor.java:311)
    ... 6 more