YARN Application Error: "YarnRuntimeException: Could not load history file" And "Caused by: java.io.FileNotFoundException: File not found:"



Article ID: 295023


Updated On: 11-07-2018

Products

Services Suite

Issue/Introduction

Symptoms:

YARN applications show the following errors:

"YarnRuntimeException: Could not load history file" and

"Caused by: java.io.FileNotFoundException: File not found:"

The Job History Server reports Java exceptions like the following:

2017-04-03 02:10:35,177 WARN org.apache.hadoop.yarn.webapp.GenericExceptionHandler: INTERNAL_SERVER_ERROR
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Could not load history file hdfs://<NAMENODEHOST>:8020/user/history/done_intermediate/<USERNAME>/job_<CLUSTERSTAMP>_<JOBNUM>-<ID>-<USERNAME>-<APPNAME>-<STRING>-<STRING>-0-SUCCEEDED-default.jhist
        at org.apache.hadoop.mapreduce.v2.hs.CompletedJob.loadFullHistoryData(CompletedJob.java:339)
        at org.apache.hadoop.mapreduce.v2.hs.CompletedJob.<init>(CompletedJob.java:101)
        at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager$HistoryFileInfo.loadJob(HistoryFileManager.java:408)
        at org.apache.hadoop.mapreduce.v2.hs.CachedHistoryStorage.loadJob(CachedHistoryStorage.java:86)
        at org.apache.hadoop.mapreduce.v2.hs.CachedHistoryStorage.getFullJob(CachedHistoryStorage.java:112)
        at org.apache.hadoop.mapreduce.v2.hs.JobHistory.getJob(JobHistory.java:207)
        at org.apache.hadoop.mapreduce.v2.app.webapp.AMWebServices.getJobFromJobIdString(AMWebServices.java:120)
        at org.apache.hadoop.mapreduce.v2.hs.webapp.HsWebServices.getJob(HsWebServices.java:196)
        ...<snip>...
Caused by: java.io.FileNotFoundException: File not found: /user/history/done_intermediate/<USERNAME>/job_<CLUSTERSTAMP>_<JOBNUM>-<ID>-<USERNAME>-<APPNAME>-<STRING>-<STRING>-0-SUCCEEDED-default.jhist
 

Environment


Cause

In HDFS, the user's subdirectory under "done_intermediate" had the following ownership and permissions:

$ hdfs dfs -ls /user/history/done_intermediate
...
drwxrwx---   - <USERNAME>    hadoop          0 2017-04-03 14:50 /user/history/done_intermediate/<USERNAME>
$ 

The parent "done_intermediate" directory had the following ownership and permissions, with the sticky bit set (the trailing "t" in the mode string):

$ hdfs dfs -ls /user/history/
...
drwxrwxrwx   - mapred hadoop          0 2017-03-07 10:53 /user/history/done
drwxrwxrwt   - mapred hadoop          0 2016-12-17 06:28 /user/history/done_intermediate
$ 
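
The sticky bit can also be detected in a script rather than by eye. The following is a minimal sketch: the function name has_sticky_bit is hypothetical (it is not part of Hadoop), and it only inspects a mode string such as the first column of the listing above. It assumes the sticky bit appears as a trailing "t" or "T", optionally followed by a "+" when ACLs are present.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hadoop): succeeds if a symbolic mode
# string such as "drwxrwxrwt" has the sticky bit set.
has_sticky_bit() {
    # Drop a trailing "+" (ACL marker) if one is present.
    mode=${1%+}
    case "$mode" in
        *[tT]) return 0 ;;
        *)     return 1 ;;
    esac
}

# Mode strings copied from the listing above.
for m in drwxrwxrwx drwxrwxrwt; do
    if has_sticky_bit "$m"; then
        echo "$m: sticky bit set"
    else
        echo "$m: sticky bit not set"
    fi
done
```

On a cluster, the mode string could be taken from the first field of the output of "hdfs dfs -ls -d /user/history/done_intermediate" and passed to the function.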
 

Resolution

Remove the sticky bit from the HDFS done_intermediate directory:
$ hdfs dfs -chmod 0777 /user/history/done_intermediate
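
In a four-digit octal mode, the leading digit carries the sticky bit (octal 1000), which is why 0777 clears it while 1777 would set it. The sketch below illustrates that bit test; the function name octal_has_sticky is hypothetical and the script does not contact HDFS.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if an octal mode string (for example
# "1777") includes the sticky bit, which is octal 1000.
octal_has_sticky() {
    # Prefix a 0 so the shell parses the digits as an octal constant.
    [ $(( 0$1 & 01000 )) -ne 0 ]
}

for mode in 1777 0777; do
    if octal_has_sticky "$mode"; then
        echo "$mode: sticky bit set"
    else
        echo "$mode: sticky bit cleared"
    fi
done
```

After running the chmod command, listing /user/history/ again with "hdfs dfs -ls" should show the done_intermediate mode ending in "x" rather than "t".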