YARN applications show the errors
"YarnRuntimeException: Could not load history file" and
"Caused by: java.io.FileNotFoundException: File not found:"
The Job History Server reports Java exceptions like the following:
2017-04-03 02:10:35,177 WARN org.apache.hadoop.yarn.webapp.GenericExceptionHandler: INTERNAL_SERVER_ERROR
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Could not load history file hdfs://<NAMENODEHOST>:8020/user/history/done_intermediate/<USERNAME>/job_<CLUSTERSTAMP>_<JOBNUM>-<ID>-<USERNAME>-<APPNAME>-<STRING>-<STRING>-0-SUCCEEDED-default.jhist
    at org.apache.hadoop.mapreduce.v2.hs.CompletedJob.loadFullHistoryData(CompletedJob.java:339)
    at org.apache.hadoop.mapreduce.v2.hs.CompletedJob.<init>(CompletedJob.java:101)
    at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager$HistoryFileInfo.loadJob(HistoryFileManager.java:408)
    at org.apache.hadoop.mapreduce.v2.hs.CachedHistoryStorage.loadJob(CachedHistoryStorage.java:86)
    at org.apache.hadoop.mapreduce.v2.hs.CachedHistoryStorage.getFullJob(CachedHistoryStorage.java:112)
    at org.apache.hadoop.mapreduce.v2.hs.JobHistory.getJob(JobHistory.java:207)
    at org.apache.hadoop.mapreduce.v2.app.webapp.AMWebServices.getJobFromJobIdString(AMWebServices.java:120)
    at org.apache.hadoop.mapreduce.v2.hs.webapp.HsWebServices.getJob(HsWebServices.java:196)
    . . . <snip> . . .
Caused by: java.io.FileNotFoundException: File not found: /user/history/done_intermediate/<USERNAME>/job_<CLUSTERSTAMP>_<JOBNUM>-<ID>-<USERNAME>-<APPNAME>-<STRING>-<STRING>-0-SUCCEEDED-default.jhist
In HDFS, the user's subdirectory under "done_intermediate" had the following ownership and permission mode:
$ hdfs dfs -ls /user/history/done_intermediate
...
drwxrwx---   - <USERNAME> hadoop          0 2017-04-03 14:50 /user/history/done_intermediate/<USERNAME>
$
The "done_intermediate" directory itself had the following ownership and permissions, and it also had the sticky bit set (the trailing "t" in the mode string):
$ hdfs dfs -ls /user/history/
...
drwxrwxrwx   - mapred hadoop          0 2017-03-07 10:53 /user/history/done
drwxrwxrwt   - mapred hadoop          0 2016-12-17 06:28 /user/history/done_intermediate
$
Remove the sticky bit from the HDFS "done_intermediate" directory:
$ hdfs dfs -chmod 0777 /user/history/done_intermediate
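To confirm that the change took effect, the mode string of the directory can be inspected again. The following is a minimal sketch, not part of Hadoop: the helper names has_sticky_bit and hdfs_dir_mode are hypothetical, and the sketch assumes the mode is the first column of the "hdfs dfs -ls -d" output, as in the listings above.

```shell
#!/bin/sh
# Hypothetical helper (not a Hadoop command): succeeds when a permission
# string such as "drwxrwxrwt" ends in the sticky-bit flag ("t" or "T").
has_sticky_bit() {
    mode=${1%+}    # drop a trailing "+" (ACL marker), if present
    case "$mode" in
        *[tT]) return 0 ;;
        *)     return 1 ;;
    esac
}

# Hypothetical wrapper: prints the mode column for one HDFS path.
# "-ls -d" lists the directory entry itself rather than its contents.
# Assumption: the first field of the entry line is the mode string.
hdfs_dir_mode() {
    hdfs dfs -ls -d "$1" | awk '$1 ~ /^[d-]/ { print $1; exit }'
}

# Example use against the directory from this article (requires a
# working HDFS client, so it is left commented out here):
#
#   dir=/user/history/done_intermediate
#   if has_sticky_bit "$(hdfs_dir_mode "$dir")"; then
#       echo "sticky bit is still set on $dir"
#   else
#       echo "sticky bit is not set on $dir"
#   fi
```

After the chmod above, the listing should show drwxrwxrwx for the directory, and the check should report that the sticky bit is not set.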