Namenode fails with "java.lang.AssertionError: Should not purge more edits than required to restore"



Article ID: 294873


Products

Services Suite

Issue/Introduction

Symptoms:

Namenode startup failed with a FATAL message stating: "org.apache.hadoop.hdfs.server.namenode.NameNode: Exception in namenode join
java.lang.AssertionError: Should not purge more edits than required to restore"

Below is the complete stack trace of the error:

2014-02-14 21:28:58,108 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Image file of size 6326 saved in 0 seconds.
2014-02-14 21:28:58,652 INFO org.apache.hadoop.hdfs.server.namenode.NNStorageRetentionManager: Going to retain 2 images with txid >= 611
2014-02-14 21:28:58,702 INFO org.apache.hadoop.hdfs.server.namenode.NNStorageRetentionManager: Purging old image FSImageFile(file=/data/nn/dfs/name/current/fsimage_0000000000000000575, cpktTxId=0000000000000000575)
2014-02-14 21:29:04,411 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Exception in namenode join
java.lang.AssertionError: Should not purge more edits than required to restore: 10179 should be <= 612
 at org.apache.hadoop.hdfs.server.namenode.NNStorageRetentionManager.purgeOldStorage(NNStorageRetentionManager.java:132)
 at org.apache.hadoop.hdfs.server.namenode.FSImage.purgeOldStorage(FSImage.java:946)
 at org.apache.hadoop.hdfs.server.namenode.FSImage.saveFSImageInAllDirs(FSImage.java:931)

 

Environment


Cause

When the Namenode starts, it initializes the namespace image from the most recently stored checkpoint (fsimage), replays the edit logs on top of it to rebuild the namespace, and then saves a new checkpoint. If a very large number of edit log files must be replayed to rebuild the namespace, the error shown above can occur. This situation can arise in scenarios such as the following, but is not limited to them (a quick way to check how many edit segments have accumulated is shown after the list):

- A Secondary Namenode has not been configured, and the Namenode has been running for a long time, resulting in a very large number of edit logs.
- A Secondary Namenode has been configured, but it is either not running or is having issues performing checkpoint operations.
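As a rough check, you can count how many edit log segment files have accumulated under the Namenode's name directory. This is a minimal sketch; the directory path is taken from the log output above and may differ in your environment.

# Count accumulated edit log segment files in the Namenode's current directory
ls /data/nn/dfs/name/current/ | grep -c '^edits_'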

 

Due to bug https://issues.apache.org/jira/browse/HDFS-4739, a limit is enforced on the total number of extra edit log segments that are retained. If the number of segments required exceeds the configured limit, this error can be seen.

Resolution

You can set dfs.namenode.max.extra.edits.segments.retained to a higher value (for example, 1500000) in /etc/gphd/hadoop/conf/hdfs-site.xml on the Namenode and restart it. Once all operations have completed successfully, you can remove this property and restart the Namenode again.
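For reference, a minimal hdfs-site.xml snippet is shown below, assuming the configuration path above and the example value of 1500000; adjust the value for your environment.

<property>
  <!-- Temporary increase so startup can replay and retain all pending edit segments; remove after a successful restart and checkpoint. -->
  <name>dfs.namenode.max.extra.edits.segments.retained</name>
  <value>1500000</value>
</property>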

Also note that before the Namenode exits with this error, it saves the new fsimage file that was created by replaying the edit logs.

"dfs.namenode.max.extra.edits.segments.retained": It controls the maximum number of extra edit log segments which should be retained beyond what is minimally necessary for a namenode restart.