Secondary NameNode checkpoint error "Inconsistent checkpoint fields"


Article ID: 294733


Products

Services Suite

Issue/Introduction

Symptoms:

When the cluster is deployed using a Secondary NameNode instead of NameNode High Availability, Secondary NameNode checkpointing fails with the following error message:

java.io.IOException: Inconsistent checkpoint fields.
LV = -63 namespaceID = 713175558 cTime = 0 ; clusterId = CID-f2caf2b4-b3da-4a34-a62f-fea8badc724e ; blockpoolId = BP-716932340-192.168.1.35-1470839146699.
Expecting respectively: -63; 1785058013; 0; CID-4f13424a-2eb8-43d9-9ae1-9df54670f489; BP-1667795605-192.168.1.35-1465693123284.
    at org.apache.hadoop.hdfs.server.namenode.CheckpointSignature.validateStorageInfo(CheckpointSignature.java:134)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:531)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:395)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$1.run(SecondaryNameNode.java:361)
    at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:449)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:357)
    at java.lang.Thread.run(Thread.java:745)

Note: These errors show up in either the Secondary NameNode's .log file or its .out file.
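To confirm which file contains the error, the Secondary NameNode's log files can be searched directly. The log directory below is the usual default on Ambari-managed installs and is an assumption; check HADOOP_LOG_DIR on your cluster if it differs.

```shell
# Assumed log directory (Ambari/HDP default) -- adjust to your
# cluster's HADOOP_LOG_DIR if it differs.
LOG_DIR=${LOG_DIR:-/var/log/hadoop/hdfs}

# List the SNN .log/.out files that contain the checkpoint error;
# prints nothing if the error (or the directory) is absent.
grep -ls 'Inconsistent checkpoint fields' \
    "$LOG_DIR"/hadoop-hdfs-secondarynamenode-*.log \
    "$LOG_DIR"/hadoop-hdfs-secondarynamenode-*.out || true
```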

Environment


Cause

There are two different causes for this error message:

1. A failed or improperly executed upgrade procedure.

2. The Secondary NameNode has an incorrect ${dfs.namenode.checkpoint.dir}/current/VERSION file. In this scenario, everything under the SNN's ${dfs.namenode.checkpoint.dir} directory must be removed and rebuilt so that checkpointing works properly.
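Cause 2 can be confirmed by comparing the identity fields of the NameNode's and Secondary NameNode's VERSION files. The paths below are examples, not defaults guaranteed on your cluster; substitute the actual values of dfs.namenode.name.dir (on the NameNode) and dfs.namenode.checkpoint.dir (on the SNN).

```shell
# Example paths (assumptions) -- substitute the actual values of
# dfs.namenode.name.dir and dfs.namenode.checkpoint.dir.
NN_VERSION=${NN_VERSION:-/hadoop/hdfs/namenode/current/VERSION}
SNN_VERSION=${SNN_VERSION:-/hadoop/hdfs/namesecondary/current/VERSION}

# namespaceID, clusterID and blockpoolID must be identical on both
# sides; any line printed by diff explains the checkpoint failure.
grep -sE '^(namespaceID|clusterID|blockpoolID)=' "$NN_VERSION"  | sort > /tmp/nn.fields
grep -sE '^(namespaceID|clusterID|blockpoolID)=' "$SNN_VERSION" | sort > /tmp/snn.fields
diff /tmp/nn.fields /tmp/snn.fields
```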

Resolution

Follow these steps to resolve this issue:


1. Through the Ambari UI, select HDFS Service > Configs.

2. Identify the directory values for the parameters dfs.namenode.checkpoint.dir and dfs.namenode.checkpoint.edits.dir.

3. Note down the values. NOTE: If dfs.namenode.checkpoint.edits.dir uses a different directory than dfs.namenode.checkpoint.dir, you must repeat steps 10, 11 and 12 for that directory as well.

4. Stop all services except HDFS.

5. On the Primary NameNode host, put HDFS in SafeMode: hdfs dfsadmin -safemode enter

6. On the same host, confirm SafeMode: hdfs dfsadmin -safemode get.

7. On the same host, checkpoint the NameSpace: hdfs dfsadmin -saveNamespace.

8. While still in SafeMode, shut down the remaining HDFS service(s).

9. Log in to the Secondary NameNode host.

10. cd to the value of ${dfs.namenode.checkpoint.dir}.

11. Rename the checkpoint directory: mv current current.bad

12. Start up HDFS service(s) only.

13. Wait for HDFS services to come online.

14. Start the remaining Hadoop Services.
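Steps 10 and 11 can be sketched as follows. The checkpoint directory path is an example (an assumption about a typical layout); use the dfs.namenode.checkpoint.dir value noted in step 3.

```shell
# Example checkpoint directory (an assumption) -- use the actual
# dfs.namenode.checkpoint.dir value recorded in step 3.
CHECKPOINT_DIR=${CHECKPOINT_DIR:-/hadoop/hdfs/namesecondary}

# Move the stale checkpoint data aside rather than deleting it, so it
# can be restored if anything goes wrong. The Secondary NameNode
# rebuilds a fresh "current" directory on its next checkpoint.
if [ -d "$CHECKPOINT_DIR/current" ]; then
    mv "$CHECKPOINT_DIR/current" "$CHECKPOINT_DIR/current.bad"
else
    echo "no checkpoint data found at $CHECKPOINT_DIR/current" >&2
fi
```

Keeping the old data as current.bad (instead of deleting it) leaves a fallback until a successful checkpoint confirms the rebuild worked.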