2014-04-22 13:00:59,898 WARN org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Removing replica BP-1306430579-172.28.9.250-1381221906808:-8712134517697604346 on failed volume /data2/dfs/current
2014-04-22 13:00:59,898 WARN org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Removed 12308 out of 123402(took 138 millisecs)
2014-04-22 13:00:59,898 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DataNode.handleDiskError: Keep Running: false
2014-04-22 13:01:00,110 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DataNode is shutting down: DataNode failed volumes:/data2/dfs/current;
2014-04-22 13:01:00,112 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:svc-platfora (auth:SIMPLE) cause:java.io.IOException: Block blk_2910942244825575033_338680521 is not valid.
2014-04-22 13:01:00,112 INFO org.apache.hadoop.ipc.Server: IPC Server handler 50 on 50020, call org.apache.hadoop.hdfs.protocol.ClientDatanodeProtocol.getBlockLocalPathInfo from 172.28.10.40:55874: error: java.io.IOException: Block blk_2910942244825575033_338680521 is not valid.
java.io.IOException: Block blk_2910942244825575033_338680521 is not valid.
at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockFile(FsDatasetImpl.java:306)
at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockFile(FsDatasetImpl.java:287)
at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockLocalPathInfo(FsDatasetImpl.java:1737)
at org.apache.hadoop.hdfs.server.datanode.DataNode.getBlockLocalPathInfo(DataNode.java:1023)
at org.apache.hadoop.hdfs.protocolPB.ClientDatanodeProtocolServerSideTranslatorPB.getBlockLocalPathInfo(ClientDatanodeProtocolServerSideTranslatorPB.java:112)
at org.apache.hadoop.hdfs.protocol.proto.ClientDatanodeProtocolProtos$ClientDatanodeProtocolService$2.callBlockingMethod(ClientDatanodeProtocolProtos.java:5104)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:454)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1014)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1741)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1737)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1478)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1735)
By default, the ICM client configures the hdfs-site.xml parameter "dfs.datanode.failed.volumes.tolerated" to 0, which forces the datanode daemon to shut down when it fails to access one of its defined data volumes. The data volumes are defined by the parameter "dfs.datanode.data.dir", which in this case is set to use the following volumes:
<property>
<name>dfs.datanode.failed.volumes.tolerated</name>
<value>0</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/data1/dfs,/data2/dfs,/data3/dfs</value>
</property>
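When a datanode logs handleDiskError, as above, it can help to confirm the failure at the OS level before taking action. The checks below are a minimal sketch, assuming the failed volume is mounted at /data2:
# Check whether the kernel remounted the filesystem read-only or logged I/O errors
mount | grep /data2
dmesg | grep -i -e error -e "I/O"
# A simple write test; this will fail on a dead or read-only volume
touch /data2/dfs/.disk_test && rm /data2/dfs/.disk_test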
In this case, the /data2 data volume became inaccessible and the datanode shut down as a result. Typically, a data volume is backed by a single disk configured as RAID 0, so whatever data existed on that volume is lost. However, HDFS replicates each block according to "dfs.replication" (3 by default), so chances are there are two safe and sound copies somewhere else in the cluster that the application can read from.
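Before rebuilding the volume, you can confirm that the blocks it held have healthy replicas elsewhere. A quick check, run from any node with HDFS client access:
# Reports corrupt, missing, and under-replicated blocks across the cluster
sudo -u hdfs hdfs fsck /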
Replace any failed disks associated with the /data2 volume, recreate the data directory structure as defined by dfs.datanode.data.dir, and restart the datanode:
mkdir /data2/dfs
chown hdfs:hadoop /data2/dfs
hadoop-hdfs-datanode start
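To confirm the datanode came back up cleanly, watch its log; the path below is typical for packaged installs and may differ on your systems:
# Look for successful volume scans and block pool registration, not handleDiskError
tail -f /var/log/hadoop-hdfs/*datanode*.log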
Alternatively, you can increase the dfs.datanode.failed.volumes.tolerated parameter to 1 and start the datanode service. This will prevent the datanode from shutting down when a single data volume fails.
NOTE: Increasing this value is not recommended if a datanode has 4 or fewer volumes, or if your hardware is not monitored for disk drive failures. You may experience data loss if individual volume failures are spread across multiple datanodes and no alerts are in place to detect failed data volumes (a simple JMX check is sketched after the restart command below).
<property>
<name>dfs.datanode.failed.volumes.tolerated</name>
<value>1</value>
</property>
hadoop-hdfs-datanode start
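For the alerting mentioned in the NOTE above, one lightweight option is to poll the datanode's JMX servlet for the failed-volume count. This is a sketch only: 50075 is the default DataNode HTTP port for this Hadoop generation, and the exact MBean name varies by version and distribution:
# NumFailedVolumes > 0 means a data volume has been dropped
curl -s 'http://<datanode-host>:50075/jmx?qry=Hadoop:service=DataNode,name=FSDatasetState*' | grep -i NumFailedVolumes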
If a replacement disk is not immediately available, a third option is to remove the failed volume from dfs.datanode.data.dir and restart the datanode with the remaining volumes.
<property>
<name>dfs.datanode.data.dir</name>
<value>/data1/dfs,/data2/dfs,/data3/dfs</value>
</property>
Change To:
<property>
<name>dfs.datanode.data.dir</name>
<value>/data1/dfs,/data3/dfs</value>
</property>
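After editing hdfs-site.xml, you can confirm the new value is picked up from the local configuration, assuming you run the check on the datanode host against the edited file:
# Should print /data1/dfs,/data3/dfs
hdfs getconf -confKey dfs.datanode.data.dir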
hadoop-hdfs-datanode start
Verify dfs is healthy with "sudo -u hdfs hdfs dfsadmin -report".