The hdfs "stat" command is useful when you need to write a quick script that will collect specific information about the files within HDFS.
Use case: Running hdfs dfs -ls /filename always returns the full path of the file, but you only need the basename.
After reading this article you will know how to print only the file or directory name and certain relevant details about that file.
Formatting options:
%b  Size of file in bytes
%F  Returns "regular file", "directory", or "symlink" depending on the type of inode
%g  Group name
%n  Filename
%o  HDFS block size in bytes (128 MB by default)
%r  Replication factor
%u  Username of owner
%y  Formatted mtime of inode (UTC)
%Y  UNIX epoch mtime of inode, in milliseconds
Example: Use stat to return only the basename, confirming that the file or directory exists in HDFS
[root@hdm1 ~]# hdfs dfs -stat "%n" /tmp/messages
messages
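For use in a script, the same call can be wrapped in a small helper. This is a minimal sketch, assuming the hdfs client is on the PATH and that -stat exits with a non-zero status when the path does not exist; the function name hdfs_basename is hypothetical and not part of Hadoop.

```shell
# Hypothetical helper: print the basename of an HDFS path if it exists.
# Assumes the hdfs client is installed and on the PATH.
hdfs_basename() {
    # Capture the name first so that nothing is printed on failure.
    name=$(hdfs dfs -stat "%n" "$1" 2>/dev/null) || return 1
    printf '%s\n' "$name"
}
```

Calling hdfs_basename /tmp/messages would print messages when the file exists and return a non-zero status otherwise, so it can be used directly in an if statement.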
Example: Compare all stat attributes with "ls"
[root@hdm1 ~]# hdfs dfs -stat "%b %F %g %n %o %r %u %y %Y" /tmp/messages
143 regular file hadoop messages 134217728 3 root 2014-02-07 21:17:22 1391807842674
Compared with "-ls" (the time differs because -ls prints mtime in the local time zone, while %y is UTC):
Found 1 items
-rw-r--r-- 3 root hadoop 143 2014-02-07 13:17 /tmp/messages
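The %Y value offers a way to cross-check the timestamps. The sketch below converts the millisecond epoch value into a UTC string in the same layout as %y. It assumes GNU date (the -d @SECONDS form is not available in every date implementation), and the function name epoch_ms_to_utc is hypothetical.

```shell
# Hypothetical helper: convert an epoch value in milliseconds (as printed by %Y)
# to a UTC timestamp in the same layout as %y.
# Assumes GNU date, which accepts "-d @SECONDS".
epoch_ms_to_utc() {
    secs=$(( $1 / 1000 ))   # drop the millisecond part
    date -u -d "@$secs" '+%Y-%m-%d %H:%M:%S'
}
```

For the file in this example, epoch_ms_to_utc 1391807842674 gives 2014-02-07 21:17:22, the same value that %y printed. Without -u, date prints the local time instead.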
Example: Use stat with a directory
[root@hdm1 ~]# hdfs dfs -stat "%b %F %g %n %o %r %u %y %Y" /tmp/gphdtmp
0 directory hadoop gphdtmp 0 0 hdfs 2013-12-26 07:08:06 1388041686026
Example: Performing a stat on all files and directories under /tmp
[root@hdm1 ~]# hdfs dfs -stat "%b %F %g %n %o %r %u %y %Y" "/tmp/*"
0 directory hadoop gphdtmp 0 0 hdfs 2013-12-26 07:08:06 1388041686026
143 regular file hadoop messages 134217728 3 root 2014-02-07 21:17:22 1391807842674
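Because %F can expand to two words ("regular file"), splitting this output on whitespace in a script is fragile. One way around that is to request only the fields you need, separated by a literal delimiter, and filter the result with awk. The following is a minimal sketch, assuming file names do not contain the | character; the function name sum_regular_bytes is hypothetical.

```shell
# Hypothetical helper: total size in bytes of the regular files in
# "type|bytes|name" records read from standard input.
sum_regular_bytes() {
    awk -F'|' '$1 == "regular file" { total += $2 } END { printf "%.0f\n", total }'
}

# Intended use (requires a running HDFS cluster):
#   hdfs dfs -stat "%F|%b|%n" "/tmp/*" | sum_regular_bytes
```

Directories are skipped by the type check, so only the bytes of regular files are counted.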