Even when overall capacity is not a concern, snapshots should be removed in order to free up space: HDFS does not release the blocks of deleted files while a snapshot still references them.
This system starts off with 980 MB of DFS Used:
-bash-4.1$ hdfs dfsadmin -report
Configured Capacity: 145835704320 (135.82 GB)
Present Capacity: 113644720128 (105.84 GB)
DFS Remaining: 112616583168 (104.88 GB)
DFS Used: 1028136960 (980.51 MB)
DFS Used%: 0.90%
Under replicated blocks: 4244
Blocks with corrupt replicas: 0
Missing blocks: 0

Datanodes available: 1 (1 total, 0 dead)

Live datanodes:
Name: 127.0.0.1:50010 (localhost)
Hostname: localhost
Decommission Status : Normal
Configured Capacity: 145835704320 (135.82 GB)
DFS Used: 1028136960 (980.51 MB)
Non DFS Used: 32190984192 (29.98 GB)
DFS Remaining: 112616583168 (104.88 GB)
DFS Used%: 0.70%
DFS Remaining%: 77.22%
Last contact: Wed Feb 10 00:28:34 CST 2016
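Before cleaning up, it can help to check which directories on the cluster are snapshottable and how many snapshots each one holds, since those snapshots are what can pin space. A quick check, assuming it is run as the HDFS superuser (otherwise only directories owned by the current user are listed):

hdfs lsSnapshottableDir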
The used space is distributed across HDFS as follows:
-bash-4.1$ hdfs dfs -du -h /
4.5 K     /apps
536.5 M   /hawq_data
0         /hive
0         /mapred
286.4 M   /retail_demo
0         /tmp
108.2 M   /user
7.1 M     /yarn
-bash-4.1$
Observe that there is a snapshot on the /hawq_data/ directory:
-bash-4.1$ hdfs dfs -ls /hawq_data/.snapshot/
Found 1 items
drwxr-xr-x   - gpadmin hadoop          0 2016-02-10 00:07 /hawq_data/.snapshot/s20160210-000709.684
-bash-4.1$
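For reference, a snapshot like this is normally created in two steps: an administrator marks the directory as snapshottable, and a snapshot is then taken, either with an explicit name or an auto-generated one such as s20160210-000709.684. A minimal sketch of those steps (these commands were not part of this session):

hdfs dfsadmin -allowSnapshot /hawq_data
hdfs dfs -createSnapshot /hawq_data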
The /hawq_data/gpseg0/ directory is deleted and then purged from the Trash:
-bash-4.1$ hdfs dfs -rm -R /hawq_data/gpseg0/
16/02/10 00:31:25 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 86400000 minutes, Emptier interval = 0 minutes.
Moved: 'hdfs://pivhdsne.localdomain:8020/hawq_data/gpseg0' to trash at: hdfs://pivhdsne.localdomain:8020/user/hdfs/.Trash/Current
-bash-4.1$ hdfs dfs -rm -R /user/hdfs/.Trash/Current
16/02/10 00:31:55 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 86400000 minutes, Emptier interval = 0 minutes.
Deleted /user/hdfs/.Trash/Current
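If the detour through the Trash is not needed, the two steps above can be collapsed into one by passing -skipTrash to the delete. Note that this makes the delete immediate and unrecoverable (except, as shown below, for data still held by a snapshot):

hdfs dfs -rm -R -skipTrash /hawq_data/gpseg0/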
The space used by /hawq_data/ now shows up as 0 MB:
hdfs://pivhdsne.localdomain:8020  135.8 G  980.5 M  104.9 G  1%
-bash-4.1$ hdfs dfs -du -h /
4.5 K     /apps
0         /hawq_data
0         /hive
0         /mapred
286.4 M   /retail_demo
0         /tmp
108.2 M   /user
7.1 M     /yarn
However, dfsadmin still reports DFS Used as 980 MB:
-bash-4.1$ hdfs dfsadmin -report
Configured Capacity: 145835704320 (135.82 GB)
Present Capacity: 113644474368 (105.84 GB)
DFS Remaining: 112616337408 (104.88 GB)
DFS Used: 1028136960 (980.51 MB)
DFS Used%: 0.90%
Under replicated blocks: 4244
Blocks with corrupt replicas: 0
Missing blocks: 0

Datanodes available: 1 (1 total, 0 dead)

Live datanodes:
Name: 127.0.0.1:50010 (localhost)
Hostname: localhost
Decommission Status : Normal
Configured Capacity: 145835704320 (135.82 GB)
DFS Used: 1028136960 (980.51 MB)
Non DFS Used: 32191229952 (29.98 GB)
DFS Remaining: 112616337408 (104.88 GB)
DFS Used%: 0.70%
DFS Remaining%: 77.22%
Last contact: Wed Feb 10 00:32:22 CST 2016
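The deleted blocks are still referenced by the s20160210-000709.684 snapshot, so the NameNode never tells the DataNode to remove them. One way to confirm that the removed files now exist only in the snapshot is to diff the snapshot against the current state of the directory ("." denotes the current state); entries removed since the snapshot was taken appear in the diff report:

hdfs snapshotDiff /hawq_data s20160210-000709.684 .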
Only after removing the snapshot does "DFS Used" go down and the space become available once again:
-bash-4.1$ hdfs dfs -deleteSnapshot /hawq_data/ s20160210-000709.684
-bash-4.1$ hdfs dfs -ls /hawq_data/.snapshot/
-bash-4.1$ hdfs dfsadmin -report
Configured Capacity: 145835704320 (135.82 GB)
Present Capacity: 114780148015 (106.90 GB)
DFS Remaining: 114319011840 (106.47 GB)
DFS Used: 461136175 (439.77 MB)
DFS Used%: 0.40%
Under replicated blocks: 4229
Blocks with corrupt replicas: 0
Missing blocks: 0

Datanodes available: 1 (1 total, 0 dead)

Live datanodes:
Name: 127.0.0.1:50010 (localhost)
Hostname: localhost
Decommission Status : Normal
Configured Capacity: 145835704320 (135.82 GB)
DFS Used: 461136175 (439.77 MB)
Non DFS Used: 31055556305 (28.92 GB)
DFS Remaining: 114319011840 (106.47 GB)
DFS Used%: 0.32%
DFS Remaining%: 78.39%
Last contact: Wed Feb 10 00:38:04 CST 2016
-bash-4.1$
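Optionally, once the last snapshot of the directory has been deleted, the directory can be made non-snapshottable so that future deletes release space immediately; this command only succeeds when no snapshots remain on the directory:

hdfs dfsadmin -disallowSnapshot /hawq_data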