Deduplication and compression capacity saving estimation

Article ID: 387348

Updated On:

Products

VMware vSAN

Issue/Introduction

This KB article addresses questions about estimating the capacity savings from deduplication and compression on a vSAN cluster.

Environment

VMware vSAN 7.0.x

VMware vSAN 8.0.x

Resolution

Because the deduplication domain is the disk group, a small number of large disk groups typically yields a higher overall deduplication ratio than a large number of small disk groups.

The disadvantage of having fewer, larger disk groups is less write-buffer capacity relative to disk group size, and more data migration and resync traffic during maintenance operations (for example, disk replacement or failure).
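The effect of a per-disk-group deduplication domain can be illustrated with a toy model. The sketch below is not how vSAN places or deduplicates data. The function names, the round-robin placement, and the sample data set are all assumptions chosen only to make the effect visible. It counts how many blocks must be stored when duplicates are removed only within each disk group.

```python
from typing import Hashable, Sequence


def stored_blocks(blocks: Sequence[Hashable], num_disk_groups: int) -> int:
    """Count blocks kept when duplicates are removed only within each disk group."""
    if num_disk_groups < 1:
        raise ValueError("num_disk_groups must be at least 1")
    groups = [set() for _ in range(num_disk_groups)]
    for index, block in enumerate(blocks):
        # Hypothetical round-robin placement; real vSAN placement differs.
        groups[index % num_disk_groups].add(block)
    # Each disk group keeps one copy of every distinct block it received.
    return sum(len(group) for group in groups)


def dedup_ratio(blocks: Sequence[Hashable], num_disk_groups: int) -> float:
    """Logical block count divided by the number of blocks actually stored."""
    return len(blocks) / stored_blocks(blocks, num_disk_groups)


# Toy data set (assumption): 12 logical blocks with only 4 distinct contents.
blocks = ["A", "B", "C", "D"] * 3

one_large_group = dedup_ratio(blocks, 1)
three_small_groups = dedup_ratio(blocks, 3)
print(one_large_group, three_small_groups)
```

In this toy case, a single disk group sees every duplicate and stores each distinct block once, while three disk groups each receive their own copy of every distinct block. Real ratios depend on the workload and on actual data placement.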

If application-based (e.g., database) native compression is used, vSAN compression may provide reduced benefits.

The space saving obtained due to deduplication and compression is highly dependent on the application workload and data set composition.

It is therefore very difficult to estimate the deduplication and compression savings without actually turning the feature on, because the reduction in capacity varies with the workload and the vSAN disk group configuration.

Additional Information

Deduplication and compression can be enabled on an existing vSAN all-flash OSA cluster that has a valid license supporting this feature, provided there is enough free space (e.g., at least 30% free capacity).

For more information, please see the document Enable Deduplication and Compression on Existing vSAN Cluster.
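The free-space precondition is simple arithmetic. The following minimal sketch assumes capacity figures in TB. The function name is hypothetical, and the 30% default is only the example figure quoted in this article, not a verified product limit.

```python
def has_enough_free_space(total_tb: float, used_tb: float,
                          min_free_fraction: float = 0.30) -> bool:
    """Return True if the free share of capacity meets the chosen threshold."""
    if total_tb <= 0:
        raise ValueError("total_tb must be positive")
    if not 0 <= used_tb <= total_tb:
        raise ValueError("used_tb must be between 0 and total_tb")
    free_fraction = (total_tb - used_tb) / total_tb
    return free_fraction >= min_free_fraction


# Hypothetical clusters: 100 TB raw capacity with 60 TB and 80 TB used.
print(has_enough_free_space(100, 60))
print(has_enough_free_space(100, 80))
```

The capacity values would come from the cluster's capacity view. The sketch only shows the comparison, not how to retrieve them.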

Another option is to use compression only, which still provides some space savings, although the amount depends on the type of data.

For more information, please also see the following:

How much space savings can one expect using the "Compression only" feature? The answer depends on the workload and the type of data being stored. Both DD&C and "Compression only" are opportunistic features, which means that space savings are not guaranteed.

What will performance be like when using the "Compression only" feature? It will land somewhere between the performance of hosts running no space efficiency features and the performance of hosts running DD&C.