ZSTD Compression Metrics

Article ID: 404122

Updated On:

Products

VMware Tanzu Data Suite, VMware Tanzu Greenplum, VMware Tanzu Greenplum / Gemfire

Issue/Introduction

This article presents the results of a zstd compression benchmark and a step-by-step guide to running the same benchmark on a Linux-based system.

Environment

Greenplum x.x

CPU (lscpu output):
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              4
On-line CPU(s) list: 0-3
Thread(s) per core:  1
Core(s) per socket:  4
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               106
Model name:          Intel(R) Xeon(R) Gold 6330 CPU @ 2.00GHz
Stepping:            6
CPU MHz:             1995.312
BogoMIPS:            3990.62
Hypervisor vendor:   VMware
Virtualization type: full
L1d cache:           48K
L1i cache:           32K
L2 cache:            1280K
L3 cache:            43008K
NUMA node0 CPU(s):   0-3

Memory (/proc/meminfo output):
MemTotal:        5817008 kB
MemFree:          193644 kB
MemAvailable:     232856 kB
Buffers:              20 kB
Cached:           412880 kB
SwapCached:        78292 kB
Active:           547576 kB
Inactive:         991236 kB
Active(anon):     374556 kB
Inactive(anon):   938152 kB
Active(file):     173020 kB
Inactive(file):    53084 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:       6213628 kB
SwapFree:        4768980 kB
Dirty:                16 kB
Writeback:             0 kB
AnonPages:       1059608 kB
Mapped:            63268 kB
Shmem:            186796 kB
KReclaimable:      63700 kB
Slab:             149756 kB
SReclaimable:      63700 kB
SUnreclaim:        86056 kB
KernelStack:        8256 kB
PageTables:        39332 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     9122132 kB
Committed_AS:    4252028 kB
VmallocTotal:   34359738367 kB
VmallocUsed:       39716 kB
VmallocChunk:          0 kB
Percpu:             2384 kB
HardwareCorrupted:     0 kB
AnonHugePages:    622592 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
FileHugePages:         0 kB
FilePmdMapped:         0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:               0 kB
DirectMap4k:     1058236 kB
DirectMap2M:     5232640 kB
DirectMap1G:     2097152 kB

Cause

Zstandard (zstd), developed at Facebook, is a fast lossless compression algorithm that combines high compression ratios with fast decompression. A common file for assessing zstd's performance is enwik8, a 100 MB excerpt of Wikipedia content widely used in compression research. Running zstd against enwik8 at each compression level shows how compression ratio, speed (in MB/s), and resource usage change from one level to the next.

enwik8 consists of the first 100 million bytes (100 MB) of an English Wikipedia XML dump and is part of the Large Text Compression Benchmark (LTCB). The dataset is widely used because:

  • It is real-world, highly structured text, predominantly English with some content in other languages.

  • It contains a mix of common words, markup, metadata, and varying sentence structures.

  • It provides a meaningful test for how well algorithms handle textual redundancy, dictionary effectiveness, and speed vs. compression trade-offs.

Because enwik8 is standardized and publicly available, it allows for fair comparisons between different compression algorithms (e.g., Zstandard, gzip, bzip2, LZMA). A benchmark run typically compresses it at a range of levels (e.g., zstd -b1 -e22 enwik8) to observe:

  • Compression ratio (original size / compressed size; higher is better)

  • Compression and decompression speed

  • CPU usage and memory requirements

This makes it a practical, repeatable benchmark for assessing how well an algorithm like zstd performs on structured, large-scale text data.
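
As a quick check of how the ratio is defined, the level 1 row from the results in this article can be recomputed by hand. The following is a minimal sketch that uses awk for the floating-point arithmetic. The variable names are illustrative, and the space-savings percentage is an additional derived figure, not something zstd prints.

```shell
# Sizes taken from the level 1 row of the results in this article.
original=100000000
compressed=40738526

# Compression ratio = original size / compressed size (higher is better).
ratio=$(awk -v o="$original" -v c="$compressed" 'BEGIN { printf "%.3f", o / c }')

# Space savings = 1 - compressed / original, expressed as a percentage.
savings=$(awk -v o="$original" -v c="$compressed" 'BEGIN { printf "%.1f", (1 - c / o) * 100 }')

echo "ratio=$ratio savings=${savings}%"
```

The computed ratio matches the 2.455 reported by zstd for level 1.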

Resolution

Results of the zstd benchmark on the environment above. Results will vary with system architecture and load:

Columns: Compression Level#File Name, Original Size (bytes), Compressed Size (bytes), (Compression Ratio = original / compressed), Compression Speed (MB/s), Decompression Speed (MB/s)

zstd -b1 -e22 enwik8

 1#enwik8            : 100000000 ->  40738526 (2.455), 155.2 MB/s , 523.5 MB/s
 2#enwik8            : 100000000 ->  37434671 (2.671), 122.4 MB/s , 362.2 MB/s
 3#enwik8            : 100000000 ->  35602698 (2.809),  94.8 MB/s , 394.5 MB/s
 4#enwik8            : 100000000 ->  34920953 (2.864),  77.5 MB/s , 344.1 MB/s
 5#enwik8            : 100000000 ->  34315441 (2.914),  23.9 MB/s , 273.9 MB/s
 6#enwik8            : 100000000 ->  33591155 (2.977),  22.9 MB/s , 348.5 MB/s
 7#enwik8            : 100000000 ->  32443027 (3.082),  16.0 MB/s , 384.5 MB/s
 8#enwik8            : 100000000 ->  32029317 (3.122),  8.20 MB/s , 398.8 MB/s
 9#enwik8            : 100000000 ->  31751130 (3.149), 10.22 MB/s , 404.2 MB/s
10#enwik8            : 100000000 ->  31238385 (3.201),  4.91 MB/s , 378.2 MB/s
11#enwik8            : 100000000 ->  30968326 (3.229),  5.77 MB/s , 354.5 MB/s
12#enwik8            : 100000000 ->  30728213 (3.254),  2.20 MB/s , 198.6 MB/s
13#enwik8            : 100000000 ->  30331609 (3.297),  2.66 MB/s , 369.9 MB/s
14#enwik8            : 100000000 ->  29794982 (3.356),  1.78 MB/s , 336.8 MB/s
15#enwik8            : 100000000 ->  29436073 (3.397),  1.57 MB/s , 314.4 MB/s
16#enwik8            : 100000000 ->  28446554 (3.515),  1.29 MB/s , 250.8 MB/s
17#enwik8            : 100000000 ->  27702787 (3.610),  0.99 MB/s , 205.9 MB/s
18#enwik8            : 100000000 ->  27326331 (3.659),  0.85 MB/s , 280.4 MB/s
19#enwik8            : 100000000 ->  26957405 (3.710),  0.78 MB/s , 162.7 MB/s
20#enwik8            : 100000000 ->  25989368 (3.848),  0.62 MB/s , 297.9 MB/s
21#enwik8            : 100000000 ->  25541097 (3.915),  0.50 MB/s , 309.7 MB/s
22#enwik8            : 100000000 ->  25340452 (3.946),  0.46 MB/s , 221.3 MB/s
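
To compare runs or chart the results, it can help to turn these lines into CSV. The sketch below assumes the exact line layout shown in this article; bench_to_csv is a hypothetical helper name, not part of zstd, and the pattern may need adjusting if another zstd version prints its results differently.

```shell
# Hypothetical helper: convert saved `zstd -b` result lines to CSV.
# Assumes the line layout shown in this article:
#   LEVEL#FILE : ORIGINAL -> COMPRESSED (RATIO), C_SPEED MB/s , D_SPEED MB/s
# Lines that do not match this layout are silently skipped.
bench_to_csv() {
  echo "level,original_bytes,compressed_bytes,ratio,compress_mbps,decompress_mbps"
  sed -n -E 's/^ *([0-9]+)#[^:]*: *([0-9]+) *-> *([0-9]+) *\(([0-9.]+)\), *([0-9.]+) MB\/s *, *([0-9.]+) MB\/s.*$/\1,\2,\3,\4,\5,\6/p'
}

# Sample input: two rows copied from the results in this article.
csv=$(bench_to_csv <<'EOF'
 1#enwik8            : 100000000 ->  40738526 (2.455), 155.2 MB/s , 523.5 MB/s
22#enwik8            : 100000000 ->  25340452 (3.946),  0.46 MB/s , 221.3 MB/s
EOF
)
echo "$csv"
```

In practice the helper would read a file of saved benchmark output, for example bench_to_csv < results.txt, where results.txt is an illustrative file name.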

 
To replicate the benchmark:

On Ubuntu:

sudo apt update
sudo apt install zstd

On RHEL:

sudo dnf install zstd
# or for older systems:
sudo yum install zstd

Download enwik8:

wget http://mattmahoney.net/dc/enwik8.zip
unzip enwik8.zip
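
Before benchmarking, it is worth confirming that the extracted file is complete. enwik8 should be exactly 100,000,000 bytes, which is the original size shown in the results. The sketch below defines check_size, a hypothetical helper name chosen for illustration, and only runs the check if enwik8 is present in the current directory.

```shell
# Hypothetical helper: succeed only if FILE is exactly EXPECTED bytes long.
check_size() {
  file=$1
  expected=$2
  # wc -c counts bytes; tr strips the padding some wc implementations add.
  actual=$(wc -c < "$file" | tr -d ' ')
  if [ "$actual" -eq "$expected" ]; then
    echo "OK: $file is $actual bytes"
  else
    echo "MISMATCH: $file is $actual bytes, expected $expected" >&2
    return 1
  fi
}

# Check the unzipped benchmark file (skipped if it is not present).
if [ -f enwik8 ]; then
  check_size enwik8 100000000
fi
```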

 

Run benchmark:

zstd -b1 -e22 enwik8

  • -b1 sets the starting compression level and -e22 sets the ending level, so levels 1 through 22 are benchmarked in sequence
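
The benchmark mode works in memory. As a complementary check, the compressed size can also be measured by running the regular zstd command at a few levels. The following is a rough sketch, not part of the original test: the choice of levels 1, 3, and 19 and the INPUT variable are assumptions made for illustration, and the byte counts may differ slightly from the benchmark figures.

```shell
# Sketch: measure the compressed size at a few levels with the regular CLI.
# INPUT defaults to enwik8; set INPUT to try another file.
INPUT=${INPUT:-enwik8}
results=""

if command -v zstd >/dev/null 2>&1 && [ -f "$INPUT" ]; then
  for level in 1 3 19; do
    # -c writes to stdout, so no .zst file is left behind; -q silences progress.
    size=$(zstd -q -c "-$level" "$INPUT" | wc -c | tr -d ' ')
    results="$results$level:$size "
    echo "level $level: $size bytes"
  done
else
  echo "zstd or $INPUT not found; skipping" >&2
fi
```

Higher levels take noticeably longer, as the compression speeds in the results suggest, so start with low levels on large files.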

Additional Information

https://facebook.github.io/zstd/

https://github.com/facebook/zstd