Evaluating Zstandard compression
Recently I became aware of the zstd compression program. I wanted to see how
it stacks up against gzip, bzip2 and xz.
TL;DR
Compression with zstd is blisteringly fast. But xz still yields the smallest files.
Test setup
The tests are run on an otherwise idle machine. The test machine is from 2009 and has a Core 2 Quad CPU and a regular SATA hard disk. So not the latest and greatest, but no slouch.
All programs are used with their default settings and single-threaded.
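If you want to repeat these measurements on your own files, the procedure (default settings, each test run three times, keep the last run) is easy to script. What follows is a minimal Python sketch, not the method used for this article. It assumes a Unix system (for the resource module) and that the tools are on the PATH, and it adds -f so that repeated runs may overwrite the output of the previous one. The names COMPRESSORS, time_command and benchmark are made up for this sketch. Note that it reports the apparent file size in bytes, whereas du reports disk usage in blocks.

```python
import os
import resource
import shutil
import subprocess
import sys
import time

# Command lines follow the ones in this article. "-f" (overwrite existing
# output) is an addition, so that repeated runs do not stop on the file
# left behind by the previous run. Second item: suffix of the output file.
COMPRESSORS = {
    "gzip": (["gzip", "-k", "-f"], ".gz"),
    "bzip2": (["bzip2", "-k", "-f"], ".bz2"),
    "xz": (["xz", "-k", "-f"], ".xz"),
    "zstd": (["zstd", "-q", "-k", "-f"], ".zst"),
}


def time_command(argv):
    """Run argv once and return (real, user, sys) seconds, like /usr/bin/time."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    subprocess.run(argv, check=True)
    real = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    # RUSAGE_CHILDREN accumulates, so the difference is this run's CPU time.
    return (real,
            after.ru_utime - before.ru_utime,
            after.ru_stime - before.ru_stime)


def benchmark(path, runs=3):
    """Time every installed compressor on path, keeping only the last run.

    runs must be at least 1. Returns
    {tool: (real, user, sys, compressed size in bytes)}.
    """
    results = {}
    for name, (argv, suffix) in COMPRESSORS.items():
        if shutil.which(argv[0]) is None:
            continue  # tool not installed: skip it
        for _ in range(runs):
            timing = time_command(argv + [path])
        results[name] = timing + (os.path.getsize(path + suffix),)
    return results


if __name__ == "__main__" and len(sys.argv) > 1:
    for tool, (real, user, sys_s, size) in benchmark(sys.argv[1]).items():
        print(f"{tool:6s} {real:6.2f} real {user:6.2f} user "
              f"{sys_s:6.2f} sys {size:10d} bytes")
```

Saved as, for example, bench.py (a hypothetical name), it is run as python3 bench.py maillog.txt and prints one line per installed compressor in roughly the same real/user/sys layout as /usr/bin/time.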
Plain text
The first demo file is a 46 MiB mail log file. This is plain text, so it should compress well.
First let’s compress the file and look at execution times. Each test is run three times. The last run is shown below. After that we look at the file sizes.
> /usr/bin/time gzip -k maillog.txt
1.15 real 1.12 user 0.03 sys
> /usr/bin/time bzip2 -k maillog.txt
11.24 real 11.21 user 0.02 sys
> /usr/bin/time xz -k maillog.txt
22.55 real 22.43 user 0.11 sys
> /usr/bin/time zstd -q -k maillog.txt
0.29 real 0.26 user 0.03 sys
> du maillog.txt*|sort -rn
46880 maillog.txt
5184 maillog.txt.gz
4736 maillog.txt.zst
3584 maillog.txt.bz2
3392 maillog.txt.xz
The speed of zstd is really amazing: it compresses the file about four times
faster than gzip. Its compression ratio does not live up to that of xz and
bzip2, but it beats gzip comfortably.
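To put these numbers in relation to each other, here is a small Python sketch that turns the transcript into a compression ratio (compressed size as a percentage of the original) and a throughput in MiB/s. It assumes du reported 1 KiB blocks, which fits 46880 blocks for a 46 MiB file; the ratio itself does not depend on the block size. The function name summarize is hypothetical.

```python
# Numbers copied from the maillog.txt transcript above. Sizes are du block
# counts, assumed to be 1 KiB blocks (46880 blocks is about 46 MiB).
ORIGINAL_BLOCKS = 46880
RESULTS = {  # tool: (real seconds, compressed size in blocks)
    "gzip": (1.15, 5184),
    "bzip2": (11.24, 3584),
    "xz": (22.55, 3392),
    "zstd": (0.29, 4736),
}


def summarize(original, results):
    """Return {tool: (size as % of original, input throughput in MiB/s)}."""
    original_mib = original / 1024  # 1 KiB blocks -> MiB
    return {tool: (100 * size / original, original_mib / real)
            for tool, (real, size) in results.items()}


summary = summarize(ORIGINAL_BLOCKS, RESULTS)
# Print from smallest to largest output.
for tool, (ratio, speed) in sorted(summary.items(), key=lambda kv: kv[1][0]):
    print(f"{tool:6s} {ratio:5.1f} % of original {speed:7.1f} MiB/s")
```

With the numbers from this test, xz comes out at about 7.2 % of the original size and 2.0 MiB/s, gzip at about 11.1 % and 40 MiB/s, and zstd at about 10.1 % and 158 MiB/s.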
Mixed tarball
Next is a tar file which contains the source code of a large TeX document with
all its images, graphs et cetera, together with its complete git history.
This is a mix of text and binary files. It will probably not compress terribly well.
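Why such a mix compresses poorly can be illustrated with Python's built-in gzip, bz2 and lzma (xz) modules; zstd is left out to keep the sketch to long-standing standard-library modules. Assumption: the images are stored in already-compressed formats such as PNG or JPEG, and git stores its objects zlib-compressed, so much of the tarball behaves like data without redundancy. The sketch uses random bytes as a stand-in for that content and a made-up, repeated log line as a stand-in for text.

```python
import bz2
import gzip
import lzma
import os


def ratio(compress, data):
    """Compressed size as a fraction of the input size."""
    return len(compress(data)) / len(data)


# Stand-in for already-compressed content (e.g. PNG/JPEG images or
# zlib-compressed git objects): random bytes have no redundancy left.
binary_like = os.urandom(1 << 20)
# Stand-in for plain text: a made-up log line repeated (just under 1 MiB).
text_like = b"Jan  1 00:00:00 host postfix/smtpd[4242]: connect from example\n" * 15000

# Levels chosen to match the command-line defaults: gzip -6, bzip2 -9, xz -6.
COMPRESSORS = {
    "gzip": lambda d: gzip.compress(d, compresslevel=6),
    "bzip2": lambda d: bz2.compress(d, compresslevel=9),
    "xz": lambda d: lzma.compress(d, preset=6),
}

for name, fn in COMPRESSORS.items():
    print(f"{name:6s} text: {ratio(fn, text_like):.3f}  "
          f"random: {ratio(fn, binary_like):.3f}")
```

For the random data all three ratios end up slightly above 1.0 (the output even grows a little), while the repetitive text shrinks to well under 5 % of its size. A real tarball sits somewhere in between, depending on how much of it is already compressed.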
Again, we observe compression times first, then file sizes.
> /usr/bin/time gzip -k backup-logbook2016.tar
6.90 real 6.77 user 0.12 sys
> /usr/bin/time bzip2 -k backup-logbook2016.tar
33.47 real 33.25 user 0.18 sys
> /usr/bin/time xz -k backup-logbook2016.tar
67.58 real 66.90 user 0.65 sys
> /usr/bin/time zstd -q -k backup-logbook2016.tar
1.67 real 1.03 user 0.16 sys
> du backup-logbook2016.tar*|sort -rn
133056 backup-logbook2016.tar
125440 backup-logbook2016.tar.bz2
125152 backup-logbook2016.tar.gz
124608 backup-logbook2016.tar.zst
121856 backup-logbook2016.tar.xz
As expected, it is difficult to compress this file significantly. Interestingly,
zstd does well here, reaching second place. It is again the
fastest by far.
Code tarball
Next test is the tarball for gcc-4.9.4, which is first decompressed from its
bzipped form. Given the huge size of this tarball, this test is only run once.
> /usr/bin/time gzip -k gcc-4.9.4.tar
25.79 real 25.41 user 0.36 sys
> /usr/bin/time bzip2 -k gcc-4.9.4.tar
80.68 real 80.24 user 0.40 sys
> /usr/bin/time xz -k gcc-4.9.4.tar
291.59 real 290.14 user 1.37 sys
> /usr/bin/time zstd -q -k gcc-4.9.4.tar
6.15 real 5.73 user 0.40 sys
> du gcc-4.9.4.tar*|sort -rn
566816 gcc-4.9.4.tar
114336 gcc-4.9.4.tar.gz
108384 gcc-4.9.4.tar.zst
88032 gcc-4.9.4.tar.bz2
69952 gcc-4.9.4.tar.xz
Again zstd shows amazing speed and better compression than gzip.
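The difference in speed is easier to grasp as a factor. This Python snippet divides the real times of this run by that of zstd and computes how much smaller the zstd output is than the gzip output, using only the numbers from the transcript above. The variable names are illustrative.

```python
# Real (wall-clock) seconds and du sizes copied from the gcc-4.9.4.tar run.
REAL = {"gzip": 25.79, "bzip2": 80.68, "xz": 291.59, "zstd": 6.15}
SIZE = {"gzip": 114336, "bzip2": 88032, "xz": 69952, "zstd": 108384}

# How many times faster zstd finishes than each of the other tools.
speedup = {tool: REAL[tool] / REAL["zstd"] for tool in REAL if tool != "zstd"}
# How much smaller the zstd output is than the gzip output, in percent.
gain_over_gzip = 100 * (1 - SIZE["zstd"] / SIZE["gzip"])

for tool, factor in speedup.items():
    print(f"zstd is {factor:5.1f}x faster than {tool}")
print(f"zstd output is {gain_over_gzip:.1f} % smaller than gzip output")
```

On this tarball zstd finishes roughly 4.2 times faster than gzip, 13 times faster than bzip2 and 47 times faster than xz, while its output is about 5.2 % smaller than that of gzip.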
Zstd with maximum compression level
Let’s see what happens when we use zstd with its maximum regular
compression setting, level 19 (levels 20 to 22 are only available with the
--ultra option).
> /usr/bin/time zstd -q -k -19 maillog.txt
23.43 real 23.35 user 0.07 sys
> /usr/bin/time zstd -q -k -19 backup-logbook2016.tar
46.77 real 46.39 user 0.33 sys
> /usr/bin/time zstd -q -k -19 gcc-4.9.4.tar
252.21 real 251.68 user 0.45 sys
> du *.zst|sort -rn
122016 backup-logbook2016.tar.zst
74048 gcc-4.9.4.tar.zst
3936 maillog.txt.zst
The speed is much reduced in this case; compression times are now of the same
order as those of xz. The compressed files are smaller than those made with the
default settings, but they still don’t beat xz.
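To quantify how close zstd gets, the following Python sketch compares the level 19 results with both the default zstd results and xz, again using only the numbers from the transcripts (sizes in du blocks, times as real seconds). The function name compare is hypothetical.

```python
# Numbers copied from the transcripts: sizes in du blocks, times in real seconds.
# Per file: (xz size, zstd default size, zstd -19 size,
#            zstd default time, zstd -19 time)
FILES = {
    "maillog.txt": (3392, 4736, 3936, 0.29, 23.43),
    "backup-logbook2016.tar": (121856, 124608, 122016, 1.67, 46.77),
    "gcc-4.9.4.tar": (69952, 108384, 74048, 6.15, 252.21),
}


def compare(xz, zst, zst19, t_default, t_19):
    """Return (% larger than xz, % smaller than zstd default, slowdown factor)."""
    gap = 100 * (zst19 - xz) / xz
    shrink = 100 * (1 - zst19 / zst)
    slowdown = t_19 / t_default
    return gap, shrink, slowdown


for name, numbers in FILES.items():
    gap, shrink, slowdown = compare(*numbers)
    print(f"{name:24s} {gap:+5.1f} % vs xz, {-shrink:5.1f} % vs default, "
          f"{slowdown:5.1f}x the time")
```

For the mixed tarball, zstd -19 ends up only about 0.1 % larger than xz; for the gcc tarball the gap is about 5.9 % and for the mail log about 16 %. The price is a compression run that takes roughly 28 to 81 times as long as with the default level.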
Conclusion
It is probably time to retire gzip. In all test cases, zstd in its default
settings compresses much faster than gzip and yields a smaller file.
With regard to file size, xz is still king of the hill. On its best
compression setting, zstd can come close to xz,
but the latter still has the edge on file size.
For comments, please send me an e-mail.