Testing compression speed and ratio

The compression speed and ratio of several compression programs are tested.

Hardware

These tests were originally run on a machine with an Intel Core2 Q9300 CPU and a 150 MB/s hard disk, running FreeBSD 11.2. I’ve since updated the figures for an Intel Core i7-7700 CPU and a WDC WD4002FYYZ-01B7CB1 hard disk (600 MB/s transfers). Data for the brotli compression program has also been added.

Test data

There are two kinds of test data:

  • an mbox file of spam (42432 KiB of text)
  • a tarball of a LaTeX project (71680 KiB of text and binary data)

Tests

gzip

The gzip program was conceived as an alternative to the compress program, which was under threat from corporations holding patents on the LZW algorithm used in compress.

The gzip program supports nine levels of compression. Level 1 is supposed to be the fastest, while level 9 is supposed to offer the best compression. Level 6 is the default. All these levels are tested once:

foreach n (`seq 1 9`)
    /usr/bin/time gzip -c -${n} foo >foo_${n}.gz
end

Afterwards, the size of the compressed files is checked:

du foo_*.gz

From this, we calculate the compression speed (MB/s) and the size reduction (%). The former is defined as original_size/time; the latter as (1 - compressed_size/original_size)*100%. The results are collected in the tables below.
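
As a quick illustration of the arithmetic (the sizes and the elapsed time below are invented example values, not measurements from this article; the fields are original size in KiB, compressed size in KiB and elapsed seconds):

echo "42432 19200 1.0" | awk '{ printf "speed: %.2f MB/s  reduction: %.2f%%\n", $1/1024/$3, (1-$2/$1)*100 }'

This prints "speed: 41.44 MB/s  reduction: 54.75%".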

                   Text                          Tarball
Level   Speed [MB/s]  Reduction [%]    Speed [MB/s]  Reduction [%]
  1        61.85          51.73            47.95          6.12
  2        59.20          52.34            47.95          6.16
  3        53.12          52.94            47.62          6.21
  4        49.92          54.07            47.30          6.34
  5        45.54          54.68            46.67          6.38
  6        40.62          54.98            45.75          6.38
  7        39.84          55.05            44.87          6.38
  8        32.89          55.13            41.67          6.43
  9        28.78          55.13            33.98          6.43

As expected, the tarball doesn’t compress as well or as fast as the text file.

The default compression level is 6. Many alternative compression programs support the same options as gzip, which makes testing them simpler.
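
Because the option syntax is shared, a single loop can drive several of the programs; a sketch along the lines of the loops used here (the program list and output suffixes are only an illustration):

foreach prog (gzip bzip2 lzma)
    foreach n (`seq 1 9`)
        /usr/bin/time ${prog} -c -${n} foo >foo_${n}.${prog}
    end
end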

The gzip test was also done on an M.2 SSD. The times were almost identical, so it seems that gzip is CPU-bound, not I/O-bound.
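
One quick way to check this (a sketch, not one of the measurements reported here) is to time the same compression twice, once with the output discarded so that the disk plays no part:

/usr/bin/time gzip -c -6 foo >/dev/null
/usr/bin/time gzip -c -6 foo >foo.gz

If the two times are close, the disk is not the bottleneck.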

bzip2

The test is similar to that of gzip:

foreach n (`seq 1 9`)
    /usr/bin/time bzip2 -c -${n} foo >foo_${n}.bz2
end

The data is gathered and converted in the same way:

                   Text                          Tarball
Level   Speed [MB/s]  Reduction [%]    Speed [MB/s]  Reduction [%]
  1        12.79          55.13            10.26          5.80
  2        12.44          55.73            10.62          6.12
  3        11.77          55.96            10.51          6.21
  4        11.42          56.33            10.40          6.38
  5        11.42          56.26            10.36          6.43
  6        10.99          56.64            10.20          6.56
  7        10.96          56.56            10.09          6.70
  8        10.79          56.71             9.94          6.70
  9        10.79          56.86            10.06          6.79

The compression ratio for both the text and the tarball is slightly better with bzip2, but the compression speed is much lower than gzip’s.

lzma

Again the test is similar:

foreach n (`seq 1 9`)
    /usr/bin/time lzma -c -${n} foo >foo_${n}.lzma
end

The resulting data:

                   Text                          Tarball
Level   Speed [MB/s]  Reduction [%]    Speed [MB/s]  Reduction [%]
  1        11.87          57.01             5.86          7.10
  2         7.37          57.69             3.33          7.14
  3         5.77          58.37             2.80          7.19
  4         4.74          60.56             2.96          7.32
  5         4.03          62.14             2.65          7.59
  6         3.63          62.37             2.62          7.59
  7         3.59          63.27             2.58          9.38
  8         3.37          63.35             2.81         20.85
  9         3.32          63.35             2.63         22.37

Level 1 LZMA is better than level 9 bzip2. The size reduction for the tarball at compression level 8 or 9 is impressive.

brotli

The test command is slightly different in this case:

foreach n (`seq 1 11`)
    /usr/bin/time brotli -c -q ${n} foo >foo_${n}.br
end

The brotli program supports compression levels up to 11, with level 11 being the default.

                   Text                          Tarball
Level   Speed [MB/s]  Reduction [%]    Speed [MB/s]  Reduction [%]
  1       345.31          54.30           179.49          6.16
  2       180.16          63.65           152.17         10.04
  3       147.99          63.88           179.49         10.13
  4        67.93          57.32            95.89          7.54
  5        51.80          57.77           106.06          7.59
  6        36.03          61.39            71.43          8.79
  7        15.40          60.26            20.77          7.77
  8         6.59          61.92             7.14          7.81
  9         1.55          62.37             3.36          8.04
 10         1.10          62.82             0.77          9.11
 11         0.53          62.97             0.47          9.42

It is noteworthy that for the given data, level 3 gives the best size reduction for both datasets, and does so at blistering speed.

Conclusions

The corpus of text can be reduced in size by slightly more than half. The baseline, gzip at its default setting, reduces the size by 55%. The best compressors (lzma and brotli) reach about 63%. The difference between worst and best compression can come at a huge computational cost: lzma at its highest level (3.32 MB/s) is roughly twelve times slower than gzip at its default level (40.62 MB/s).

For the tarball, somewhat surprisingly, lzma takes the crown with respect to size reduction: at levels 8 and 9 it reaches more than 20%, far ahead of the others.

The new contender brotli is a surprise. At compression level 2 or 3 it is significantly faster than gzip while reaching the highest size reduction for the text and the second highest for the tarball.


For comments, please send me an e-mail.

