
Testing compression speed and ratio

The compression speed and ratio of several compression programs are tested.

Hardware

These tests were originally run on a machine with an Intel Core2 Q9300 CPU and a 150 MB/s hard disk, running FreeBSD 11.2. I’ve since updated the figures for a machine with an Intel Core i7-7700 CPU and a WDC WD4002FYYZ-01B7CB1 hard disk (600 MB/s transfers).

Test data

An old logfile is used as the test data. It contains 48992 KiB of text. Plain text like this is probably among the easiest material to compress.

Tests

gzip

The gzip program was conceived as an alternative to the compress program, which was under threat from corporations holding patents on the LZW algorithm used in compress.

The gzip program supports nine levels of compression. Level 1 is supposed to be the fastest, while level 9 is supposed to offer the best compression. All these levels are tested once:

foreach n (`seq 1 9`)
    /usr/bin/time gzip -c -${n} foo >foo_${n}.gz
end

Afterwards, the sizes of the compressed files are checked:

du foo_*.gz
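
Note that on FreeBSD du reports sizes in 512-byte blocks by default (unless the BLOCKSIZE environment variable says otherwise); passing -k makes it report KiB, matching the 48992 KiB figure quoted for the original file:

    du -k foo_*.gz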

From this, we calculate the compression speed (MiB/s) and the compression ratio (%). The speed is defined as original_size/time, and the ratio as (1 - compressed_size/original_size) * 100%. The results are collected in the table below.
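
For completeness, a small awk sketch along these lines will do the arithmetic for a single run; comp and secs are placeholder shell variables holding the compressed size reported by du -k (in KiB) and the elapsed time reported by /usr/bin/time (in seconds):

    # sketch: orig and comp are sizes in KiB, secs is the elapsed time in seconds
    awk -v orig=48992 -v comp=$comp -v secs=$secs \
        'BEGIN { printf "speed: %.2f MiB/s  ratio: %.2f %%\n", (orig/1024)/secs, (1 - comp/orig)*100 }'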

Level  Speed [MiB/s]  Ratio [%]
    1         213.01      86.74
    2         195.97      86.94
    3         195.97      87.13
    4         144.09      88.24
    5         125.62      88.50
    6          96.06      88.70
    7          84.47      88.96
    8          55.05      89.29
    9          54.44      89.29

Many alternative compression programs support the same options as gzip, which makes testing them simpler.
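
In fact, since the invocations are identical, the whole series can be driven by one nested loop; the output suffixes here are purely illustrative:

    foreach prog (gzip bzip2 lzma)
        foreach n (`seq 1 9`)
            /usr/bin/time ${prog} -c -${n} foo >foo_${n}.${prog}
        end
    end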

bzip2

The test is similar to that of gzip:

foreach n (`seq 1 9`)
    /usr/bin/time bzip2 -c -${n} foo >foo_${n}.bz2
end

The data is gathered and converted in the same way:

Level  Speed [MiB/s]  Ratio [%]
    1          13.31      90.01
    2          12.13      90.99
    3          11.24      91.38
    4          10.63      91.64
    5          10.14      91.84
    6           9.84      91.97
    7           9.51      92.10
    8           9.31      92.16
    9           9.14      92.23

The compression ratio for text is slightly better with bzip2, but the compression speed is much lower than gzip’s.

lzma

Again the test is similar:

foreach n (`seq 1 9`)
    /usr/bin/time lzma -c -${n} foo >foo_${n}.lzma
end

The resulting data:

Level  Speed [MiB/s]  Ratio [%]
    1          46.66      90.92
    2          30.81      91.18
    3          16.84      91.31
    4          14.89      91.31
    5           8.04      91.97
    6           4.39      92.62
    7           4.12      92.68
    8           3.93      92.68
    9           3.85      92.68

Raising the compression level above 7 does not improve the ratio at all. At the lower levels (1-3), lzma achieves roughly the same or better ratios than bzip2 at the corresponding levels, and does so considerably faster.

Conclusions

All these programs compress the test text by 86.7% to 92.7%. That is pretty impressive. The difference between the worst and the best compression does come at a huge computational cost: the best compression (lzma -9 at 3.85 MiB/s) is roughly 55 times slower than the worst (gzip -1 at 213 MiB/s). On their default settings, lzma is around 22 times slower than gzip, while its compression ratio is only about four percentage points better (92.6% versus 88.7%).

All in all, I’d say don’t bother with lzma. Use gzip on its default compression, or bzip2 for better compression.
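
In practice that boils down to the plain invocations below (gzip’s default corresponds to level 6):

    gzip foo      # fast, reasonable compression (default level 6)
    bzip2 foo     # slower, but compresses text a few percentage points better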

