Should system backups be compressed?
Every now and then I make backups of my FreeBSD system’s filesystems with the venerable dump program. This is the only program that can capture all features of the UFS filesystem. The filesystems to be backed up are:
- The root filesystem.
- The /usr filesystem.
- The /var filesystem.
Note
My user data is kept in /home, which is replicated by rsync to several other disks. I do this because of the size of this filesystem: it is impractical to make complete dumps, and much easier to just synchronize between two disks. I’m also not really interested in retaining different versions, since I rarely throw data away.
Because of original space constraints, I tend to compress the backups. For convenience, I only use compression programs that are available in the base system. That way I know for sure that I can restore a backup from a rescue disc. This leaves the following choices:
- gzip
- bzip2
- xz
The purpose of this article is to measure the time required to do (de)compression, and the compressed size. The test data is a recent backup I made:
> du -m *.dump
126 root-0-20130305.dump
10403 usr-0-20130305.dump
281 var-0-20130305.dump
Measurements
Gzip
First, I’m going to compress them with gzip. Each compression is done three times to check for variations:
> time gzip -k root-0-20130305.dump
10.960u 0.094s 0:11.06 99.9% 40+2722k 0+404io 0pf+0w
> rm root*.gz; time gzip -k root-0-20130305.dump
10.930u 0.070s 0:11.00 100.0% 40+2724k 0+404io 0pf+0w
> rm root*.gz; time gzip -k root-0-20130305.dump
10.922u 0.078s 0:11.00 99.9% 40+2723k 0+404io 0pf+0w
> time gzip -k var-0-20130305.dump
9.461u 0.393s 0:09.93 99.1% 40+2722k 2251+753io 0pf+0w
> rm var*.gz; time gzip -k var-0-20130305.dump
9.478u 0.125s 0:09.61 99.7% 40+2727k 0+753io 0pf+0w
> rm var*.gz ; time gzip -k var-0-20130305.dump
9.387u 0.126s 0:09.52 99.7% 40+2726k 0+753io 0pf+0w
> time gzip -k usr-0-20130305.dump
585.734u 15.911s 10:03.60 99.6% 40+2723k 83658+34195io 0pf+0w
> rm usr*.gz ; time gzip -k usr-0-20130305.dump
579.348u 14.895s 9:57.49 99.4% 40+2723k 83656+34195io 0pf+0w
> rm usr*.gz ; time gzip -k usr-0-20130305.dump
578.003u 14.802s 9:55.14 99.6% 40+2722k 83657+34196io 0pf+0w
The size of the data:
> du -m root*
126 root-0-20130305.dump
51 root-0-20130305.dump.gz
> du -m var*
281 var-0-20130305.dump
95 var-0-20130305.dump.gz
> du -m usr*
10403 /tmp/usr-0-20130305.dump
4276 usr-0-20130305.dump.gz
The /var filesystem is the outlier in that it has both the smallest compressed size relative to the original (about 34%) and the fastest compression (about 29 MB/s). This filesystem is mainly filled with small text files, while the other filesystems are a mix of text and binary files. We will therefore disregard the values for this filesystem.
For the other two filesystems, the data is compressed to between 40.4% and 41.1% of its original size, at a compression speed between 11.5 MB/s and 17.7 MB/s. There is not much difference in time between the runs, so we’ll skip the multiple runs from now on.
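The percentages and speeds quoted here follow from the du sizes and the wall-clock times. A minimal sketch of that arithmetic in POSIX sh, using the third gzip run of the root and usr dumps. The report function name is only for illustration, and because du -m reports whole megabytes the results are approximate and may differ slightly from the figures in the text.

```shell
#!/bin/sh
# report NAME ORIG_MB COMPRESSED_MB SECONDS
# Prints the compressed size as a percentage of the original and the
# compression speed in MB/s (original size divided by wall-clock time).
report() {
    awk -v name="$1" -v orig="$2" -v comp="$3" -v secs="$4" 'BEGIN {
        printf "%s: %.1f%% of original, %.1f MB/s\n", name, 100 * comp / orig, orig / secs
    }'
}

# Sizes from du -m, times from the third gzip run (9:55.14 = 595.14 s).
report root 126 51 11.00
report usr 10403 4276 595.14
```

The same function can be reused for the bzip2 and xz measurements by substituting their sizes and times.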
Bzip2
The bzip2 compressor is next. It should compress better, but more slowly. Note that we are using the standard bzip2 here, not the parallel version from ports.
> time bzip2 -k root-0-20130305.dump
17.285u 0.173s 0:17.53 99.5% 35+2726k 1010+348io 3pf+0w
> time bzip2 -k var-0-20130305.dump
53.122u 0.338s 0:53.54 99.8% 35+2725k 2251+705io 0pf+0w
> time bzip2 -k usr-0-20130305.dump
1830.357u 15.787s 30:51.01 99.7% 35+2727k 83656+31339io 1pf+0w
The size of the data:
> du -m /tmp/*.bz2
44 /tmp/root-0-20130305.dump.bz2
3919 /tmp/usr-0-20130305.dump.bz2
89 /tmp/var-0-20130305.dump.bz2
The data is compressed to between 32% and 38% of its original size, but the compression speed is only between 5.7 MB/s and 7.3 MB/s. The compression time has increased significantly compared to gzip (by a factor of 3.1 for usr), for a modest compression gain.
Xz
This is the newest compression program to have been added to the FreeBSD base system. Earlier, I also did a comparison with bzip2.
> time xz -k root-0-20130305.dump
90.279u 0.259s 1:30.69 99.8% 71+2691k 1013+178io 5pf+0w
> time xz -k var-0-20130305.dump
178.502u 0.614s 2:59.22 99.9% 71+2691k 0+672io 0pf+0w
> time xz -k usr-0-20130305.dump
6175.208u 29.190s 1:43:32.49 99.8% 71+2692k 83652+26706io 1pf+0w
The size of the data:
> du -m /tmp/*.xz
23 /tmp/root-0-20130305.dump.xz
3340 /tmp/usr-0-20130305.dump.xz
85 /tmp/var-0-20130305.dump.xz
The compression ratio is the best of all. The root dump is compressed to 18% of its original size. The /var dump is only a little smaller than with bzip2. The compression is slow: between 1.4 MB/s and 1.7 MB/s.
Conclusion
The dump program generates around 6 to 10 MB/s of data. Only gzip and bzip2 can keep up with this when the output of the dump is piped through them. Using xz would slow down this process considerably.
Another consideration is that the restore operation becomes more complicated. Instead of just using the restore program to open the dump file, the output of the decompression program has to be piped into the restore program. This complicates matters and makes them slower.
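The two pipelines discussed here can be sketched as shell functions. This is a minimal sketch in POSIX sh, not the commands used for the measurements above: the function names and the choice of gzip are illustrative, and the dump flags (-0 for a full dump, -a to skip the media size estimate, -L to snapshot a live filesystem, -f - to write to standard output) should be checked against dump(8) on your system.

```shell
#!/bin/sh
# backup_compressed FILESYSTEM OUTPUT_FILE
# Writes a level 0 dump to standard output and compresses it on the fly,
# so the uncompressed dump is never stored on disk.
backup_compressed() {
    dump -0 -a -L -f - "$1" | gzip > "$2"
}

# restore_compressed DUMP_FILE
# Decompresses to standard output and pipes the result into restore,
# which rebuilds the filesystem in the current directory (-r).
restore_compressed() {
    gzip -dc "$1" | restore -r -f -
}
```

A call could look like backup_compressed /var /mnt/backup/var-0.dump.gz, where the target path is hypothetical. Note that the exit status of a plain sh pipeline is that of its last command, so a failing dump is not reported by these functions.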
Unlike earlier, when I made backups to DVD, I now use USB-connected disk drives. This has done away with the space constraints.
So my conclusion is that compressing backups is not worth the extra complexity for me anymore.
Making the dump of /usr smaller would make the whole backup process faster. Since the function of a backup is to restore a working system, I have decided to exclude some directories in /usr from the dump. Looking at the contents of this filesystem, we see:
> du -cm -d 1 /usr/
1 /usr/.snap
55 /usr/bin
1 /usr/games
21 /usr/include
38 /usr/lib
1 /usr/libdata
14 /usr/libexec
5549 /usr/local
2338 /usr/obj
2206 /usr/ports
28 /usr/sbin
52 /usr/share
1539 /usr/src
11867 /usr/
11867 total
Looking at this list, I decided to set the nodump flag on /usr/obj and /usr/ports, because neither is necessary for getting a system up and running. Together they account for 4544 MB of the 11867 MB in /usr. Using portsnap one can easily re-populate the ports tree, and /usr/obj is only used for OS rebuilds.
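Setting the flag and making dump honor it could look like the sketch below. One detail worth checking in dump(8): by default the nodump flag is only honored for incremental dumps (level 1 and up), so a level 0 dump needs -h 0 to skip the flagged directories. The function names are illustrative and not part of any tool.

```shell
#!/bin/sh
# exclude_from_dump PATH...
# Sets the nodump flag on each path given.
exclude_from_dump() {
    chflags nodump "$@"
}

# dump_honoring_nodump FILESYSTEM OUTPUT_FILE
# Level 0 dump that honors the nodump flag from level 0 upward (-h 0).
dump_honoring_nodump() {
    dump -0 -a -L -h 0 -f "$2" "$1"
}
```

For the directories chosen here, that would be exclude_from_dump /usr/obj /usr/ports followed by a dump of /usr. The flag can be cleared again with chflags dump.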
I’m explicitly not excluding /usr/local, because that basically contains all installed ports, which is very convenient. I’m still on the fence about excluding /usr/src. On the one hand it is easy to download via subversion, on the other hand I like to keep the source that built the system on hand. So it stays in for now.
For comments, please send me an e-mail.