Roland's homepage

My random knot in the Web

Should system backups be compressed?

Every now and then I make backups of my FreeBSD system’s filesystems with the venerable dump program. This is the only program that can capture all features of the UFS filesystem. The filesystems to be backed up are:

  • The root filesystem.
  • The /usr filesystem.
  • The /var filesystem.

Note

My user data is kept in /home, which is replicated by rsync to several other disks. This is done because of the size of this filesystem: it’s impractical to make complete dumps, and much easier to synchronize between two disks. I’m also not really interested in retaining different versions, since I rarely throw data away.

Because of original space constraints, I tend to compress the backups. For convenience, I only use compression programs that are available in the base system. That way I know for sure that I can restore a backup from a rescue disc. This leaves the following choices:

  • gzip
  • bzip2
  • xz

The purpose of this article is to measure the time required to do (de)compression, and the compressed size. The test data is a recent backup I made:

> du -m *.dump
126     root-0-20130305.dump
10403   usr-0-20130305.dump
281     var-0-20130305.dump

Measurements

Gzip

First, I’m going to compress them with gzip. Each compression is done three times to check for variations:

> time gzip -k root-0-20130305.dump
10.960u 0.094s 0:11.06 99.9%    40+2722k 0+404io 0pf+0w
> rm root*.gz; time gzip -k root-0-20130305.dump
10.930u 0.070s 0:11.00 100.0%   40+2724k 0+404io 0pf+0w
> rm root*.gz; time gzip -k root-0-20130305.dump
10.922u 0.078s 0:11.00 99.9%    40+2723k 0+404io 0pf+0w
> time gzip -k var-0-20130305.dump
9.461u 0.393s 0:09.93 99.1%     40+2722k 2251+753io 0pf+0w
> rm var*.gz; time gzip -k var-0-20130305.dump
9.478u 0.125s 0:09.61 99.7%     40+2727k 0+753io 0pf+0w
> rm var*.gz ; time gzip -k var-0-20130305.dump
9.387u 0.126s 0:09.52 99.7%     40+2726k 0+753io 0pf+0w
> time gzip -k usr-0-20130305.dump
585.734u 15.911s 10:03.60 99.6% 40+2723k 83658+34195io 0pf+0w
> rm usr*.gz ; time gzip -k usr-0-20130305.dump
579.348u 14.895s 9:57.49 99.4%  40+2723k 83656+34195io 0pf+0w
> rm usr*.gz ; time gzip -k usr-0-20130305.dump
578.003u 14.802s 9:55.14 99.6%  40+2722k 83657+34196io 0pf+0w

The size of the data:

> du -m root*
126     root-0-20130305.dump
51      root-0-20130305.dump.gz
> du -m var*
281     var-0-20130305.dump
95      var-0-20130305.dump.gz
> du -m usr*
10403   /tmp/usr-0-20130305.dump
4276    usr-0-20130305.dump.gz

The /var filesystem is the outlier: relative to its original size it has both the smallest compressed size (about 34%) and the fastest compression time. This filesystem is mainly filled with small text files, while the other filesystems are more mixed between text and binary files. We will therefore disregard the values for this filesystem.

The data is compressed to between 40.4% and 41.1% of its original size. The compression speed is between 11.5 and 17.7 MB/s. There is not much difference in time between the runs, so we’ll skip the multiple runs from now on.
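The percentages and speeds can be reproduced with a little arithmetic. The following is a minimal sketch, assuming the sizes come from du -m and the times are the user CPU times of the first gzip runs; the function names ratio and speed are made up for this illustration, and the last digit may differ from the figures in the text because of rounding.

```sh
#!/bin/sh
# Percentage of the original size that remains after compression.
# Arguments: compressed size, original size (both in MB, as printed by du -m).
ratio() {
    awk -v c="$1" -v o="$2" 'BEGIN { printf "%.1f\n", 100 * c / o }'
}

# Throughput in MB/s. Arguments: original size in MB, time in seconds.
speed() {
    awk -v o="$1" -v s="$2" 'BEGIN { printf "%.1f\n", o / s }'
}

ratio 51 126          # root dump, gzip
ratio 4276 10403      # usr dump, gzip
speed 126 10.960      # root dump, gzip, first run
speed 10403 585.734   # usr dump, gzip, first run
```

Since the CPU usage is close to 100% in all runs, using the wall-clock time instead of the user time gives nearly the same speeds.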

Bzip2

The bzip2 compressor is next. This should compress better, but more slowly. Note that we are using the standard bzip2 here, not the parallel version from ports.

> time bzip2 -k root-0-20130305.dump
17.285u 0.173s 0:17.53 99.5%    35+2726k 1010+348io 3pf+0w
> time bzip2 -k var-0-20130305.dump
53.122u 0.338s 0:53.54 99.8%    35+2725k 2251+705io 0pf+0w
> time bzip2 -k usr-0-20130305.dump
1830.357u 15.787s 30:51.01 99.7%        35+2727k 83656+31339io 1pf+0w

The size of the data:

> du -m /tmp/*.bz2
44   /tmp/root-0-20130305.dump.bz2
3919 /tmp/usr-0-20130305.dump.bz2
89   /tmp/var-0-20130305.dump.bz2

The data is compressed to between 32% and 38% of its original size, but the compression speed is only between 5.7 MB/s and 7.3 MB/s. The compression time has increased significantly compared to gzip (by a factor of 3.1 for usr), for a modest compression gain.

Xz

This is the newest compression program to have been added to the FreeBSD base system. Earlier, I also did a comparison with bzip2.

> time xz -k root-0-20130305.dump
90.279u 0.259s 1:30.69 99.8%    71+2691k 1013+178io 5pf+0w
> time xz -k var-0-20130305.dump
178.502u 0.614s 2:59.22 99.9%   71+2691k 0+672io 0pf+0w
> time xz -k usr-0-20130305.dump
6175.208u 29.190s 1:43:32.49 99.8%      71+2692k 83652+26706io 1pf+0w

The size of the data:

> du -m /tmp/*.xz
23   /tmp/root-0-20130305.dump.xz
3340 /tmp/usr-0-20130305.dump.xz
85   /tmp/var-0-20130305.dump.xz

The compression ratio is the best of all. The root dump is compressed to 18% of its original size. The /var dump is only a little smaller than with bzip2. The compression is slow, though: between 1.4 MB/s and 1.7 MB/s.
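The measurements above were typed in by hand with time and du. They could also be collected in one pass with a small loop. This is only a sketch: the function name bench and the temporary file naming are invented here, and date +%s gives a resolution of whole seconds, which is coarse for the small dumps but adequate for the large one.

```sh
#!/bin/sh
# Compress one file with each available compressor and print
# "<compressor> <elapsed seconds> <compressed bytes>" per line.
# The original file is left untouched; the compressed copy is removed.
bench() {
    file=$1
    for comp in gzip bzip2 xz; do
        # Skip compressors that are not installed.
        command -v "$comp" >/dev/null 2>&1 || continue
        start=$(date +%s)
        "$comp" -c "$file" > "$file.$comp.tmp"
        end=$(date +%s)
        # The arithmetic expansion strips the padding some wc versions add.
        size=$(( $(wc -c < "$file.$comp.tmp") ))
        echo "$comp $((end - start)) $size"
        rm -f "$file.$comp.tmp"
    done
}

# Example: bench usr-0-20130305.dump
```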

Conclusion

The dump program generates around 6-10 MB/s of data. Only gzip and, at the lower end of that range, bzip2 can keep up with this when the output of the dump is piped through them. Using xz would slow down this process considerably.
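For reference, piping the dump through a compressor could look like the sketch below. The function name dump_compressed and the output path in the example are hypothetical; the dump flags are the standard FreeBSD ones, but the exact set used for the backups in this article is not given, so treat the choice as an assumption.

```sh
#!/bin/sh
# Write a compressed level 0 dump of one filesystem.
# Usage: dump_compressed <filesystem> <output-file> [compressor]
dump_compressed() {
    fs=$1
    out=$2
    comp=${3:-gzip}
    # -0   full (level 0) dump
    # -a   bypass the tape length calculations
    # -L   take a snapshot because the filesystem is live
    # -f - write the dump to standard output
    dump -0 -a -L -f - "$fs" | "$comp" -c > "$out"
}

# Example (as root on FreeBSD):
#   dump_compressed /usr /backup/usr-0.dump.gz gzip
```

The pipeline runs only as fast as its slowest stage, which is why a compressor that handles less data per second than dump produces lengthens the whole backup.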

Another consideration is that the restore operation becomes more complicated. Instead of just using the restore program to open the dump file, the output of the decompression program should be piped into the restore program. This complicates matters and makes them slower.
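The difference between the two restore paths can be sketched as follows. The function name restore_dump is made up for this example, and it assumes the compressor can be recognized from the file name suffix.

```sh
#!/bin/sh
# Restore a dump file into the current directory, decompressing on the
# fly when the file name ends in .gz, .bz2 or .xz.
# restore -r rebuilds a filesystem; "-f -" reads the dump from stdin.
restore_dump() {
    case $1 in
        *.gz)  gzip  -dc "$1" | restore -rf - ;;
        *.bz2) bzip2 -dc "$1" | restore -rf - ;;
        *.xz)  xz    -dc "$1" | restore -rf - ;;
        *)     restore -rf "$1" ;;
    esac
}

# Example (as root, after newfs, mount and cd into the new filesystem):
#   restore_dump /backup/usr-0.dump.gz
```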

As opposed to earlier, when I made backups to DVD, I now use USB-connected disk drives. This has done away with the space constraints.

So my conclusion is that compressing backups is not worth the extra complexity for me anymore.

Making the dump of /usr smaller would make the whole backup process faster. Since the function of a backup is to restore a working system, I have decided to exclude some directories in /usr from the dump. Looking at the contents of this filesystem, we see:

> du -cm -d 1 /usr/
1       /usr/.snap
55      /usr/bin
1       /usr/games
21      /usr/include
38      /usr/lib
1       /usr/libdata
14      /usr/libexec
5549    /usr/local
2338    /usr/obj
2206    /usr/ports
28      /usr/sbin
52      /usr/share
1539    /usr/src
11867   /usr/
11867   total
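To judge how much a candidate set of directories would save, the du output can be summed with awk. This is a sketch only; the function name excluded_share is invented, and the here-document holds an excerpt of the figures from the listing above, using /usr/obj and /usr/ports as an example set.

```sh
#!/bin/sh
# Reads "du -cm -d 1" output on stdin and reports how much of the total
# the directories named in $1 (space-separated) account for.
excluded_share() {
    awk -v dirs="$1" '
        BEGIN { n = split(dirs, d, " "); for (i = 1; i <= n; i++) skip[d[i]] = 1 }
        $2 == "total" { total = $1 }
        ($2 in skip)  { excluded += $1 }
        END {
            if (total > 0)
                printf "%d MB of %d MB (%.0f%%)\n", excluded, total, 100 * excluded / total
        }
    '
}

# Excerpt of the listing above; on a live system, pipe du into the function:
#   du -cm -d 1 /usr/ | excluded_share "/usr/obj /usr/ports"
excluded_share "/usr/obj /usr/ports" <<'EOF'
5549    /usr/local
2338    /usr/obj
2206    /usr/ports
11867   total
EOF
```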

Looking at this list, I decided to set the nodump flag on /usr/obj and /usr/ports, because neither is necessary for getting a system up and running. Using portsnap one can easily re-populate the ports tree, and /usr/obj is only used for OS rebuilds. I’m explicitly not excluding /usr/local, because that basically contains all installed ports, which is very convenient. I’m still on the fence about excluding /usr/src. On the one hand it is easy to download via subversion; on the other hand I like to keep the source that built the system on hand. So it stays in for now.
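Setting the flag could be done as in the sketch below. The helper name exclude_from_dump and the output file name are hypothetical. One detail is easy to miss: by default dump honors the nodump flag only at level 1 and above, so a level 0 dump needs -h 0 to skip the flagged directories.

```sh
#!/bin/sh
# Set the nodump flag on each directory given as an argument
# (FreeBSD, run as root). dump then omits the directory and
# everything below it, subject to the -h option.
exclude_from_dump() {
    for dir in "$@"; do
        chflags nodump "$dir"
    done
}

# Example:
#   exclude_from_dump /usr/obj /usr/ports
# Full dump that honors the flag at level 0:
#   dump -0 -a -L -h 0 -f usr-0.dump /usr
# The flag can be cleared again with: chflags dump /usr/obj
```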

