Comparing stock ffmpeg with optimized ffmpeg

Recently, FreeBSD changed the multimedia/ffmpeg port to drop -ffast-math and -fno-finite-math-only from the CFLAGS used when building an optimized binary. I ran the following experiment to see how much of a difference this makes.

First we look at the ffmpeg binary compiled with the standard options.

> du /usr/local/bin/ffmpeg
216 /usr/local/bin/ffmpeg
> ll /usr/local/bin/ffmpeg|awk '{print $6}'
220584
> sha1 /usr/local/bin/ffmpeg
SHA1 (/usr/local/bin/ffmpeg) = c8ea48c5b25d183775cf4c109ade87cad8e342f6
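
The same fingerprint (size in bytes plus SHA-1 checksum) can also be collected from a script, which is handy when comparing several builds. This is only a minimal sketch; the function name describe_binary is made up for this illustration and is not part of any tool used here.

```python
import hashlib
import os


def describe_binary(path):
    """Return (size in bytes, SHA-1 hex digest) for the file at path."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        # Read in chunks so the whole file need not fit in memory.
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return os.path.getsize(path), digest.hexdigest()


if __name__ == "__main__":
    path = "/usr/local/bin/ffmpeg"  # location of the binary in this article
    if os.path.exists(path):
        size, sha1 = describe_binary(path)
        print(f"{path}: {size} bytes, SHA1 {sha1}")
```

Two builds that differ in either value are certainly different binaries, which is all the check is used for here.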

The dvd2webm script internally calls ffmpeg to do a two-pass conversion to WebM format with VP9 video and Vorbis audio. It uses the following commands for the two passes.

ffmpeg -loglevel quiet -i impulse.mpg -passlogfile impulse \
-c:v libvpx-vp9 -threads 3 -pass 1 -sn -b:v 1400k -crf 33 -g 250 \
-speed 4 -tile-columns 1 -an -f webm -map 0:v -map 0:a:0 -y /dev/null

ffmpeg -loglevel quiet -i impulse.mpg -passlogfile impulse \
-c:v libvpx-vp9 -threads 3 -pass 2 -sn -b:v 1400k -crf 33 -g 250 \
-speed 2 -tile-columns 1 -auto-alt-ref 1 -lag-in-frames 25 \
-c:a libvorbis -q:a 3 -f webm -map 0:v -map 0:a:0 -y impulse.webm
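
To make clear which options the two passes share and where they differ, the command lines above can be assembled from common parts. The following is a sketch for illustration only and not the actual dvd2webm code; build_pass_commands and run_timed are hypothetical helper names, while the ffmpeg options are copied verbatim from the commands above.

```python
import shlex
import subprocess
import time


def build_pass_commands(source, target, logprefix):
    """Return the argument lists for pass 1 and pass 2."""
    common = ["ffmpeg", "-loglevel", "quiet", "-i", source,
              "-passlogfile", logprefix,
              "-c:v", "libvpx-vp9", "-threads", "3"]
    rate = ["-sn", "-b:v", "1400k", "-crf", "33", "-g", "250"]
    output = ["-f", "webm", "-map", "0:v", "-map", "0:a:0", "-y"]
    # Pass 1 only gathers statistics: faster speed setting, no audio,
    # and the encoded output is thrown away.
    pass1 = (common + ["-pass", "1"] + rate
             + ["-speed", "4", "-tile-columns", "1", "-an"]
             + output + ["/dev/null"])
    # Pass 2 does the real encode, including the Vorbis audio.
    pass2 = (common + ["-pass", "2"] + rate
             + ["-speed", "2", "-tile-columns", "1",
                "-auto-alt-ref", "1", "-lag-in-frames", "25",
                "-c:a", "libvorbis", "-q:a", "3"]
             + output + [target])
    return pass1, pass2


def run_timed(args):
    """Run a command and return the elapsed wall-clock time in seconds."""
    start = time.monotonic()
    subprocess.run(args, check=True)
    return time.monotonic() - start


if __name__ == "__main__":
    for cmd in build_pass_commands("impulse.mpg", "impulse.webm", "impulse"):
        print(shlex.join(cmd))  # only print; run_timed(cmd) would execute it
```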

The output of the run and the resulting files are:

> du -m impulse.mpg
1801    impulse.mpg
> dvd2webm impulse.mpg
INFO: processing 'impulse.mpg'.
INFO: started at 2017-01-03 21:43:07.
INFO: looking for cropping.
INFO: using tile-columns flag set to 1.
INFO: using 3 threads.
INFO: using cropping 704:560:10:6.
INFO: running pass 1...
INFO: pass 1 took 0:32:18.
INFO: running pass 2...
INFO: pass 2 took 2:27:36.
INFO: the size of 'impulse.webm' is 17% of the size of 'impulse.mpg'.
> du -m impulse*
16      impulse-0.log
1801    impulse.mpg
313     impulse.webm
> rm impulse-0.log
> mv impulse.webm impulse-noopt.webm

This is the baseline run, lasting 10794 seconds in total (1938 s for pass 1 plus 8856 s for pass 2). It is of course only a single run, although the computer was otherwise mostly idle. One would expect the time to vary by at least several seconds over multiple runs.
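
The total follows directly from the two pass times reported in the log above. A small sketch of the arithmetic; the helper name to_seconds is made up for this example.

```python
def to_seconds(hms):
    """Convert an 'h:mm:ss' string as printed in the log to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


pass1 = to_seconds("0:32:18")  # 1938 s
pass2 = to_seconds("2:27:36")  # 8856 s
baseline = pass1 + pass2
print(baseline)  # 10794
```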

After this test, the following was added to /etc/make.conf.

.if ${.CURDIR:M*/multimedia/ffmpeg}
CFLAGS += -ffast-math -fno-finite-math-only
.endif

The ffmpeg port was then re-built and re-installed.

> du /usr/local/bin/ffmpeg
216 /usr/local/bin/ffmpeg
> ll /usr/local/bin/ffmpeg | awk '{print $6}'
220472
> sha1 /usr/local/bin/ffmpeg
SHA1 (/usr/local/bin/ffmpeg) = fad1bde32cc335571532477d1f81182f5334ad2c

The test was then run again.

> dvd2webm impulse.mpg
INFO: processing 'impulse.mpg'.
INFO: started at 2017-01-04 01:49:16.
INFO: looking for cropping.
INFO: using tile-columns flag set to 1.
INFO: using 3 threads.
INFO: using cropping 704:560:10:6.
INFO: running pass 1...
INFO: pass 1 took 0:32:51.
INFO: running pass 2...
INFO: pass 2 took 2:27:35.
INFO: the size of 'impulse.webm' is 17% of the size of 'impulse.mpg'.
> du -m impulse*
16      impulse-0.log
313     impulse-noopt.webm
1801    impulse.mpg
313     impulse.webm

This run took 10826 seconds, 0.3% more time than the baseline run. That is not a significant difference in the runtime, in my opinion. So there doesn’t seem to be much influence from -ffast-math and -fno-finite-math-only in this test. It could be that this is because I’m not using a complicated filter graph in this test, just a crop operation. The format conversion is presumably handled by the libvpx and libvorbis libraries.
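
For reference, the percentage is simply the difference in total time relative to the baseline, with both totals taken as the sum of the pass times in the respective logs. A minimal sketch; percent_change is a name chosen for this example.

```python
def percent_change(baseline, measured):
    """Relative change of measured with respect to baseline, in percent."""
    return (measured - baseline) / baseline * 100.0


baseline = 1938 + 8856   # 0:32:18 + 2:27:36, run without the extra flags
fast_math = 1971 + 8855  # 0:32:51 + 2:27:35, run with -ffast-math
print(f"{percent_change(baseline, fast_math):+.2f}%")  # +0.30%
```

Note that almost all of the 32 s difference comes from pass 1; the pass 2 times differ by a single second.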

As a further test, I disabled the OPTIMIZED_CFLAGS option and re-built the program.

> du /usr/local/bin/ffmpeg
212 /usr/local/bin/ffmpeg
> ll /usr/local/bin/ffmpeg | awk '{print $6}'
215848
> sha1 /usr/local/bin/ffmpeg
SHA1 (/usr/local/bin/ffmpeg) = 8a83e72c0890195a52f5bc919c00c86ce4b60946

The test was then run for a third time.

> dvd2webm impulse.mpg
INFO: processing 'impulse.mpg'.
INFO: started at 2017-01-07 10:10:54.
INFO: looking for cropping.
INFO: using tile-columns flag set to 1.
INFO: using 3 threads.
INFO: using cropping 704:560:10:6.
INFO: running pass 1...
INFO: pass 1 took 0:32:27.
INFO: running pass 2...
INFO: pass 2 took 2:27:50.
INFO: the size of 'impulse.webm' is 17% of the size of 'impulse.mpg'.

This run took 10817 seconds, 0.21% longer than the baseline but, oddly enough, slightly shorter than the run with -ffast-math. This is probably because the variation between runs is larger than the variation between builds.

Again, this is not significantly different. This reinforces the conclusion that for conversions without large filter graphs, compiler optimizations in ffmpeg itself don’t really matter.
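
To put the three runs side by side, the sketch below sums the pass times from each log and expresses the spread between the fastest and slowest run relative to the mean. The labels in the dictionary are my own shorthand for the three builds. With only one run per build, this cannot separate run-to-run noise from a real effect of the compiler flags; it only shows how small the total spread is.

```python
from statistics import mean

# Total run time in seconds: pass 1 + pass 2, taken from the three logs.
totals = {
    "standard options": 1938 + 8856,       # 0:32:18 + 2:27:36
    "with -ffast-math": 1971 + 8855,       # 0:32:51 + 2:27:35
    "without OPTIMIZED_CFLAGS": 1947 + 8870,  # 0:32:27 + 2:27:50
}

spread = max(totals.values()) - min(totals.values())
relative = spread / mean(totals.values()) * 100.0

for label, seconds in totals.items():
    print(f"{label:26s} {seconds:6d} s")
print(f"spread: {spread} s ({relative:.2f}% of the mean)")
```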

For comments, please send me an e-mail.
