Comparing stock ffmpeg with optimized ffmpeg
Recently FreeBSD changed the multimedia/ffmpeg port to drop -ffast-math and
-fno-finite-math-only from the CFLAGS when building an optimized binary. The
following experiment was conducted to see how much of a difference this makes.
First we look at the ffmpeg binary compiled with the standard options.
> du /usr/local/bin/ffmpeg
216 /usr/local/bin/ffmpeg
> ll /usr/local/bin/ffmpeg | awk '{print $6}'
220584
> sha1 /usr/local/bin/ffmpeg
SHA1 (/usr/local/bin/ffmpeg) = c8ea48c5b25d183775cf4c109ade87cad8e342f6
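The size and checksum are recorded so that the differently built binaries can be told apart later. The same bookkeeping can be sketched in Python with only the standard library. The helper name `fingerprint` is made up for this illustration and is not part of dvd2webm.

```python
import hashlib
import os


def fingerprint(path):
    """Return (size in bytes, SHA-1 hex digest) of the file at path."""
    sha1 = hashlib.sha1()
    with open(path, "rb") as f:
        # Read in chunks so that large binaries need not fit in memory.
        for chunk in iter(lambda: f.read(65536), b""):
            sha1.update(chunk)
    return os.path.getsize(path), sha1.hexdigest()


if __name__ == "__main__":
    import sys

    # Print one line per file named on the command line.
    for name in sys.argv[1:]:
        size, digest = fingerprint(name)
        print(f"{size} {digest} {name}")
```

Two builds that produce the same size and digest are byte-for-byte identical, so a changed digest is a quick check that a rebuild really picked up the new flags.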
The dvd2webm script internally calls ffmpeg to do a two-pass conversion to
WebM format with VP9 video and Vorbis audio. It uses the following commands
for the two passes.
ffmpeg -loglevel quiet -i impulse.mpg -passlogfile impulse \
-c:v libvpx-vp9 -threads 3 -pass 1 -sn -b:v 1400k -crf 33 -g 250 \
-speed 4 -tile-columns 1 -an -f webm -map 0:v -map 0:a:0 -y /dev/null
ffmpeg -loglevel quiet -i impulse.mpg -passlogfile impulse \
-c:v libvpx-vp9 -threads 3 -pass 2 -sn -b:v 1400k -crf 33 -g 250 \
-speed 2 -tile-columns 1 -auto-alt-ref 1 -lag-in-frames 25 \
-c:a libvorbis -q:a 3 -f webm -map 0:v -map 0:a:0 -y impulse.webm
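The two commands share most of their options. They differ only in the pass number, the speed setting, the look-ahead options, the audio handling and the output file. The following Python sketch shows how a script could assemble both argument lists from one shared part. The function name `vp9_pass_args` is hypothetical, and this is not necessarily how dvd2webm does it. Only options that appear in the commands above are used.

```python
def vp9_pass_args(src, pass_no, threads=3):
    """Build the ffmpeg argument list for pass 1 or pass 2 (illustrative)."""
    base = src.rsplit(".", 1)[0]  # 'impulse.mpg' -> 'impulse'
    # Options common to both passes.
    args = ["ffmpeg", "-loglevel", "quiet", "-i", src,
            "-passlogfile", base, "-c:v", "libvpx-vp9",
            "-threads", str(threads), "-pass", str(pass_no), "-sn",
            "-b:v", "1400k", "-crf", "33", "-g", "250"]
    if pass_no == 1:
        # First pass: faster speed setting, no audio, output discarded.
        args += ["-speed", "4", "-tile-columns", "1", "-an"]
        out = "/dev/null"
    else:
        # Second pass: slower speed setting, look-ahead options, Vorbis audio.
        args += ["-speed", "2", "-tile-columns", "1",
                 "-auto-alt-ref", "1", "-lag-in-frames", "25",
                 "-c:a", "libvorbis", "-q:a", "3"]
        out = base + ".webm"
    args += ["-f", "webm", "-map", "0:v", "-map", "0:a:0", "-y", out]
    return args


if __name__ == "__main__":
    for n in (1, 2):
        print(" ".join(vp9_pass_args("impulse.mpg", n)))
```

Passing such a list to `subprocess.run(args, check=True)` would run the pass without going through a shell, which avoids quoting problems with unusual file names.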
The output of the run and the resulting file sizes are:
> du -m impulse.mpg
1801 impulse.mpg
> dvd2webm impulse.mpg
INFO: processing 'impulse.mpg'.
INFO: started at 2017-01-03 21:43:07.
INFO: looking for cropping.
INFO: using tile-columns flag set to 1.
INFO: using 3 threads.
INFO: using cropping 704:560:10:6.
INFO: running pass 1...
INFO: pass 1 took 0:32:18.
INFO: running pass 2...
INFO: pass 2 took 2:27:36.
INFO: the size of 'impulse.webm' is 17% of the size of 'impulse.mpg'.
> du -m impulse*
16 impulse-0.log
1801 impulse.mpg
313 impulse.webm
> rm impulse-0.log
> mv impulse.webm impulse-noopt.webm
This is the baseline run, lasting 10794 seconds in total for both passes. It is of course only a single run, although the computer was otherwise mostly idle. One would expect this to vary by at least several seconds over multiple runs.
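The pass times that dvd2webm reports are in H:MM:SS form, so they have to be converted to seconds before runs can be compared. A small Python sketch of that arithmetic, applied to the two pass times logged above, follows. The helper name `to_seconds` was chosen here and does not come from dvd2webm.

```python
def to_seconds(hms):
    """Convert an 'H:MM:SS' string to a number of seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


# Pass 1 and pass 2 of the run with the standard options.
baseline = to_seconds("0:32:18") + to_seconds("2:27:36")
print(baseline)  # 10794
```

This total covers only the two encoding passes. The time spent looking for the cropping is not reported separately in the log and is therefore not included.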
After this test, the following was added to /etc/make.conf.
.if ${.CURDIR:M*/multimedia/ffmpeg}
CFLAGS += -ffast-math -fno-finite-math-only
.endif
The ffmpeg port was then re-built and re-installed.
> du /usr/local/bin/ffmpeg
216 /usr/local/bin/ffmpeg
> ll /usr/local/bin/ffmpeg | awk '{print $6}'
220472
> sha1 /usr/local/bin/ffmpeg
SHA1 (/usr/local/bin/ffmpeg) = fad1bde32cc335571532477d1f81182f5334ad2c
The test was then run again.
> dvd2webm impulse.mpg
INFO: processing 'impulse.mpg'.
INFO: started at 2017-01-04 01:49:16.
INFO: looking for cropping.
INFO: using tile-columns flag set to 1.
INFO: using 3 threads.
INFO: using cropping 704:560:10:6.
INFO: running pass 1...
INFO: pass 1 took 0:32:51.
INFO: running pass 2...
INFO: pass 2 took 2:27:35.
INFO: the size of 'impulse.webm' is 17% of the size of 'impulse.mpg'.
> du -m impulse*
16 impulse-0.log
313 impulse-noopt.webm
1801 impulse.mpg
313 impulse.webm
This run took 10826 seconds, 0.3% more time than the baseline run. That is
not a significant difference in the runtime, in my opinion. So there doesn’t
seem to be much influence from -ffast-math and -fno-finite-math-only in this
test. It could be that this is because I’m not using a complicated filter
graph in this test, just a crop operation. The format conversion is
presumably handled by the libvpx and libvorbis libraries.
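The percentage quoted above is the relative difference between the summed pass times of the two runs. As a sketch, with the pass times from both logs converted to seconds by hand and a function name, `percent_slower`, that is purely illustrative:

```python
def percent_slower(run, baseline):
    """Return how much longer `run` took than `baseline`, in percent."""
    return (run - baseline) / baseline * 100.0


baseline = 1938 + 8856   # 0:32:18 + 2:27:36, standard options
fast_math = 1971 + 8855  # 0:32:51 + 2:27:35, with the extra CFLAGS
print(f"{percent_slower(fast_math, baseline):.2f}%")  # 0.30%
```

Almost all of the 32-second difference comes from the first pass. The second pass differs by a single second.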
As a further test, I disabled the OPTIMIZED_CFLAGS option and re-built the
program.
> du /usr/local/bin/ffmpeg
212 /usr/local/bin/ffmpeg
> ll /usr/local/bin/ffmpeg | awk '{print $6}'
215848
> sha1 /usr/local/bin/ffmpeg
SHA1 (/usr/local/bin/ffmpeg) = 8a83e72c0890195a52f5bc919c00c86ce4b60946
The test was then run for a third time.
> dvd2webm impulse.mpg
INFO: processing 'impulse.mpg'.
INFO: started at 2017-01-07 10:10:54.
INFO: looking for cropping.
INFO: using tile-columns flag set to 1.
INFO: using 3 threads.
INFO: using cropping 704:560:10:6.
INFO: running pass 1...
INFO: pass 1 took 0:32:27.
INFO: running pass 2...
INFO: pass 2 took 2:27:50.
INFO: the size of 'impulse.webm' is 17% of the size of 'impulse.mpg'.
The run took 10817 seconds, 0.21% longer than the baseline time but, weirdly
enough, shorter than the build with -ffast-math. This is probably due to
variations between runs being larger than variations between compilations.
Again, this is not significantly different. This reinforces the conclusion
that for operation without large filter graphs, optimizations in ffmpeg
itself don’t really matter.
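To put the three runs side by side, the summed pass times from the logs can be compared with their overall spread. The Python sketch below does only that. Three single runs are far too few for a real statistical test, so it is an illustration of the size of the differences and not a significance calculation. The labels in the dictionary are mine.

```python
import statistics

# Pass 1 + pass 2 in seconds, taken from the three logs.
totals = {
    "standard options": 1938 + 8856,     # 0:32:18 + 2:27:36
    "with -ffast-math": 1971 + 8855,     # 0:32:51 + 2:27:35
    "no OPTIMIZED_CFLAGS": 1947 + 8870,  # 0:32:27 + 2:27:50
}

values = list(totals.values())
mean = statistics.mean(values)
spread = max(values) - min(values)

for label, seconds in totals.items():
    print(f"{label:20s} {seconds} s")
print(f"mean {mean:.0f} s, spread {spread} s ({spread / mean * 100:.2f}% of mean)")
```

The fastest and slowest run are 32 seconds apart on a run of roughly three hours. That is small enough for ordinary run-to-run variation to account for it.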
For comments, please send me an e-mail.