Roland's homepage

My random knot in the Web

ImageMagick: convert vs Wand

The ImageMagick suite has been in my software toolbox for years. It is my go-to tool for manipulating bitmap images. Over the years I have written several front-ends in Python for specific tasks.

In general, I have used the subprocess module to launch convert or mogrify from Python.

With the release of Wand 0.5.0, which supports ImageMagick 7, I decided to try it out by porting one of my scripts (foto4lb) to it.

Basically, foto4lb takes one or more directories of images and creates a subdirectory in each that contains shrunken versions of the images. It sets the modification time of those images to the time the photo was taken, according to the EXIF metadata. It uses concurrent.futures to run multiple conversions in parallel.
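The timestamp step can be sketched with the standard library alone. This assumes the EXIF DateTimeOriginal value has already been read as a string in the usual EXIF layout `YYYY:MM:DD HH:MM:SS` (with Wand this is typically the `exif:DateTimeOriginal` entry of `img.metadata`, though that key name is an assumption here), and that it is interpreted as local time, since that EXIF field carries no time zone. The helper `set_mtime_from_exif` is hypothetical and not taken from foto4lb.

```python
import os
from datetime import datetime


def set_mtime_from_exif(path, exif_datetime):
    """Set the access and modification time of *path* to an EXIF timestamp.

    Hypothetical helper: assumes *exif_datetime* looks like
    'YYYY:MM:DD HH:MM:SS' and represents local time.
    """
    # Strip surrounding whitespace, then parse the EXIF date layout.
    taken = datetime.strptime(exif_datetime.strip(), "%Y:%m:%d %H:%M:%S")
    # A naive datetime is interpreted as local time by timestamp().
    stamp = taken.timestamp()
    os.utime(path, (stamp, stamp))
    return stamp
```

Called once per output file after it has been written, this gives the shrunken copies the same date as the original shot.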

Porting

With the aid of the Wand documentation, converting the subprocess call of convert into manipulations of a wand.image.Image instance is pretty straightforward. Basically, I replaced

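import subprocess

# Shrink to newwidth pixels wide (convert keeps the aspect ratio), strip
# metadata, set 300 ppi, sharpen and write with quality 80.
args = [
    'convert', fname, '-strip', '-resize',
    str(newwidth), '-units', 'PixelsPerInch', '-density', '300', '-unsharp',
    '2x0.5+0.7+0', '-quality', '80', oname
]
rp = subprocess.call(args)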

with

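from wand.image import Image

with Image(filename=fname) as img:
    # Height that preserves the aspect ratio for the new width.
    scale = newwidth/img.width
    newheight = int(round(img.height * scale, 0))
    img.strip()  # like -strip: remove profiles and comments
    img.resize(width=newwidth, height=newheight)
    img.units = 'pixelsperinch'
    img.resolution = (300, 300)  # like -density 300
    # -unsharp 2x0.5+0.7+0 means radius x sigma + amount + threshold
    img.unsharp_mask(radius=2, sigma=0.5, amount=0.7, threshold=0)
    img.quality = 80
    img.save(filename=oname)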
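With `convert -resize WIDTH` the height follows from the aspect ratio automatically, while the Wand `resize` call above is given both dimensions explicitly, so the height has to be derived first. The arithmetic in isolation, as a small hypothetical helper (`scaled_size` is not part of Wand):

```python
def scaled_size(width, height, newwidth):
    """Return (newwidth, newheight) keeping the original width:height ratio.

    Hypothetical helper mirroring the computation in the Wand snippet.
    """
    scale = newwidth / width
    # round(x, 0) returns a float, hence the int() conversion.
    newheight = int(round(height * scale, 0))
    return newwidth, newheight


print(scaled_size(4000, 3000, 1600))  # (1600, 1200)
```

Python 3's `round` sends exact halves to the nearest even integer, so a computed height of exactly x.5 pixels may differ by one pixel from what other rounding rules give; for photos that is irrelevant.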

The biggest differences are:

  1. Wand supports reading metadata like EXIF tags. In the subprocess version of the program I used Pillow for that.
  2. The Wand version uses a ProcessPoolExecutor, while the original uses a ThreadPoolExecutor combined with subprocess.call.
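A minimal sketch of the pattern the original version relies on: the threads only wait for child processes, while the image work itself happens in separate `convert` processes, so the GIL does not limit parallelism. The function name `run_commands` and the default of `os.cpu_count()` workers are assumptions for illustration, not taken from foto4lb.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_commands(commands, workers=None):
    """Run each argument list in *commands* as a subprocess, several at once.

    Hypothetical helper. Returns the exit codes in the order of *commands*.
    """
    if workers is None:
        workers = os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each thread blocks in subprocess.call until its child exits;
        # the GIL is released while it waits.
        return list(pool.map(subprocess.call, commands))
```

Each element of `commands` would be an `args` list like the one shown under Porting, one per image.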

According to cloc, the version using Wand has 91 lines of code, compared to 114 for the subprocess version. The difference is mostly due to the subprocess version's more involved metadata handling and its check that convert is actually available.
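Such an availability check can be done with `shutil.which` before any work is queued. This sketch assumes a lookup on `PATH` is sufficient; the helper `find_convert` and its candidate list are hypothetical. On ImageMagick 7 the main program is `magick`, and whether a separate `convert` command exists depends on how ImageMagick was installed.

```python
import shutil


def find_convert(candidates=("convert", "magick")):
    """Return the path of the first candidate command found on PATH, or None.

    Hypothetical helper; the candidate names are an assumption.
    """
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None
```

On Windows a bare `convert` may resolve to the unrelated system `convert.exe`, which is one reason to look for `magick` first on that platform.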

Performance

For performance testing I ran each program ten times on a directory with eight images, logging the run times with the time utility.

The subprocess-based program yielded the following results:

3.81 real        13.03 user         1.25 sys
3.83 real        12.99 user         1.32 sys
3.78 real        13.06 user         1.15 sys
3.87 real        13.22 user         1.37 sys
3.85 real        13.09 user         1.32 sys
3.72 real        12.60 user         1.39 sys
3.73 real        12.77 user         1.23 sys
3.78 real        12.91 user         1.22 sys
3.84 real        13.21 user         1.19 sys
3.70 real        12.72 user         1.23 sys
----             -----              ----
3.79 mean        12.96 mean         1.27 mean

The times for the Wand-based version were:

7.29 real        17.87 user         2.51 sys
7.30 real        18.36 user         2.41 sys
7.19 real        18.08 user         2.23 sys
7.29 real        18.28 user         2.38 sys
7.29 real        18.19 user         2.38 sys
7.15 real        17.86 user         2.29 sys
7.24 real        18.30 user         2.20 sys
7.28 real        18.29 user         2.36 sys
7.23 real        18.10 user         2.40 sys
7.16 real        17.94 user         2.32 sys
----             -----              ----
7.24 mean        18.13 mean         2.35 mean
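The mean rows can be recomputed from the individual runs with `statistics.mean`. The sketch below copies the real and user columns of both tables and also prints how much longer the Wand version takes in wall-clock time.

```python
from statistics import mean

# "real" and "user" columns copied from the two tables above (seconds).
subprocess_real = [3.81, 3.83, 3.78, 3.87, 3.85, 3.72, 3.73, 3.78, 3.84, 3.70]
subprocess_user = [13.03, 12.99, 13.06, 13.22, 13.09, 12.60, 12.77, 12.91, 13.21, 12.72]
wand_real = [7.29, 7.30, 7.19, 7.29, 7.29, 7.15, 7.24, 7.28, 7.23, 7.16]
wand_user = [17.87, 18.36, 18.08, 18.28, 18.19, 17.86, 18.30, 18.29, 18.10, 17.94]

for label, values in [
    ("subprocess real", subprocess_real),
    ("subprocess user", subprocess_user),
    ("Wand real", wand_real),
    ("Wand user", wand_user),
]:
    print(f"{label:16} {mean(values):6.2f}")

# Ratio of the mean wall-clock times.
print(f"real-time ratio  {mean(wand_real) / mean(subprocess_real):6.2f}")
```

With these values it prints means of 3.79, 12.96, 7.24 and 18.13 seconds, and a real-time ratio of 1.91.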

The Wand-based version is clearly slower: its mean real time is 7.24 s against 3.79 s, almost twice as long. Initially that surprised me, given that both use the same ImageMagick library for the actual image manipulation.

Next I used time.monotonic to measure how long the functions that do the actual processing of an image take. For the Wand version, this was around 2.1 seconds. For the version using convert it was around 1.8 seconds. So while there is some overhead from using Wand, it is not enough to explain the difference in real runtime.
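Timing a per-image worker function with `time.monotonic` could look like the following. The decorator `timed` and the choice to return the elapsed time together with the result (so that it also comes back from a worker process) are assumptions for illustration; `process` is a hypothetical stand-in for the real conversion function.

```python
import functools
import time


def timed(func):
    """Wrap *func* so that it returns (result, elapsed_seconds)."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # monotonic() cannot go backwards when the system clock is changed,
        # which makes it suitable for measuring intervals.
        start = time.monotonic()
        result = func(*args, **kwargs)
        return result, time.monotonic() - start
    return wrapper


@timed
def process(fname):
    # Hypothetical stand-in for the per-image conversion.
    time.sleep(0.05)
    return fname
```

Comparing the sum of these per-image times with the wall-clock time of the whole run shows how much time is spent outside the conversions themselves.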

At the moment I cannot explain why the program using Wand takes almost twice as long in real time. Using Pool.imap_unordered from the multiprocessing module instead of ProcessPoolExecutor.map did not make a noticeable difference. So for now it seems that a ThreadPoolExecutor combined with subprocess.call running convert is simply more efficient.

Conclusion

For large batch jobs that are to be run in parallel, I will stick to using a ThreadPoolExecutor to run convert via subprocess.call since it is significantly faster.

For interactive use in (I)Python, the Wand module is superior. It presents a Pythonic interface to ImageMagick.

