The ImageMagick suite has been in my software toolbox for years. It is my go-to tool for manipulating bitmap images. Over the years I have written several front-ends in Python for specific tasks.
In general, I have used the subprocess module to launch convert or mogrify from Python.
What foto4lb, one of these front-ends, basically does is take one or more directories of images and create a subdirectory that contains shrunken versions of the images. It sets the modification time of those images to the time the photo was taken according to the EXIF metadata. It uses concurrent.futures to run multiple conversions in parallel.
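The modification-time step can be sketched with the standard library alone. This is a minimal sketch, not the actual code from foto4lb: it assumes the capture time has already been read from the EXIF data as a string in the usual 'YYYY:MM:DD HH:MM:SS' form, and the function name set_mtime_from_exif is made up for illustration.

```python
import os
import time
from datetime import datetime


def set_mtime_from_exif(path, exif_datetime):
    """Set the access and modification time of path to an EXIF timestamp.

    exif_datetime is a string like '2014:06:21 15:04:33'.
    Returns the timestamp that was applied, in seconds since the epoch.
    """
    taken = datetime.strptime(exif_datetime, "%Y:%m:%d %H:%M:%S")
    # This EXIF string carries no time zone, so treat it as local time.
    stamp = time.mktime(taken.timetuple())
    os.utime(path, (stamp, stamp))
    return stamp
```

Calling set_mtime_from_exif on each shrunken image makes a file listing sorted by date follow the order in which the photos were taken.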
Converting the subprocess call of convert to manipulations of a wand.image.Image instance is pretty straightforward with the aid of the Wand documentation. Basically, I replaced
    args = [
        'convert', fname, '-strip', '-resize', str(newwidth),
        '-units', 'PixelsPerInch', '-density', '300',
        '-unsharp', '2x0.5+0.7+0', '-quality', '80', oname
    ]
    rp = subprocess.call(args)
with
    with Image(filename=fname) as img:
        scale = newwidth/img.width
        newheight = int(round(img.height * scale, 0))
        img.strip()
        img.resize(width=newwidth, height=newheight)
        img.units = 'pixelsperinch'
        img.resolution = (300, 300)
        img.unsharp_mask(radius=2, sigma=0.5, amount=0.7, threshold=0)
        img.quality = 80
        img.save(filename=oname)
The biggest differences are:
- Wand supports reading metadata like EXIF tags. In the other version of the program I used pillow for that.
- The Wand version uses a ProcessPoolExecutor, while the original uses a ThreadPoolExecutor combined with subprocess.call.
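Threads are enough for the subprocess version because each worker thread only waits for an external process, while the Wand version does the image processing inside Python's own process and therefore needs separate processes to use several cores. A minimal sketch of the thread-plus-subprocess pattern follows. The helper name run_all and the default for max_workers are assumptions made for this example, not taken from foto4lb.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_all(commands, max_workers=None):
    """Run each command (a list of arguments) in its own subprocess.

    Each worker thread just blocks in subprocess.call while the
    external program does the real work.
    Returns the exit codes in the same order as commands.
    """
    if max_workers is None:
        max_workers = os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=max_workers) as tp:
        return list(tp.map(subprocess.call, commands))
```

Each element of commands would be an argument list like the args list shown earlier, one per image.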
According to cloc, the version using Wand has 91 lines of code compared to 114 for the subprocess version. The difference is mostly due to the more involved handling of metadata in the subprocess version and its test that convert is actually available.
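A check that convert is available can be quite short. The following is one way to write it, as a sketch: the actual test in the subprocess version may well look different, and the function name tool_available is hypothetical.

```python
import shutil


def tool_available(name="convert"):
    """Return True if an executable called name can be found on the PATH."""
    return shutil.which(name) is not None
```

A program would call this once at startup and exit with an error message if it returns False.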
For performance testing I used both programs on a directory with eight images. The time utility was used to log the run times.
The subprocess-based program yielded the following results:
    3.81 real        13.03 user         1.25 sys
    3.83 real        12.99 user         1.32 sys
    3.78 real        13.06 user         1.15 sys
    3.87 real        13.22 user         1.37 sys
    3.85 real        13.09 user         1.32 sys
    3.72 real        12.60 user         1.39 sys
    3.73 real        12.77 user         1.23 sys
    3.78 real        12.91 user         1.22 sys
    3.84 real        13.21 user         1.19 sys
    3.70 real        12.72 user         1.23 sys
    ----             -----              ----
    3.79 mean        12.96 mean         1.27 mean
The times for the Wand-based version were:
    7.29 real        17.87 user         2.51 sys
    7.30 real        18.36 user         2.41 sys
    7.19 real        18.08 user         2.23 sys
    7.29 real        18.28 user         2.38 sys
    7.29 real        18.19 user         2.38 sys
    7.15 real        17.86 user         2.29 sys
    7.24 real        18.30 user         2.20 sys
    7.28 real        18.29 user         2.36 sys
    7.23 real        18.10 user         2.40 sys
    7.16 real        17.94 user         2.32 sys
    ----             -----              ----
    7.24 mean        18.13 mean         2.35 mean
The Wand-based version is clearly slower: its mean real time is about 1.9 times that of the subprocess-based version (7.24 s versus 3.79 s). Initially that surprised me, given that both use the same shared library for image manipulation.
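The size of the gap in real time can be checked directly from the measurements above. The snippet below only reproduces the arithmetic; the variable names are chosen for this example.

```python
from statistics import mean

# Wall-clock ("real") times in seconds, copied from the measurements above.
subprocess_real = [3.81, 3.83, 3.78, 3.87, 3.85, 3.72, 3.73, 3.78, 3.84, 3.70]
wand_real = [7.29, 7.30, 7.19, 7.29, 7.29, 7.15, 7.24, 7.28, 7.23, 7.16]

# How many times longer the Wand version takes on average.
ratio = mean(wand_real) / mean(subprocess_real)
print(f"subprocess: {mean(subprocess_real):.2f} s")
print(f"Wand:       {mean(wand_real):.2f} s")
print(f"ratio:      {ratio:.2f}")
```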
Next I used time.monotonic to measure how long the functions that do the actual processing of an image take. For the Wand version, this was around 2.1 seconds. For the version using convert it was around 1.8 seconds. So while there is some overhead from using Wand, it is not enough to explain the difference in real runtime.
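Such a measurement can be made with a small wrapper around the function of interest. This is a sketch of the general technique rather than the code used for the numbers above, and the name timed_call is an assumption for this example.

```python
import time


def timed_call(func, *args, **kwargs):
    """Call func and return a tuple of (result, elapsed seconds)."""
    # time.monotonic is not affected by adjustments of the system clock.
    start = time.monotonic()
    result = func(*args, **kwargs)
    return result, time.monotonic() - start
```

Because the elapsed time is returned together with the result, this also works when the function runs in a worker process: the duration travels back to the parent along with the return value.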
At the moment I cannot explain why the program using Wand takes almost twice as long in real time. Using Pool.imap_unordered from the multiprocessing module instead of ProcessPoolExecutor.map did not really make a difference. So for now it seems that a ThreadPoolExecutor combined with subprocess.call running convert is just more efficient.
For large batch jobs that are to be run in parallel, I will stick to using a ThreadPoolExecutor to run convert via subprocess.call since it is significantly faster.
For interactive use in (I)Python, the Wand module is superior. It presents a Pythonic interface to ImageMagick.