On Python speed
As an engineer, I write a lot of small Python programs as tools for specific tasks. Generally, these are not large programs. Most of them are below 100 lines of code (“LOC”, as measured by cloc), although there are a few in the 300–400 LOC range.
In this article, I will present some observations about these programs, and draw some conclusions from them.
The reason I write these tools is that they extend my capabilities. For example, I wrote a set of scripts that process several thousand pages of material safety data sheets and legislative documents to evaluate a list of chemical mixtures for dangers and legal restrictions. This has to be done twice a year, in lockstep with the changes in legislation. The run time of this collection of scripts is about 7 seconds. Doing the same by hand doesn’t bear thinking about. Not only would it take multiple days each time, it would be tedious and error-prone for a human. This collection of scripts grew over time. In total I estimate it took me a couple of days to write them. This is much less than the time it would take me to do this work by hand.
Observation 1: For most of my programs, the time needed to write the program far exceeds the run time. But even so, the potential savings still make it worth doing.
Conclusion 1: In this situation, programmer productivity is key. With its built-in data-structures, dynamic typing and automatic memory management, Python is a relatively easy language to program in. It also has a broad standard library and lots of available modules.
Observation 2: Many of these tools are written for a specific purpose and are only used a couple of times.
Conclusion 2: Because of the disparity between writing time and run time, such tools are generally not worth optimizing.
The run time of my programs is generally measured in seconds rather than minutes, although I will admit that the programs where it seemed worth the effort have been profiled and optimized.
In general I tend to save my optimization efforts for programs that are used often. Especially those that have to process large amounts of data.
Observation 3: It is often possible to significantly increase the speed of Python programs. See the ones I’ve profiled.
Conclusion 3: If your program is too slow, chances are you’re doing something wrong. For example, do you have to do the same look-ups repeatedly? Try @functools.cache.
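To illustrate the kind of speed-up caching can give, here is a minimal sketch using @functools.cache on a deliberately naive recursive function (a hypothetical example, not taken from any of my scripts):

```python
from functools import cache

@cache
def fib(n: int) -> int:
    """Naive recursive Fibonacci.

    Without @cache this is exponential in n; with it, each value
    is computed only once, so fib(100) returns near-instantly.
    """
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

print(fib(100))
```

The decorator needs no changes to the function body, which is what makes it such a cheap first optimization for repeated look-ups.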
In my Python port of John Walker’s ent program, the variant using numpy is around ten times faster than the one without it.
Observation 4: Extensions written in C/C++ or FORTRAN offer significant speed increases over “pure” Python. Around half of all the Python scripts I’ve written use numpy.
Conclusion 4: Look for a suitable extension for your problem. Need to crunch numbers? Use numpy.
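As an illustration of the kind of speed-up numpy gives in programs like ent, here is a sketch of computing the Shannon entropy of a byte string both in pure Python and with numpy. This is a hypothetical reconstruction of the technique, not the actual code from my port:

```python
import math

import numpy as np


def entropy_pure(data: bytes) -> float:
    """Pure Python: count byte values, then sum -p*log2(p)."""
    counts = [0] * 256
    for b in data:
        counts[b] += 1
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in counts if c)


def entropy_numpy(data: bytes) -> float:
    """numpy: vectorized counting and log, much faster on large inputs."""
    arr = np.frombuffer(data, dtype=np.uint8)
    counts = np.bincount(arr, minlength=256)
    p = counts[counts > 0] / arr.size
    return float(-(p * np.log2(p)).sum())
```

The numpy version replaces the per-byte Python loop with bulk operations in compiled code, which is where the order-of-magnitude difference comes from.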
Overall conclusion
Python is not a systems programming language like C, or a mathematics powerhouse like FORTRAN. And it doesn’t pretend to be.
As the FAQ states:
Python is a high-level general-purpose programming language that can be applied to many different classes of problems.
In my work, Python’s speed has turned out to be mostly irrelevant. It is fast enough for the things I use it for. The same goes for a lot of the general-purpose programming that it is used for, I suspect.
And for more performance-hungry applications, extensions like numpy can help a lot.
That is not to say I don’t appreciate the developers’ efforts to make it faster. That work only extends the range of problems that Python can be used for.
For comments, please send me an e-mail.
Related articles
- Profiling Python scripts (6): auto-orient
- Profiling with pyinstrument
- From python script to executable with cython
- Python 3.11 speed comparison with 3.9
- Getting the first or last item from a dictview without a list