Profiling Python scripts (2): stlinfo
This is the second in a series of articles that cover analyzing and improving
performance bottlenecks in Python scripts.
In this second article, the performance of stlinfo is examined.
Note
The other articles in this series are listed under “Related articles” at the bottom of the page.
Profiling stlinfo
The script is profiled as follows to get a baseline for its performance:
python -m cProfile \
    -s tottime stlinfo.py \
    ../testdata-stltools/headrest_mesh.stl | less
The relevant part of the output (entries with a total time above 0.01 s) is:
1678973 function calls (1678577 primitive calls) in 0.836 seconds

Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   832945    0.307    0.000    0.466    0.000 stl.py:82(_getbp)
        1    0.096    0.096    0.562    0.562 stl.py:78(<listcomp>)
   277649    0.094    0.000    0.094    0.000 {built-in method _struct.unpack}
        1    0.052    0.052    0.836    0.836 stlinfo.py:9(<module>)
      184    0.048    0.000    0.048    0.000 {built-in method builtins.min}
       65    0.047    0.001    0.047    0.001 {built-in method builtins.max}
   277650    0.041    0.000    0.041    0.000 {method 'read' of 'mmap.mmap' objects}
        1    0.033    0.033    0.033    0.033 bbox.py:25(<listcomp>)
        1    0.032    0.032    0.032    0.032 bbox.py:22(<listcomp>)
        1    0.032    0.032    0.032    0.032 bbox.py:21(<listcomp>)
278476/278398    0.023    0.000    0.023    0.000 {built-in method builtins.len}
        1    0.016    0.016    0.773    0.773 stlinfo.py:23(main)
It is clear that the function _getbp from stltools/stl.py dominates the time
spent.
The next two items are connected to it: _getbp is called in
stl.py:78(<listcomp>), and _struct.unpack does a major part of the work in
_getbp, as the code in the next section shows.
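The same kind of report can also be produced from within Python, which is convenient when only a single function needs to be profiled. Below is a minimal sketch using the standard cProfile and pstats modules; parse_facets and profile_top are hypothetical helpers, and the all-zero buffer is an assumed stand-in for a real STL file.

```python
import cProfile
import io
import pstats
import struct


def parse_facets(data):
    # Hypothetical stand-in workload: unpack consecutive 50-byte STL-like records.
    return [struct.unpack("<12x9f2x", data[i:i + 50])
            for i in range(0, len(data) - 49, 50)]


def profile_top(func, *args, limit=10):
    """Run func(*args) under cProfile; return its result and a tottime-sorted report."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("tottime").print_stats(limit)
    return result, out.getvalue()


if __name__ == "__main__":
    data = bytes(50 * 1000)  # 1000 all-zero records
    facets, report = profile_top(parse_facets, data)
    print(report)
```

Sorting by "tottime" corresponds to the `-s tottime` option used on the command line.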
Investigating _getbp
The function in question looks like this.
def _getbp(m):
    """
    Generate points from a binary STL file.

    Arguments:
        m: A memory mapped file.

    Yields:
        The vertices as 3-tuple of floats.
    """
    while True:
        v = m.read(50)
        if len(v) != 50:
            break
        p = struct.unpack("<12x9f2x", v)
        yield tuple(p[0:3])
        yield tuple(p[3:6])
        yield tuple(p[6:])
Most of the actual work is done by struct.unpack.
According to the source code of struct.py, this is already implemented in C;
struct.py simply imports it from the _struct extension module.
So writing a custom converter to improve on it is not the first point of
concern.
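The format string "<12x9f2x" matches one facet record of a binary STL file: little-endian byte order, the 12-byte normal vector skipped, nine 32-bit floats for the three vertices, and the 2-byte attribute count skipped, for 50 bytes in total. That is why _getbp reads 50 bytes at a time. A quick check with a synthetic record built by struct.pack:

```python
import struct

# Binary STL facet layout (little-endian, 50 bytes):
#   12 bytes  normal vector, 3 x float32   -> skipped by "12x"
#   36 bytes  three vertices, 9 x float32  -> read by "9f"
#    2 bytes  attribute byte count, uint16 -> skipped by "2x"
FMT = "<12x9f2x"
print(struct.calcsize(FMT))  # 50

# A synthetic facet: normal (0, 0, 1), vertices (0,0,0), (1,0,0), (0,1,0).
record = struct.pack("<3f9fH",
                     0.0, 0.0, 1.0,   # normal
                     0.0, 0.0, 0.0,   # vertex 1
                     1.0, 0.0, 0.0,   # vertex 2
                     0.0, 1.0, 0.0,   # vertex 3
                     0)               # attribute byte count
p = struct.unpack(FMT, record)
print(p)  # (0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0)
print(p[0:3], p[3:6], p[6:])  # the three vertices, as _getbp yields them
```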
Instead of reading the memory mapped file in small pieces, let’s read it all at
once and use struct.iter_unpack.
def _getbp(m):
    """
    Generate points from a binary STL file.

    Arguments:
        m: A memory mapped file.

    Yields:
        The vertices as 3-tuple of floats.
    """
    for p in struct.iter_unpack("<12x9f2x", m.read()):
        yield tuple(p[0:3])
        yield tuple(p[3:6])
        yield tuple(p[6:])
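To see the effect of this change outside of stlinfo, both versions can be timed on synthetic data. This sketch rests on a few assumptions: io.BytesIO stands in for the memory mapped file (both support read(n) and a read() to the end), the facets come from a hypothetical make_data helper, and the tuple() calls are dropped because slicing a tuple already returns a tuple. The absolute timings will depend on the machine.

```python
import io
import struct
import timeit

FMT = "<12x9f2x"  # one 50-byte binary STL facet record


def getbp_chunked(m):
    # Original approach: one read() and one unpack() per record.
    while True:
        v = m.read(50)
        if len(v) != 50:
            break
        p = struct.unpack(FMT, v)
        yield p[0:3]
        yield p[3:6]
        yield p[6:]


def getbp_iter(m):
    # Revised approach: read everything once and let iter_unpack walk it.
    for p in struct.iter_unpack(FMT, m.read()):
        yield p[0:3]
        yield p[3:6]
        yield p[6:]


def make_data(n):
    # Hypothetical helper: n facets with normal (0, 0, 1) and
    # vertex coordinates 9*i .. 9*i + 8 for facet i.
    return b"".join(
        struct.pack("<3f9fH", 0.0, 0.0, 1.0,
                    *(float(v) for v in range(9 * i, 9 * i + 9)), 0)
        for i in range(n)
    )


if __name__ == "__main__":
    data = make_data(20_000)
    for gen in (getbp_chunked, getbp_iter):
        # Fresh BytesIO per run, since reading consumes the stream.
        t = timeit.timeit(lambda: list(gen(io.BytesIO(data))), number=5)
        print(f"{gen.__name__}: {t:.3f} s for 5 runs")
```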
Profiling again:
846029 function calls (845633 primitive calls) in 0.629 seconds

Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   832945    0.257    0.000    0.263    0.000 stl.py:82(_getbp)
        1    0.092    0.092    0.355    0.355 stl.py:78(<listcomp>)
        1    0.052    0.052    0.629    0.629 stlinfo.py:9(<module>)
      184    0.048    0.000    0.048    0.000 {built-in method builtins.min}
       65    0.047    0.001    0.047    0.001 {built-in method builtins.max}
        1    0.033    0.033    0.033    0.033 bbox.py:22(<listcomp>)
        1    0.032    0.032    0.032    0.032 bbox.py:25(<listcomp>)
        1    0.032    0.032    0.032    0.032 bbox.py:21(<listcomp>)
        1    0.016    0.016    0.566    0.566 stlinfo.py:23(main)
This has shaved about 25% off the total run time, and 16% off the internal
time (tottime) of _getbp.
That is a quick win.
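The percentages follow directly from the two profiles above. The 16% figure refers to the internal time of _getbp; its cumulative time, which also includes the calls into struct and mmap, dropped considerably more:

```latex
\begin{aligned}
\text{total run time:} \quad & \frac{0.836 - 0.629}{0.836} = \frac{0.207}{0.836} \approx 0.248 \approx 25\,\% \\
\texttt{\_getbp}\ \text{(tottime):} \quad & \frac{0.307 - 0.257}{0.307} = \frac{0.050}{0.307} \approx 0.163 \approx 16\,\% \\
\texttt{\_getbp}\ \text{(cumtime):} \quad & \frac{0.466 - 0.263}{0.466} = \frac{0.203}{0.466} \approx 0.436 \approx 44\,\%
\end{aligned}
```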
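This change did, however, cause an error when the size of the buffer was not a multiple of the struct size, because struct.iter_unpack requires the buffer size to be an exact multiple of the size of the format. To prevent that, the code was changed to: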
def _getbp(m):
    """
    Generate points from a binary STL file.

    Arguments:
        m: A memory mapped file.

    Yields:
        The vertices as 3-tuple of floats.
    """
    fmt = "<12x9f2x"
    sz = struct.calcsize(fmt)
    buffer = m.read()
    count = len(buffer) // sz * sz
    buffer = buffer[:count]
    for p in struct.iter_unpack(fmt, buffer):
        yield tuple(p[0:3])
        yield tuple(p[3:6])
        yield tuple(p[6:])
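The failure and the fix are easy to reproduce on a small synthetic buffer. In this sketch, two all-zero records plus one stray byte make struct.iter_unpack raise struct.error, while the trimmed buffer unpacks cleanly. The memoryview variant at the end is not from the original code; it is one way to trim the buffer without copying it.

```python
import struct

FMT = "<12x9f2x"
SZ = struct.calcsize(FMT)  # 50 bytes per facet record

buffer = bytes(2 * SZ + 1)  # two all-zero records plus one stray byte

try:
    list(struct.iter_unpack(FMT, buffer))
except struct.error as exc:
    # The buffer size is not a multiple of SZ, so iter_unpack refuses it.
    print("iter_unpack failed:", exc)

# Trim to a whole number of records, as the revised _getbp does.
count = len(buffer) // SZ * SZ
records = list(struct.iter_unpack(FMT, buffer[:count]))
print(len(records))  # 2

# Variation, not from the original code: slicing a memoryview trims the
# buffer without copying it; iter_unpack accepts any bytes-like object.
records_mv = list(struct.iter_unpack(FMT, memoryview(buffer)[:count]))
print(records_mv == records)  # True
```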
This did not have a significant effect on performance.
Lessons learned
- If possible, load data into memory all at once rather than in small chunks.
- Tests are worthwhile for validating changes.
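As an illustration of the second point, a regression test can compare the final _getbp against the original chunked reader, including the case of trailing bytes that caused the error above. This is a sketch in the plain-assert style that pytest collects; getbp_reference, getbp_fast, facet and check are hypothetical names, and io.BytesIO stands in for the memory mapped file.

```python
import io
import struct

FMT = "<12x9f2x"


def getbp_reference(m):
    # Original chunked reader, used as the reference behaviour.
    while True:
        v = m.read(50)
        if len(v) != 50:
            break
        p = struct.unpack(FMT, v)
        yield p[0:3]
        yield p[3:6]
        yield p[6:]


def getbp_fast(m):
    # Final version: read once, drop any trailing partial record.
    sz = struct.calcsize(FMT)
    buffer = m.read()
    buffer = buffer[:len(buffer) // sz * sz]
    for p in struct.iter_unpack(FMT, buffer):
        yield p[0:3]
        yield p[3:6]
        yield p[6:]


def facet(*vertices):
    # One synthetic facet with a dummy normal and attribute count.
    return struct.pack("<3f9fH", 0.0, 0.0, 1.0, *vertices, 0)


def check(data):
    expected = list(getbp_reference(io.BytesIO(data)))
    assert list(getbp_fast(io.BytesIO(data))) == expected


def test_whole_records():
    check(facet(*map(float, range(9))) * 3)


def test_trailing_bytes():
    # Extra bytes that do not form a full record must be ignored.
    check(facet(*map(float, range(9))) + b"\x00" * 7)


def test_empty():
    check(b"")


if __name__ == "__main__":
    for test in (test_whole_records, test_trailing_bytes, test_empty):
        test()
    print("all tests passed")
```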
For comments, please send me an e-mail.
Related articles
- Profiling Python scripts (4): vecops.indexate
- Profiling Python scripts (3): stl2ps
- Profiling Python scripts (1): stl2pov
- Profiling Python scripts (6): auto-orient
- Profiling with pyinstrument