Profiling Python scripts (3): stl2ps
This is the third in a series of articles that covers analyzing and improving
performance bottlenecks in Python scripts.
The performance of stl2ps
is the topic of this article.
Note
The other articles in this series will be listed under “Related Articles” at the bottom of the page.
Profiling stl2ps
Establishing the baseline. As usual, anything with a total time of <0.1 s is not shown:
> python -m cProfile -s tottime stl2ps.py -x 200 ../testdata-stltools/headrest_mesh.stl | less 66300521 function calls (66300099 primitive calls) in 23.493 seconds Ordered by: internal time ncalls tottime percall cumtime percall filename:lineno(function) 8051792 5.931 0.000 10.664 0.000 {built-in method builtins.sum} 3 5.566 1.855 18.026 6.009 vecops.py:152(xform) 38870720 4.590 0.000 4.590 0.000 vecops.py:171(<genexpr>) 277648 0.721 0.000 2.073 0.000 vecops.py:49(normal) 9718697 0.702 0.000 0.702 0.000 {method 'append' of 'list' objects} 1943539 0.683 0.000 0.683 0.000 vecops.py:134(<genexpr>) 253797 0.651 0.000 0.651 0.000 {method 'format' of 'str' objects} 3 0.569 0.190 0.569 0.190 vecops.py:149(<listcomp>) 1 0.452 0.452 23.299 23.299 stl2ps.py:27(main) 555298 0.420 0.000 0.420 0.000 utils.py:85(take) 277648 0.294 0.000 0.880 0.000 vecops.py:24(normalize) 832945 0.257 0.000 0.263 0.000 stl.py:82(_getbp) 1 0.239 0.239 0.239 0.239 stl2ps.py:150(<listcomp>) 277648 0.196 0.000 0.196 0.000 vecops.py:38(cross) 253859 0.192 0.000 0.192 0.000 {built-in method builtins.max} 1 0.185 0.185 23.493 23.493 stl2ps.py:9(<module>) 3 0.166 0.055 0.850 0.283 vecops.py:124(to4) 1 0.160 0.160 0.811 0.811 stl2ps.py:210(<listcomp>) 1110592 0.143 0.000 0.143 0.000 vecops.py:21(<genexpr>) 221 0.142 0.001 0.142 0.001 {built-in method builtins.min} 1110592 0.142 0.000 0.142 0.000 vecops.py:59(<genexpr>) 1110592 0.137 0.000 0.137 0.000 vecops.py:35(<genexpr>) 1110592 0.133 0.000 0.133 0.000 vecops.py:60(<genexpr>) 277648 0.125 0.000 0.450 0.000 vecops.py:11(length) 1 0.104 0.104 2.176 2.176 stl2ps.py:129(<listcomp>)
The sum
method is only used in two places, both in vecops.py
.
One use is in the length
function. But since this isn’t used much, that is
not the main issue. It is also used in xform
, which is called a lot.
So the three top total times basically all belong to the xform
function.
Unfortunately, there is nothing that we can do to make the built-in method sum
faster.
The xform
function is simply using a 4×4 matrix to transform vectors.
At the moment I don’t see an easy way to make this intrinsically faster.
Note
Using multiprocessing.Pool
to spread the work over multiple cores would
be a possibility. But that will not be investigated at this moment.
But that doesn’t mean nothing can be done.
In stl2ps.py
, the vertices are transformed twice, and the facet normals
once.
In the vertices of an STL file, every unique point usually appears at least
three times for a closed object.
So determining the unique points and only transforming those could be
a significant speedup.
Luckily, the basis for that (vecops.indexate
) already exists.
The following code changes are needed to implement this.
diff --git a/stl2ps.py b/stl2ps.py
index 669e8d1..860ff80 100644
--- a/stl2ps.py
+++ b/stl2ps.py
@@ -5,7 +5,7 @@
# Copyright © 2011-2020 R.F. Smith <rsmith@xs4all.nl>.
# SPDX-License-Identifier: MIT
# Created: 2011-04-11T01:41:59+02:00
-# Last modified: 2020-10-04T17:42:36+0200
+# Last modified: 2022-01-19T22:05:40+0100
"""
Program for converting a view of an STL file into a PostScript file.
@@ -122,16 +122,19 @@ def main(argv):
except ValueError as e:
logging.error("{}: {}".format(args.file, e))
sys.exit(1)
- origbb = bbox.makebb(vertices)
+ indices, uvertices = vecops.indexate(vertices)
+ del vertices
+ origbb = bbox.makebb(uvertices)
logging.info("calculating normal vectors")
- facets = list(utils.chunked(vertices, 3))
- # facets = list(utils.chunked(utils.chunked(vertices, 3), 3))
- normals = [vecops.normal(a, b, c) for a, b, c in facets]
+ ifacets = list(utils.chunked(indices, 3))
+ normals = [
+ vecops.normal(uvertices[a], uvertices[b], uvertices[c]) for a, b, c in ifacets
+ ]
logging.info("applying transformations to world coordinates")
- vertices = vecops.xform(tr, vertices)
+ uvertices = vecops.xform(tr, uvertices)
normals = vecops.xform(tr, normals)
logging.info("making model-view matrix")
- minx, maxx, miny, maxy, _, maxz = bbox.makebb(vertices)
+ minx, maxx, miny, maxy, _, maxz = bbox.makebb(uvertices)
width = maxx - minx
height = maxy - miny
dx = -(minx + maxx) / 2
@@ -143,23 +146,22 @@ def main(argv):
v[0][3], v[1][3] = args.canvas_size / 2, args.canvas_size / 2
mv = matrix.concat(m, v)
logging.info("transforming to view space")
- vertices = vecops.xform(mv, vertices)
- facets = list(utils.chunked(vertices, 3))
+ uvertices = vecops.xform(mv, uvertices)
# In the ortho projection on the z=0 plane, z+ is _towards_ the viewer
logging.info("determine visible facets")
- vf = [(f, n, 0.4 * n[2] + 0.5) for f, n in zip(facets, normals) if n[2] > 0]
+ vf = [(f, n, 0.4 * n[2] + 0.5) for f, n in zip(ifacets, normals) if n[2] > 0]
fvs = "{:.2f}% of facets is visible"
- logging.info(fvs.format(100 * len(vf) / len(facets)))
+ logging.info(fvs.format(100 * len(vf) / len(ifacets)))
# Next, depth-sort the facets using the largest z-value of the
# three vertices.
logging.info("depth-sorting visible facets")
def fkey(t):
(a, b, c), _, _ = t
- return max(a[2], b[2], c[2])
+ return max(uvertices[a][2], uvertices[b][2], uvertices[c][2])
vf.sort(key=fkey)
- minx, maxx, miny, maxy, _, maxz = bbox.makebb(vertices)
+ minx, maxx, miny, maxy, _, maxz = bbox.makebb(uvertices)
logging.info("creating PostScript header")
s1 = "% The scale factor used is: {:.2f} PostScript points/STL-unit"
s2 = (
@@ -179,7 +181,7 @@ def main(argv):
cs.format(origbb[4], "z", origbb[5]),
s1.format(sf),
s2.format(maxx, maxy, maxx / 72 * 25.4, maxy / 72 * 25.4),
- "% {} of {} facets are visible.".format(len(vf), len(facets)),
+ "% {} of {} facets are visible.".format(len(vf), len(ifacets)),
]
# PostScript settings and macros.
lines += [
@@ -209,7 +211,13 @@ def main(argv):
logging.info("rendering triangles")
lines += [
s3.format(
- f_red * i, f_green * i, f_blue * i, a[0], a[1], b[0], b[1], c[0], c[1]
+ f_red * i, f_green * i, f_blue * i,
+ uvertices[a][0],
+ uvertices[a][1],
+ uvertices[b][0],
+ uvertices[b][1],
+ uvertices[c][0],
+ uvertices[c][1]
)
for (a, b, c), z, i in vf
]
The list of vertices and variables derived from it were renamed to ensure that all places where they were used have been updated.
By indexing, the 832944 vertices in headrest_mesh.stl
are reduced to
139526 unique vertices, a reduction of more than 80%! This is bound to have
a significant effect.
After these changes the new profile is generated:
> python -m cProfile -s tottime stl2ps.py -x 200 ../testdata-stltools/headrest_mesh.stl | less 26916624 function calls (26916202 primitive calls) in 10.171 seconds Ordered by: internal time ncalls tottime percall cumtime percall filename:lineno(function) 2504448 1.811 0.000 3.252 0.000 {built-in method builtins.sum} 3 1.577 0.526 5.089 1.696 vecops.py:152(xform) 11134000 1.296 0.000 1.296 0.000 vecops.py:171(<genexpr>) 277648 0.719 0.000 2.070 0.000 vecops.py:49(normal) 253797 0.636 0.000 0.636 0.000 {method 'format' of 'str' objects} 277648 0.292 0.000 0.880 0.000 vecops.py:24(normalize) 832945 0.264 0.000 0.270 0.000 stl.py:82(_getbp) 832945 0.264 0.000 0.516 0.000 vecops.py:120(<genexpr>) 1 0.206 0.206 0.206 0.206 stl2ps.py:152(<listcomp>) 556703 0.191 0.000 0.191 0.000 vecops.py:134(<genexpr>) 2784517 0.189 0.000 0.189 0.000 {method 'append' of 'list' objects} 277648 0.187 0.000 0.187 0.000 vecops.py:38(cross) 1 0.186 0.186 10.046 10.046 stl2ps.py:27(main) 832988 0.185 0.000 0.185 0.000 {method 'setdefault' of 'dict' objects} 1 0.179 0.179 0.816 0.816 stl2ps.py:212(<listcomp>) 277649 0.175 0.000 0.175 0.000 utils.py:85(take) 3 0.161 0.054 0.161 0.054 vecops.py:149(<listcomp>) 1110592 0.147 0.000 0.147 0.000 vecops.py:59(<genexpr>) 1110592 0.146 0.000 0.146 0.000 vecops.py:21(<genexpr>) 1110592 0.137 0.000 0.137 0.000 vecops.py:60(<genexpr>) 1 0.135 0.135 2.205 2.205 stl2ps.py:130(<listcomp>) 1110592 0.132 0.000 0.132 0.000 vecops.py:35(<genexpr>) 277648 0.126 0.000 0.456 0.000 vecops.py:11(length) 1 0.116 0.116 10.171 10.171 stl2ps.py:9(<module>)
This change has reduced the required time by 56%!
Porting the changes to stl2pdf
The stl2pdf
script is very similar to stl2ps
.
In essence, it replaces generating PostScript drawing commands by using
a cairo.Context
to generate PDF drawing commands on
a cairo.PDFSurface
.
The changes from stl2ps
could therefore be easily ported to stl2pdf
.
Using the same input file the run time of stl2pdf
was reduced from
26.2 s to 13.3 s, so almost by 50%.
After the improvement, the largest identified chunk of total time is spent in
the Cairo library.
Lessons learned
- If you cannot improve the code, see if you can reduce the amount of data.
- Make sure to port improvments to similar programs.
For comments, please send me an e-mail.
Related articles
- Profiling Python scripts (4): vecops.indexate
- Profiling Python scripts (2): stlinfo
- Profiling Python scripts (1): stl2pov
- Profiling Python scripts(6): auto-orient
- Profiling with pyinstrument