Profiling Python scripts (3): stl2ps

This is the third in a series of articles that covers analyzing and improving performance bottlenecks in Python scripts. The performance of stl2ps is the topic of this article.

Note

The other articles in this series will be listed under “Related Articles” at the bottom of the page.

Profiling stl2ps

Establishing the baseline. As usual, anything with a total time of <0.1 s is not shown:

> python -m cProfile -s tottime stl2ps.py -x 200 ../testdata-stltools/headrest_mesh.stl | less
        66300521 function calls (66300099 primitive calls) in 23.493 seconds

Ordered by: internal time

  ncalls  tottime  percall  cumtime  percall filename:lineno(function)
 8051792    5.931    0.000   10.664    0.000 {built-in method builtins.sum}
       3    5.566    1.855   18.026    6.009 vecops.py:152(xform)
38870720    4.590    0.000    4.590    0.000 vecops.py:171(<genexpr>)
  277648    0.721    0.000    2.073    0.000 vecops.py:49(normal)
 9718697    0.702    0.000    0.702    0.000 {method 'append' of 'list' objects}
 1943539    0.683    0.000    0.683    0.000 vecops.py:134(<genexpr>)
  253797    0.651    0.000    0.651    0.000 {method 'format' of 'str' objects}
       3    0.569    0.190    0.569    0.190 vecops.py:149(<listcomp>)
       1    0.452    0.452   23.299   23.299 stl2ps.py:27(main)
  555298    0.420    0.000    0.420    0.000 utils.py:85(take)
  277648    0.294    0.000    0.880    0.000 vecops.py:24(normalize)
  832945    0.257    0.000    0.263    0.000 stl.py:82(_getbp)
       1    0.239    0.239    0.239    0.239 stl2ps.py:150(<listcomp>)
  277648    0.196    0.000    0.196    0.000 vecops.py:38(cross)
  253859    0.192    0.000    0.192    0.000 {built-in method builtins.max}
       1    0.185    0.185   23.493   23.493 stl2ps.py:9(<module>)
       3    0.166    0.055    0.850    0.283 vecops.py:124(to4)
       1    0.160    0.160    0.811    0.811 stl2ps.py:210(<listcomp>)
 1110592    0.143    0.000    0.143    0.000 vecops.py:21(<genexpr>)
     221    0.142    0.001    0.142    0.001 {built-in method builtins.min}
 1110592    0.142    0.000    0.142    0.000 vecops.py:59(<genexpr>)
 1110592    0.137    0.000    0.137    0.000 vecops.py:35(<genexpr>)
 1110592    0.133    0.000    0.133    0.000 vecops.py:60(<genexpr>)
  277648    0.125    0.000    0.450    0.000 vecops.py:11(length)
       1    0.104    0.104    2.176    2.176 stl2ps.py:129(<listcomp>)

The sum method is only used in two places, both in vecops.py. One use is in the length function. But since this isn’t used much, that is not the main issue. It is also used in xform, which is called a lot.

So the three top total times basically all belong to the xform function. Unfortunately, there is nothing that we can do to make the built-in method sum faster. The xform function is simply using a 4×4 matrix to transform vectors. At the moment I don’t see an easy way to make this intrinsically faster.

Note

Using multiprocessing.Pool to spread the work over multiple cores would be a possibility. But that will not be investigated at this moment.

But that doesn’t mean nothing can be done.

In stl2ps.py, the vertices are transformed twice, and the facet normals once. In the vertices of an STL file, every unique point usually appears at least three times for a closed object. So determining the unique points and only transforming those could be a significant speedup. Luckily, the basis for that (vecops.indexate) already exists. The following code changes are needed to implement this.

diff --git a/stl2ps.py b/stl2ps.py
index 669e8d1..860ff80 100644
--- a/stl2ps.py
+++ b/stl2ps.py
@@ -5,7 +5,7 @@
# Copyright © 2011-2020 R.F. Smith <rsmith@xs4all.nl>.
# SPDX-License-Identifier: MIT
# Created: 2011-04-11T01:41:59+02:00
-# Last modified: 2020-10-04T17:42:36+0200
+# Last modified: 2022-01-19T22:05:40+0100
"""
Program for converting a view of an STL file into a PostScript file.

@@ -122,16 +122,19 @@ def main(argv):
    except ValueError as e:
        logging.error("{}: {}".format(args.file, e))
        sys.exit(1)
-    origbb = bbox.makebb(vertices)
+    indices, uvertices = vecops.indexate(vertices)
+    del vertices
+    origbb = bbox.makebb(uvertices)
    logging.info("calculating normal vectors")
-    facets = list(utils.chunked(vertices, 3))
-    # facets = list(utils.chunked(utils.chunked(vertices, 3), 3))
-    normals = [vecops.normal(a, b, c) for a, b, c in facets]
+    ifacets = list(utils.chunked(indices, 3))
+    normals = [
+        vecops.normal(uvertices[a], uvertices[b], uvertices[c]) for a, b, c in ifacets
+    ]
    logging.info("applying transformations to world coordinates")
-    vertices = vecops.xform(tr, vertices)
+    uvertices = vecops.xform(tr, uvertices)
    normals = vecops.xform(tr, normals)
    logging.info("making model-view matrix")
-    minx, maxx, miny, maxy, _, maxz = bbox.makebb(vertices)
+    minx, maxx, miny, maxy, _, maxz = bbox.makebb(uvertices)
    width = maxx - minx
    height = maxy - miny
    dx = -(minx + maxx) / 2
@@ -143,23 +146,22 @@ def main(argv):
    v[0][3], v[1][3] = args.canvas_size / 2, args.canvas_size / 2
    mv = matrix.concat(m, v)
    logging.info("transforming to view space")
-    vertices = vecops.xform(mv, vertices)
-    facets = list(utils.chunked(vertices, 3))
+    uvertices = vecops.xform(mv, uvertices)
    # In the ortho projection on the z=0 plane, z+ is _towards_ the viewer
    logging.info("determine visible facets")
-    vf = [(f, n, 0.4 * n[2] + 0.5) for f, n in zip(facets, normals) if n[2] > 0]
+    vf = [(f, n, 0.4 * n[2] + 0.5) for f, n in zip(ifacets, normals) if n[2] > 0]
    fvs = "{:.2f}% of facets is visible"
-    logging.info(fvs.format(100 * len(vf) / len(facets)))
+    logging.info(fvs.format(100 * len(vf) / len(ifacets)))
    # Next, depth-sort the facets using the largest z-value of the
    # three vertices.
    logging.info("depth-sorting visible facets")

    def fkey(t):
        (a, b, c), _, _ = t
-        return max(a[2], b[2], c[2])
+        return max(uvertices[a][2], uvertices[b][2], uvertices[c][2])

    vf.sort(key=fkey)
-    minx, maxx, miny, maxy, _, maxz = bbox.makebb(vertices)
+    minx, maxx, miny, maxy, _, maxz = bbox.makebb(uvertices)
    logging.info("creating PostScript header")
    s1 = "% The scale factor used is: {:.2f} PostScript points/STL-unit"
    s2 = (
@@ -179,7 +181,7 @@ def main(argv):
        cs.format(origbb[4], "z", origbb[5]),
        s1.format(sf),
        s2.format(maxx, maxy, maxx / 72 * 25.4, maxy / 72 * 25.4),
-        "% {} of {} facets are visible.".format(len(vf), len(facets)),
+        "% {} of {} facets are visible.".format(len(vf), len(ifacets)),
    ]
    # PostScript settings and macros.
    lines += [
@@ -209,7 +211,13 @@ def main(argv):
    logging.info("rendering triangles")
    lines += [
        s3.format(
-            f_red * i, f_green * i, f_blue * i, a[0], a[1], b[0], b[1], c[0], c[1]
+            f_red * i, f_green * i, f_blue * i,
+            uvertices[a][0],
+            uvertices[a][1],
+            uvertices[b][0],
+            uvertices[b][1],
+            uvertices[c][0],
+            uvertices[c][1]
        )
        for (a, b, c), z, i in vf
    ]

The list of vertices and variables derived from it were renamed to ensure that all places where they were used have been updated.

By indexing, the 832944 vertices in headrest_mesh.stl are reduced to 139526 unique vertices, a reduction of more than 80%! This is bound to have a significant effect.

After these changes the new profile is generated:

> python -m cProfile -s tottime stl2ps.py -x 200 ../testdata-stltools/headrest_mesh.stl | less
         26916624 function calls (26916202 primitive calls) in 10.171 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  2504448    1.811    0.000    3.252    0.000 {built-in method builtins.sum}
        3    1.577    0.526    5.089    1.696 vecops.py:152(xform)
 11134000    1.296    0.000    1.296    0.000 vecops.py:171(<genexpr>)
   277648    0.719    0.000    2.070    0.000 vecops.py:49(normal)
   253797    0.636    0.000    0.636    0.000 {method 'format' of 'str' objects}
   277648    0.292    0.000    0.880    0.000 vecops.py:24(normalize)
   832945    0.264    0.000    0.270    0.000 stl.py:82(_getbp)
   832945    0.264    0.000    0.516    0.000 vecops.py:120(<genexpr>)
        1    0.206    0.206    0.206    0.206 stl2ps.py:152(<listcomp>)
   556703    0.191    0.000    0.191    0.000 vecops.py:134(<genexpr>)
  2784517    0.189    0.000    0.189    0.000 {method 'append' of 'list' objects}
   277648    0.187    0.000    0.187    0.000 vecops.py:38(cross)
        1    0.186    0.186   10.046   10.046 stl2ps.py:27(main)
   832988    0.185    0.000    0.185    0.000 {method 'setdefault' of 'dict' objects}
        1    0.179    0.179    0.816    0.816 stl2ps.py:212(<listcomp>)
   277649    0.175    0.000    0.175    0.000 utils.py:85(take)
        3    0.161    0.054    0.161    0.054 vecops.py:149(<listcomp>)
  1110592    0.147    0.000    0.147    0.000 vecops.py:59(<genexpr>)
  1110592    0.146    0.000    0.146    0.000 vecops.py:21(<genexpr>)
  1110592    0.137    0.000    0.137    0.000 vecops.py:60(<genexpr>)
        1    0.135    0.135    2.205    2.205 stl2ps.py:130(<listcomp>)
  1110592    0.132    0.000    0.132    0.000 vecops.py:35(<genexpr>)
   277648    0.126    0.000    0.456    0.000 vecops.py:11(length)
        1    0.116    0.116   10.171   10.171 stl2ps.py:9(<module>)

This change has reduced the required time by 56%!

Porting the changes to `stl2pdf`

The stl2pdf script is very similar to stl2ps. In essence, it replaces generating PostScript drawing commands by using a cairo.Context to generate PDF drawing commands on a cairo.PDFSurface.

The changes from stl2ps could therefore be easily ported to stl2pdf. Using the same input file the run time of stl2pdf was reduced from 26.2 s to 13.3 s, so almost by 50%. After the improvement, the largest identified chunk of total time is spent in the Cairo library.

Lessons learned

If you cannot improve the code, see if you can reduce the amount of data.
Make sure to port improvments to similar programs.

For comments, please send me an e-mail.

← Profiling Python scripts (2): stlinfo Profiling Python scripts (4): vecops.indexate →

Roland's homepage

Profiling Python scripts (3): stl2ps

Profiling stl2ps

Porting the changes to `stl2pdf`

Lessons learned

Related articles

Profiling stl2ps

Porting the changes to stl2pdf

Lessons learned

Related articles

Porting the changes to `stl2pdf`