Profiling Python scripts (4): vecops.indexate
This is the fourth in a series of articles that covers analyzing and improving
performance bottlenecks in Python scripts.
The performance of vecops.indexate is the topic of this article.
Note
The other articles in this series will be listed under “Related Articles” at the bottom of the page.
Why vecops.indexate
In the course of writing the third article, I looked into the implementation of vecops.indexate.
This function takes a list of vertices (3-tuples) and returns a tuple of
indices and a tuple of unique vertices.
In STL files the same vertex is often found in three or even more facets.
For generating a POV-Ray “mesh2”, we require a list of unique vertices.
The facets are then built out of indices into that list.
That is what vecops.indexate was designed for.
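The idea can be illustrated with a small, self-contained sketch (hypothetical data, not the stltools API): facets are stored as triples of indices into a list of unique vertices, instead of repeating the shared coordinates in every facet.

```python
# Hypothetical data, not the stltools API: four unique vertices and two
# triangular facets that share an edge (vertices 1 and 2).
vertices = [
    (0.0, 0.0, 0.0),  # index 0
    (1.0, 0.0, 0.0),  # index 1
    (0.0, 1.0, 0.0),  # index 2
    (1.0, 1.0, 0.0),  # index 3
]
# Each facet is a triple of indices into `vertices`.
facets = [(0, 1, 2), (1, 3, 2)]
# Recovering the actual corner coordinates of the second facet:
corners = [vertices[i] for i in facets[1]]
print(corners)  # [(1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
```

The shared vertices appear once in `vertices`, no matter how many facets reference them.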
How it works
To show how vecops.indexate works, it will be executed line by line in IPython.
First we create a list of points (from an existing STL file) and an empty dictionary.
In [1]: points = [(139.3592529296875, 97.01824951171875, 69.91365814208984),
...: (138.14346313476562, 97.50768280029297, 70.0313949584961),
...: (138.79571533203125, 97.875244140625, 69.03594970703125),
...: (138.14346313476562, 97.50768280029297, 70.0313949584961),
...: (139.3592529296875, 97.01824951171875, 69.91365814208984),
...: (138.76876831054688, 96.6264419555664, 70.94448852539062),
...: (138.79571533203125, 97.875244140625, 69.03594970703125),
...: (138.14346313476562, 97.50768280029297, 70.0313949584961),
...: (137.59768676757812, 98.35258483886719, 69.14176177978516),
...: (139.3592529296875, 97.01824951171875, 69.91365814208984)];
In [2]: pd = {}
Out[2]: {}
Next comes the clever bit.
This generator expression employs the setdefault method of dictionaries to both generate the indices and fill the pd dictionary.
In this case, the vertex (a 3-tuple of float) is used as the key and the current size of the dictionary is the value.
Since dictionary keys are unique, this removes duplicate points.
In [3]: indices = tuple(pd.setdefault(tuple(p), len(pd)) for p in points)
Out[3]: (0, 1, 2, 1, 0, 3, 2, 1, 4, 0)
In [4]: pd
Out[4]:
{(139.3592529296875, 97.01824951171875, 69.91365814208984): 0,
(138.14346313476562, 97.50768280029297, 70.0313949584961): 1,
(138.79571533203125, 97.875244140625, 69.03594970703125): 2,
(138.76876831054688, 96.6264419555664, 70.94448852539062): 3,
(137.59768676757812, 98.35258483886719, 69.14176177978516): 4}
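The mechanism is easier to see with simpler keys. In this sketch (hypothetical data), setdefault stores the current dictionary size for an unseen key and returns the previously stored index for a duplicate:

```python
pd = {}
seq = ["a", "b", "a", "c", "b"]
# setdefault(key, default) returns pd[key] if the key already exists;
# otherwise it stores the default (here len(pd), the next free index)
# under that key and returns it.
indices = tuple(pd.setdefault(p, len(pd)) for p in seq)
print(indices)  # (0, 1, 0, 2, 1)
print(pd)       # {'a': 0, 'b': 1, 'c': 2}
```

Every duplicate in `seq` maps back to the index it was first given, which is exactly what the vertex deduplication needs.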
Note that this output was generated using Python 3.9. From Python 3.7 onwards, dictionaries are guaranteed to maintain insertion order, according to the documentation:
> Changed in version 3.7: Dictionary order is guaranteed to be insertion
> order. This behavior was an implementation detail of CPython from 3.6.
That is what made the following lines necessary in the past.
In [5]: pt = sorted([(v, k) for k, v in pd.items()], key=lambda x: x[0])
Out[5]:
[(0, (139.3592529296875, 97.01824951171875, 69.91365814208984)),
(1, (138.14346313476562, 97.50768280029297, 70.0313949584961)),
(2, (138.79571533203125, 97.875244140625, 69.03594970703125)),
(3, (138.76876831054688, 96.6264419555664, 70.94448852539062)),
(4, (137.59768676757812, 98.35258483886719, 69.14176177978516))]
In [6]: unique = tuple(i[1] for i in pt)
Out[6]:
((139.3592529296875, 97.01824951171875, 69.91365814208984),
(138.14346313476562, 97.50768280029297, 70.0313949584961),
(138.79571533203125, 97.875244140625, 69.03594970703125),
(138.76876831054688, 96.6264419555664, 70.94448852539062),
(137.59768676757812, 98.35258483886719, 69.14176177978516))
The function returns a 2-tuple of indices and unique.
Improvements
Since stltools requires Python 3.6 or later anyway, we can make use of the fact that insertion order is maintained in dictionaries.
This means that unique can be trivially generated from pd.keys(), and building pt is no longer needed.
Let’s see what that brings.
In [1]: cd src/progs/stltools/
/home/rsmith/src/progs/stltools
In [2]: from stltools import stl
In [3]: vertices, _ = stl.readstl("../testdata-stltools/headrest_mesh.stl", "ascii")
In [4]: len(vertices)
Out[4]: 832944
In [5]: def old_indexate(points):
...: pd = {}
...: indices = tuple(pd.setdefault(tuple(p), len(pd)) for p in points)
...: pt = sorted([(v, k) for k, v in pd.items()], key=lambda x: x[0])
...: unique = tuple(i[1] for i in pt)
...: return indices, unique
...:
In [6]: %timeit old_indexate(vertices)
301 ms ± 729 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [7]: def new_indexate(points):
...: pd = {}
...: indices = tuple(pd.setdefault(p, len(pd)) for p in points)
...: return indices, tuple(pd.keys())
...:
In [8]: %timeit new_indexate(vertices)
227 ms ± 307 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [9]: indices, unique = old_indexate(vertices)
In [10]: len(indices), len(unique)
Out[10]: (832944, 139526)
In [11]: 1 - len(unique)/len(indices)
Out[11]: 0.8324905395800918
So for on the order of 800,000 vertices, the required time is reduced by around 25%. This in itself is worthwhile.
More importantly, 832944 vertices are reduced to 139526 unique vertices, a reduction of more than 80%, in a fraction of a second. This explains the significant difference observed in the third article.
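As a final sanity check, the two versions (repeated here so the snippet is self-contained, and run on small synthetic data rather than the STL file from the timing session) return identical results:

```python
def old_indexate(points):
    pd = {}
    indices = tuple(pd.setdefault(tuple(p), len(pd)) for p in points)
    # Sort (index, vertex) pairs to restore insertion order explicitly.
    pt = sorted([(v, k) for k, v in pd.items()], key=lambda x: x[0])
    unique = tuple(i[1] for i in pt)
    return indices, unique

def new_indexate(points):
    # Relies on dictionaries preserving insertion order (an implementation
    # detail of CPython 3.6, guaranteed by the language from 3.7 onwards).
    pd = {}
    indices = tuple(pd.setdefault(p, len(pd)) for p in points)
    return indices, tuple(pd.keys())

points = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0),
          (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)]
assert old_indexate(points) == new_indexate(points)
print(new_indexate(points))
# ((0, 1, 0, 2), ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)))
```

The sorting step in the old version only ever reproduced the insertion order, which is why dropping it changes nothing but the run time.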
Lessons learned
- A REPL like IPython can be invaluable in understanding how code works.
- It is important to know and take advantage of the features of the data structures your language offers you.
- The greatest speed-ups can often be achieved by changes to the algorithm.
For comments, please send me an e-mail.
Related articles
- Profiling Python scripts (3): stl2ps
- Profiling Python scripts (2): stlinfo
- Profiling Python scripts (1): stl2pov
- Profiling Python scripts (6): auto-orient
- Profiling with pyinstrument