Profiling Python scripts (4): vecops.indexate
This is the fourth in a series of articles that covers analyzing and improving
performance bottlenecks in Python scripts.
The performance of vecops.indexate is the topic of this article.
Note
The other articles in this series will be listed under “Related Articles” at the bottom of the page.
Why vecops.indexate
In the course of writing the third article, I looked into the implementation of vecops.indexate.
This function takes a list of vertices (3-tuples) and returns a tuple of
indices and a tuple of unique vertices.
In STL files the same vertex is often found in three or even more facets.
For generating a POV-Ray “mesh2”, we require a list of unique vertices.
The facets are then built out of indices into that list.
That is what vecops.indexate was designed for.
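The idea can be illustrated with a small, self-contained sketch (hypothetical data, not the stltools API): facets are stored as triples of indices into a list of unique vertices, instead of repeating the shared coordinates in every facet.

```python
# Hypothetical data, not the stltools API: four unique vertices and two
# triangular facets that share an edge (vertices 1 and 2).
vertices = [
    (0.0, 0.0, 0.0),  # index 0
    (1.0, 0.0, 0.0),  # index 1
    (0.0, 1.0, 0.0),  # index 2
    (1.0, 1.0, 0.0),  # index 3
]
# Each facet is a triple of indices into `vertices`.
facets = [(0, 1, 2), (1, 3, 2)]
# Recovering the actual corner coordinates of the second facet:
corners = [vertices[i] for i in facets[1]]
print(corners)  # [(1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
```

The shared vertices appear once in `vertices`, no matter how many facets reference them.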
How it works
To show how vecops.indexate works, it will be executed line by line in IPython.
First we create a list of points (from an existing STL file) and an empty dictionary.
In [1]: points = [(139.3592529296875, 97.01824951171875, 69.91365814208984),
...: (138.14346313476562, 97.50768280029297, 70.0313949584961),
...: (138.79571533203125, 97.875244140625, 69.03594970703125),
...: (138.14346313476562, 97.50768280029297, 70.0313949584961),
...: (139.3592529296875, 97.01824951171875, 69.91365814208984),
...: (138.76876831054688, 96.6264419555664, 70.94448852539062),
...: (138.79571533203125, 97.875244140625, 69.03594970703125),
...: (138.14346313476562, 97.50768280029297, 70.0313949584961),
...: (137.59768676757812, 98.35258483886719, 69.14176177978516),
...: (139.3592529296875, 97.01824951171875, 69.91365814208984)];
In [2]: pd = {}
Out[2]: {}
Next comes the clever bit.
This generator expression employs the setdefault method of dictionaries to both generate the indices and fill the pd dictionary.
In this case, the vertex (a 3-tuple of float) is used as the key and the current size of the dictionary is the value.
Since dictionary keys are unique, this removes duplicate points.
In [3]: indices = tuple(pd.setdefault(tuple(p), len(pd)) for p in points)
Out[3]: (0, 1, 2, 1, 0, 3, 2, 1, 4, 0)
In [4]: pd
Out[4]:
{(139.3592529296875, 97.01824951171875, 69.91365814208984): 0,
(138.14346313476562, 97.50768280029297, 70.0313949584961): 1,
(138.79571533203125, 97.875244140625, 69.03594970703125): 2,
(138.76876831054688, 96.6264419555664, 70.94448852539062): 3,
(137.59768676757812, 98.35258483886719, 69.14176177978516): 4}
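The mechanism is easier to see with simpler keys. In this sketch (hypothetical data), setdefault stores the current dictionary size for an unseen key and returns the previously stored index for a duplicate:

```python
pd = {}
seq = ["a", "b", "a", "c", "b"]
# setdefault(key, default) returns pd[key] if the key already exists;
# otherwise it stores the default (here len(pd), the next free index)
# under that key and returns it.
indices = tuple(pd.setdefault(p, len(pd)) for p in seq)
print(indices)  # (0, 1, 0, 2, 1)
print(pd)       # {'a': 0, 'b': 1, 'c': 2}
```

Every duplicate in `seq` maps back to the index it was first given, which is exactly what the vertex deduplication needs.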
Note that this output was generated using Python 3.9. From Python 3.7 onwards, dictionaries are guaranteed to maintain insertion order, according to the documentation:
> Changed in version 3.7: Dictionary order is guaranteed to be insertion
> order. This behavior was an implementation detail of CPython from 3.6.
That is what made the following lines necessary in the past.
In [5]: pt = sorted([(v, k) for k, v in pd.items()], key=lambda x: x[0])
Out[5]:
[(0, (139.3592529296875, 97.01824951171875, 69.91365814208984)),
(1, (138.14346313476562, 97.50768280029297, 70.0313949584961)),
(2, (138.79571533203125, 97.875244140625, 69.03594970703125)),
(3, (138.76876831054688, 96.6264419555664, 70.94448852539062)),
(4, (137.59768676757812, 98.35258483886719, 69.14176177978516))]
In [6]: unique = tuple(i[1] for i in pt)
Out[6]:
((139.3592529296875, 97.01824951171875, 69.91365814208984),
(138.14346313476562, 97.50768280029297, 70.0313949584961),
(138.79571533203125, 97.875244140625, 69.03594970703125),
(138.76876831054688, 96.6264419555664, 70.94448852539062),
(137.59768676757812, 98.35258483886719, 69.14176177978516))
The function returns a 2-tuple of indices and unique.
Improvements
Since stltools requires Python 3.6 or later anyway, we can make use of the fact that insertion order is maintained in dictionaries.
This means that unique can be trivially generated from pd.keys(), and building pt is no longer needed.
Let’s see what that brings.
In [1]: cd src/progs/stltools/
/home/rsmith/src/progs/stltools
In [2]: from stltools import stl
In [3]: vertices, _ = stl.readstl("../testdata-stltools/headrest_mesh.stl", "ascii")
In [4]: len(vertices)
Out[4]: 832944
In [5]: def old_indexate(points):
...: pd = {}
...: indices = tuple(pd.setdefault(tuple(p), len(pd)) for p in points)
...: pt = sorted([(v, k) for k, v in pd.items()], key=lambda x: x[0])
...: unique = tuple(i[1] for i in pt)
...: return indices, unique
...:
In [6]: %timeit old_indexate(vertices)
301 ms ± 729 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [7]: def new_indexate(points):
...: pd = {}
...: indices = tuple(pd.setdefault(p, len(pd)) for p in points)
...: return indices, tuple(pd.keys())
...:
In [8]: %timeit new_indexate(vertices)
227 ms ± 307 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [9]: indices, unique = old_indexate(vertices)
In [10]: len(indices), len(unique)
Out[10]: (832944, 139526)
In [11]: 1 - len(unique)/len(indices)
Out[11]: 0.8324905395800918
So for on the order of 800,000 vertices, the required time is reduced by around 25%. This in itself is worthwhile.
More importantly, 832944 vertices are reduced to 139526 unique vertices, a reduction of more than 80%, in a fraction of a second. This explains the significant difference observed in the third article.
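As a final sanity check, the two versions (repeated here so the snippet is self-contained, and run on small synthetic data rather than the STL file from the timing session) return identical results:

```python
def old_indexate(points):
    pd = {}
    indices = tuple(pd.setdefault(tuple(p), len(pd)) for p in points)
    # Sort (index, vertex) pairs to restore insertion order explicitly.
    pt = sorted([(v, k) for k, v in pd.items()], key=lambda x: x[0])
    unique = tuple(i[1] for i in pt)
    return indices, unique

def new_indexate(points):
    # Relies on dictionaries preserving insertion order (an implementation
    # detail of CPython 3.6, guaranteed by the language from 3.7 onwards).
    pd = {}
    indices = tuple(pd.setdefault(p, len(pd)) for p in points)
    return indices, tuple(pd.keys())

points = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0),
          (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)]
assert old_indexate(points) == new_indexate(points)
print(new_indexate(points))
# ((0, 1, 0, 2), ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)))
```

The sorting step in the old version only ever reproduced the insertion order, which is why dropping it changes nothing but the run time.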
Lessons learned
- A REPL like IPython can be invaluable in understanding how code works.
- It is important to know and take advantage of the features of the data structures your language offers you.
- The greatest speed-ups can often be achieved by changes to the algorithm.
For comments, please send me an e-mail.
Related articles
- Profiling Python scripts (3): stl2ps
- Profiling Python scripts (2): stlinfo
- Profiling Python scripts (1): stl2pov
- Profiling Python scripts (6): auto-orient
- Profiling with pyinstrument