Profiling Python scripts (6): auto-orient
This is the sixth in a series of articles that covers analyzing and improving performance bottlenecks in Python scripts. calculix-orient is a helper program that automates the assignment of local coordinate systems for anisotropic finite elements. It is designed to work together with CalculiX.
A first analysis using pyinstrument was done to investigate which part of the program takes the most time.
All testing was done using the default Python version in use on FreeBSD, which was 3.11 at the time of writing:
> python -m pyinstrument auto-orient.py -t top bot

  _     ._   __/__   _ _  _  _ _/_   Recorded: 09:05:37  Samples:  1048
 /_//_/// /_\ / //_// / //_'/ //     Duration: 1.107     CPU time: 1.107
/   _/                      v4.6.2

Program: pyinstrument/__main__.py auto-orient.py -t top bot

1.104 <module>  auto-orient.py:1
└─ 1.091 main  auto-orient.py:24
   ├─ 0.888 read_allmsh  auto-orient.py:115
   │  ├─ 0.404 <listcomp>  auto-orient.py:154
   │  │  ├─ 0.357 to_element  auto-orient.py:148
   │  │  │  ├─ 0.158 [self]  auto-orient.py
   │  │  │  ├─ 0.113 <genexpr>  auto-orient.py:149
   │  │  │  ├─ 0.071 <genexpr>  auto-orient.py:152
   │  │  │  └─ 0.015 str.split  <built-in>
   │  │  ├─ 0.017 <genexpr>  auto-orient.py:145
   │  │  ├─ 0.015 [self]  auto-orient.py
   │  │  └─ 0.015 <genexpr>  auto-orient.py:146
   │  ├─ 0.345 <listcomp>  auto-orient.py:143
   │  │  ├─ 0.262 to_node  auto-orient.py:140
   │  │  │  ├─ 0.160 [self]  auto-orient.py
   │  │  │  └─ 0.102 <genexpr>  auto-orient.py:141
   │  │  ├─ 0.056 [self]  auto-orient.py
   │  │  └─ 0.019 str.split  <built-in>
   │  ├─ 0.058 <listcomp>  auto-orient.py:126
   │  │  ├─ 0.032 [self]  auto-orient.py
   │  │  └─ 0.026 str.startswith  <built-in>
   │  ├─ 0.046 <listcomp>  auto-orient.py:125
   │  │  ├─ 0.029 [self]  auto-orient.py
   │  │  └─ 0.017 str.strip  <built-in>
   │  └─ 0.032 [self]  auto-orient.py
   ├─ 0.157 set_normals  auto-orient.py:185
   │  ├─ 0.053 isclose  auto-orient.py:222
   │  │  ├─ 0.023 [self]  auto-orient.py
   │  │  └─ 0.018 dot  auto-orient.py:260
   │  ├─ 0.037 normalize  auto-orient.py:297
   │  │  ├─ 0.018 [self]  auto-orient.py
   │  │  └─ 0.012 <genexpr>  auto-orient.py:308
   │  ├─ 0.028 [self]  auto-orient.py
   │  └─ 0.025 <lambda>  auto-orient.py:205
   │     └─ 0.014 [self]  auto-orient.py
   └─ 0.034 read_named_set  auto-orient.py:158
      ├─ 0.017 <listcomp>  auto-orient.py:173
      └─ 0.012 <listcomp>  auto-orient.py:172

To view this report with different options, run:
    pyinstrument --load-prev 2024-07-14T09-05-37 [options]
It is apparent that reading the mesh data in the function read_allmsh takes most of the time.
That function contains two inner functions, to_node and to_element,
which account for most of the time used because they are called many times inside list comprehensions:
def to_node(items):
    return (int(items[0]), tuple(float(j) for j in items[1:]))


nodes = dict([to_node(ln.strip().split(",")) for ln in lines[ns:ne]])

starts = (ln for ln in lines[es:] if ln.endswith(","))
ends = (ln for ln in lines[es:] if not ln.endswith(","))


def to_element(s, e):
    items = tuple(int(j) for j in s.split(",")[:-1]) + tuple(
        int(j) for j in e.split(",")
    )
    return (items[0], tuple(nodes[k] for k in items[1:]))


elements = dict([to_element(s, e) for s, e in zip(starts, ends)])
This obviously leads to a lot of function calls, so I rewrote both steps using generator expressions:
# Read nodes
nstep1 = (ln.split(",") for ln in lines[ns:ne])
nodes = dict(
    (int(items[0]), tuple(float(j) for j in items[1:])) for items in nstep1
)
# Read C3D20(R) elements
starts = (ln for ln in lines[es:] if ln.endswith(","))
ends = (ln for ln in lines[es:] if not ln.endswith(","))
estep1 = ((s + e).split(",") for s, e in zip(starts, ends))
elements = dict(
    (int(items[0]), tuple(nodes[int(k)] for k in items[1:])) for items in estep1
)
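As an aside, the same idea can also be spelled as a dict comprehension. The snippet below is just an equivalent formulation of the node-reading step above, not a further optimization:

# Alternative spelling of the node-reading step as a dict comprehension;
# functionally equivalent to the dict(...) call above.
nodes = {
    int(items[0]): tuple(float(j) for j in items[1:])
    for items in (ln.split(",") for ln in lines[ns:ne])
}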
After that change the profiling was repeated:
> python -m pyinstrument auto-orient.py -t top bot

  _     ._   __/__   _ _  _  _ _/_   Recorded: 09:06:22  Samples:  903
 /_//_/// /_\ / //_// / //_'/ //     Duration: 0.933     CPU time: 0.933
/   _/                      v4.6.2

Program: pyinstrument/__main__.py auto-orient.py -t top bot

0.930 <module>  auto-orient.py:1
└─ 0.918 main  auto-orient.py:24
   ├─ 0.718 read_allmsh  auto-orient.py:115
   │  ├─ 0.314 <genexpr>  auto-orient.py:141
   │  │  ├─ 0.156 [self]  auto-orient.py
   │  │  ├─ 0.104 <genexpr>  auto-orient.py:142
   │  │  └─ 0.054 <genexpr>  auto-orient.py:140
   │  │     ├─ 0.031 str.split  <built-in>
   │  │     └─ 0.023 [self]  auto-orient.py
   │  ├─ 0.263 <genexpr>  auto-orient.py:148
   │  │  ├─ 0.142 <genexpr>  auto-orient.py:149
   │  │  ├─ 0.061 [self]  auto-orient.py
   │  │  └─ 0.060 <genexpr>  auto-orient.py:147
   │  │     ├─ 0.017 <genexpr>  auto-orient.py:145
   │  │     │  └─ 0.011 [self]  auto-orient.py
   │  │     ├─ 0.016 str.split  <built-in>
   │  │     ├─ 0.015 [self]  auto-orient.py
   │  │     └─ 0.012 <genexpr>  auto-orient.py:146
   │  ├─ 0.058 <listcomp>  auto-orient.py:126
   │  │  ├─ 0.035 [self]  auto-orient.py
   │  │  └─ 0.023 str.startswith  <built-in>
   │  ├─ 0.047 <listcomp>  auto-orient.py:125
   │  │  ├─ 0.028 [self]  auto-orient.py
   │  │  └─ 0.019 str.strip  <built-in>
   │  └─ 0.034 [self]  auto-orient.py
   ├─ 0.155 set_normals  auto-orient.py:181
   │  ├─ 0.053 isclose  auto-orient.py:218
   │  │  ├─ 0.024 [self]  auto-orient.py
   │  │  └─ 0.023 dot  auto-orient.py:256
   │  ├─ 0.031 <lambda>  auto-orient.py:201
   │  │  ├─ 0.015 <genexpr>  auto-orient.py:201
   │  │  └─ 0.014 [self]  auto-orient.py
   │  ├─ 0.029 normalize  auto-orient.py:293
   │  │  ├─ 0.013 <genexpr>  auto-orient.py:304
   │  │  └─ 0.012 [self]  auto-orient.py
   │  └─ 0.023 [self]  auto-orient.py
   └─ 0.033 read_named_set  auto-orient.py:154
      ├─ 0.016 <listcomp>  auto-orient.py:169
      └─ 0.012 <listcomp>  auto-orient.py:168
This change reduced the total runtime from 1.107 s to 0.933 s, an improvement of roughly 15%.
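To illustrate why removing the per-line helper calls matters, the sketch below (not part of the program; it uses synthetic input lines) times a helper-based list comprehension against the inlined generator expression with the timeit module:

# Minimal sketch, assuming synthetic node lines; compares a per-line helper
# function (as in the original read_allmsh) with the inlined rewrite.
import timeit

lines = ["17, 1.0, 2.0, 3.0"] * 10000


def to_node(items):
    return (int(items[0]), tuple(float(j) for j in items[1:]))


# Original approach: one Python function call per line.
with_calls = timeit.timeit(
    "dict([to_node(ln.split(',')) for ln in lines])",
    globals=globals(),
    number=100,
)
# Rewritten approach: the conversion is inlined in a generator expression.
inlined = timeit.timeit(
    "dict((int(i[0]), tuple(float(j) for j in i[1:]))"
    " for i in (ln.split(',') for ln in lines))",
    globals=globals(),
    number=100,
)
print(f"helper calls: {with_calls:.2f} s, inlined: {inlined:.2f} s")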
In an attempt to improve things further, I ported the read_mesh routine from
cgx.py
(which basically does the same thing) so that it only returns the elements, but
that turned out to be slightly slower.
In [1]: cd /home/rsmith/src/progs/calculix-orient
In [2]: ao = __import__("auto-orient");
In [3]: elements = ao.read_allmsh("test/all.msh");
In [4]: %timeit elements = ao.read_allmsh("test/all.msh");
63.5 ms ± 55.3 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [5]: %timeit elements = ao.read_mesh("test/all.msh");
66 ms ± 307 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
The read_mesh
function iterates over the lines of the mesh file and
acts as a state machine.
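I do not show that routine here, but a state-machine reader in that spirit could look something like the sketch below. This is a hypothetical illustration, assuming the usual *NODE/*ELEMENT keyword layout of a cgx-generated .msh file; it is not the actual code from cgx.py:

# Hypothetical state-machine mesh reader, for illustration only.
def read_mesh_sketch(path):
    mode = None  # current section: None, "node" or "element"
    pending = ""  # buffer for element lines that continue with a trailing comma
    nodes, elements = {}, {}
    with open(path) as f:
        for ln in f:
            ln = ln.strip()
            if ln.upper().startswith("*NODE"):
                mode = "node"
            elif ln.upper().startswith("*ELEMENT"):
                mode = "element"
            elif ln.startswith("*"):
                mode = None  # any other keyword ends the current section
            elif mode == "node":
                items = ln.split(",")
                nodes[int(items[0])] = tuple(float(j) for j in items[1:])
            elif mode == "element":
                if ln.endswith(","):
                    pending = ln  # definition continues on the next line
                    continue
                items = (pending + ln).split(",")
                pending = ""
                elements[int(items[0])] = tuple(nodes[int(k)] for k in items[1:])
    return elements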
It is interesting to see that two completely different approaches yield almost the same time. Reading the data from disk was only a small fraction of the total time, so it is not the case that this task is I/O bound.
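That can be checked by timing just the raw file reading against the full read_allmsh call, for example with a sketch like the one below (assuming the same test/all.msh file as in the IPython session above):

# Rough check: how much of read_allmsh's time is spent on pure file I/O?
import timeit

ao = __import__("auto-orient")

read_only = timeit.timeit(
    "open('test/all.msh').read().splitlines()", number=10
)
read_and_parse = timeit.timeit(
    "ao.read_allmsh('test/all.msh')", globals=globals(), number=10
)
print(f"reading only: {read_only / 10 * 1000:.1f} ms per run")
print(f"reading and parsing: {read_and_parse / 10 * 1000:.1f} ms per run")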
For comments, please send me an e-mail.
Related articles
- Profiling with pyinstrument
- Profiling Python scripts (5): ent_without_numpy
- Profiling Python scripts (4): vecops.indexate
- Profiling Python scripts (3): stl2ps
- Profiling Python scripts (2): stlinfo