Profiling Python scripts (6): auto-orient
This is the sixth in a series of articles that covers analyzing and improving performance bottlenecks in Python scripts. calculix-orient is a helper program that automates the assignment of local coordinate systems for anisotropic finite elements. It is designed to work together with CalculiX.
A first analysis using pyinstrument was done to investigate which part of the program takes the most time.
All testing was done using the default Python version in use on FreeBSD, which was 3.11 at the time of writing:
> python -m pyinstrument auto-orient.py -t top bot

  _     ._   __/__   _ _  _  _ _/_   Recorded: 09:05:37  Samples:  1048
 /_//_/// /_\ / //_// / //_'/ //     Duration: 1.107     CPU time: 1.107
/   _/                      v4.6.2

Program: pyinstrument/__main__.py auto-orient.py -t top bot

1.104 <module>  auto-orient.py:1
└─ 1.091 main  auto-orient.py:24
   ├─ 0.888 read_allmsh  auto-orient.py:115
   │  ├─ 0.404 <listcomp>  auto-orient.py:154
   │  │  ├─ 0.357 to_element  auto-orient.py:148
   │  │  │  ├─ 0.158 [self]  auto-orient.py
   │  │  │  ├─ 0.113 <genexpr>  auto-orient.py:149
   │  │  │  ├─ 0.071 <genexpr>  auto-orient.py:152
   │  │  │  └─ 0.015 str.split  <built-in>
   │  │  ├─ 0.017 <genexpr>  auto-orient.py:145
   │  │  ├─ 0.015 [self]  auto-orient.py
   │  │  └─ 0.015 <genexpr>  auto-orient.py:146
   │  ├─ 0.345 <listcomp>  auto-orient.py:143
   │  │  ├─ 0.262 to_node  auto-orient.py:140
   │  │  │  ├─ 0.160 [self]  auto-orient.py
   │  │  │  └─ 0.102 <genexpr>  auto-orient.py:141
   │  │  ├─ 0.056 [self]  auto-orient.py
   │  │  └─ 0.019 str.split  <built-in>
   │  ├─ 0.058 <listcomp>  auto-orient.py:126
   │  │  ├─ 0.032 [self]  auto-orient.py
   │  │  └─ 0.026 str.startswith  <built-in>
   │  ├─ 0.046 <listcomp>  auto-orient.py:125
   │  │  ├─ 0.029 [self]  auto-orient.py
   │  │  └─ 0.017 str.strip  <built-in>
   │  └─ 0.032 [self]  auto-orient.py
   ├─ 0.157 set_normals  auto-orient.py:185
   │  ├─ 0.053 isclose  auto-orient.py:222
   │  │  ├─ 0.023 [self]  auto-orient.py
   │  │  └─ 0.018 dot  auto-orient.py:260
   │  ├─ 0.037 normalize  auto-orient.py:297
   │  │  ├─ 0.018 [self]  auto-orient.py
   │  │  └─ 0.012 <genexpr>  auto-orient.py:308
   │  ├─ 0.028 [self]  auto-orient.py
   │  └─ 0.025 <lambda>  auto-orient.py:205
   │     └─ 0.014 [self]  auto-orient.py
   └─ 0.034 read_named_set  auto-orient.py:158
      ├─ 0.017 <listcomp>  auto-orient.py:173
      └─ 0.012 <listcomp>  auto-orient.py:172

To view this report with different options, run:
    pyinstrument --load-prev 2024-07-14T09-05-37 [options]
It is apparent that reading the mesh data in the function read_allmsh takes most of the time.
That function contains two inner functions, to_node and to_element,
which account for most of the time used because they are called many times inside list comprehensions:
def to_node(items):
    return (int(items[0]), tuple(float(j) for j in items[1:]))


nodes = dict([to_node(ln.strip().split(",")) for ln in lines[ns:ne]])

starts = (ln for ln in lines[es:] if ln.endswith(","))
ends = (ln for ln in lines[es:] if not ln.endswith(","))


def to_element(s, e):
    items = tuple(int(j) for j in s.split(",")[:-1]) + tuple(
        int(j) for j in e.split(",")
    )
    return (items[0], tuple(nodes[k] for k in items[1:]))


elements = dict([to_element(s, e) for s, e in zip(starts, ends)])
This obviously leads to a lot of function calls, so I rewrote both steps using generator expressions:
# Read nodes
nstep1 = (ln.split(",") for ln in lines[ns:ne])
nodes = dict(
    (int(items[0]), tuple(float(j) for j in items[1:])) for items in nstep1
)
# Read C3D20(R) elements
starts = (ln for ln in lines[es:] if ln.endswith(","))
ends = (ln for ln in lines[es:] if not ln.endswith(","))
estep1 = ((s + e).split(",") for s, e in zip(starts, ends))
elements = dict(
    (int(items[0]), tuple(nodes[int(k)] for k in items[1:])) for items in estep1
)
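As an aside, the same idea can also be spelled as a dict comprehension. The snippet below is just an equivalent formulation of the node-reading step above, not a further optimization:

# Alternative spelling of the node-reading step as a dict comprehension;
# functionally equivalent to the dict(...) call above.
nodes = {
    int(items[0]): tuple(float(j) for j in items[1:])
    for items in (ln.split(",") for ln in lines[ns:ne])
}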
After that change the profiling was repeated:
> python -m pyinstrument auto-orient.py -t top bot

  _     ._   __/__   _ _  _  _ _/_   Recorded: 09:06:22  Samples:  903
 /_//_/// /_\ / //_// / //_'/ //     Duration: 0.933     CPU time: 0.933
/   _/                      v4.6.2

Program: pyinstrument/__main__.py auto-orient.py -t top bot

0.930 <module>  auto-orient.py:1
└─ 0.918 main  auto-orient.py:24
   ├─ 0.718 read_allmsh  auto-orient.py:115
   │  ├─ 0.314 <genexpr>  auto-orient.py:141
   │  │  ├─ 0.156 [self]  auto-orient.py
   │  │  ├─ 0.104 <genexpr>  auto-orient.py:142
   │  │  └─ 0.054 <genexpr>  auto-orient.py:140
   │  │     ├─ 0.031 str.split  <built-in>
   │  │     └─ 0.023 [self]  auto-orient.py
   │  ├─ 0.263 <genexpr>  auto-orient.py:148
   │  │  ├─ 0.142 <genexpr>  auto-orient.py:149
   │  │  ├─ 0.061 [self]  auto-orient.py
   │  │  └─ 0.060 <genexpr>  auto-orient.py:147
   │  │     ├─ 0.017 <genexpr>  auto-orient.py:145
   │  │     │  └─ 0.011 [self]  auto-orient.py
   │  │     ├─ 0.016 str.split  <built-in>
   │  │     ├─ 0.015 [self]  auto-orient.py
   │  │     └─ 0.012 <genexpr>  auto-orient.py:146
   │  ├─ 0.058 <listcomp>  auto-orient.py:126
   │  │  ├─ 0.035 [self]  auto-orient.py
   │  │  └─ 0.023 str.startswith  <built-in>
   │  ├─ 0.047 <listcomp>  auto-orient.py:125
   │  │  ├─ 0.028 [self]  auto-orient.py
   │  │  └─ 0.019 str.strip  <built-in>
   │  └─ 0.034 [self]  auto-orient.py
   ├─ 0.155 set_normals  auto-orient.py:181
   │  ├─ 0.053 isclose  auto-orient.py:218
   │  │  ├─ 0.024 [self]  auto-orient.py
   │  │  └─ 0.023 dot  auto-orient.py:256
   │  ├─ 0.031 <lambda>  auto-orient.py:201
   │  │  ├─ 0.015 <genexpr>  auto-orient.py:201
   │  │  └─ 0.014 [self]  auto-orient.py
   │  ├─ 0.029 normalize  auto-orient.py:293
   │  │  ├─ 0.013 <genexpr>  auto-orient.py:304
   │  │  └─ 0.012 [self]  auto-orient.py
   │  └─ 0.023 [self]  auto-orient.py
   └─ 0.033 read_named_set  auto-orient.py:154
      ├─ 0.016 <listcomp>  auto-orient.py:169
      └─ 0.012 <listcomp>  auto-orient.py:168
This change reduced the total runtime from 1.107 s to 0.933 s, an improvement of roughly 15%.
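To illustrate why removing the per-line helper calls matters, the sketch below (not part of the program; it uses synthetic input lines) times a helper-based list comprehension against the inlined generator expression with the timeit module:

# Minimal sketch, assuming synthetic node lines; compares a per-line helper
# function (as in the original read_allmsh) with the inlined rewrite.
import timeit

lines = ["17, 1.0, 2.0, 3.0"] * 10000


def to_node(items):
    return (int(items[0]), tuple(float(j) for j in items[1:]))


# Original approach: one Python function call per line.
with_calls = timeit.timeit(
    "dict([to_node(ln.split(',')) for ln in lines])",
    globals=globals(),
    number=100,
)
# Rewritten approach: the conversion is inlined in a generator expression.
inlined = timeit.timeit(
    "dict((int(i[0]), tuple(float(j) for j in i[1:]))"
    " for i in (ln.split(',') for ln in lines))",
    globals=globals(),
    number=100,
)
print(f"helper calls: {with_calls:.2f} s, inlined: {inlined:.2f} s")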
In an attempt to improve things further, I ported the read_mesh routine from
cgx.py
(which basically does the same thing) so that it only returns the elements, but
that turned out to be slightly slower.
In [1]: cd /home/rsmith/src/progs/calculix-orient
In [2]: ao = __import__("auto-orient");
In [3]: elements = ao.read_allmsh("test/all.msh");
In [4]: %timeit elements = ao.read_allmsh("test/all.msh");
63.5 ms ± 55.3 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [5]: %timeit elements = ao.read_mesh("test/all.msh");
66 ms ± 307 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
The read_mesh
function iterates over the lines of the mesh file and
acts as a state machine.
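I do not show that routine here, but a state-machine reader in that spirit could look something like the sketch below. This is a hypothetical illustration, assuming the usual *NODE/*ELEMENT keyword layout of a cgx-generated .msh file; it is not the actual code from cgx.py:

# Hypothetical state-machine mesh reader, for illustration only.
def read_mesh_sketch(path):
    mode = None  # current section: None, "node" or "element"
    pending = ""  # buffer for element lines that continue with a trailing comma
    nodes, elements = {}, {}
    with open(path) as f:
        for ln in f:
            ln = ln.strip()
            if ln.upper().startswith("*NODE"):
                mode = "node"
            elif ln.upper().startswith("*ELEMENT"):
                mode = "element"
            elif ln.startswith("*"):
                mode = None  # any other keyword ends the current section
            elif mode == "node":
                items = ln.split(",")
                nodes[int(items[0])] = tuple(float(j) for j in items[1:])
            elif mode == "element":
                if ln.endswith(","):
                    pending = ln  # definition continues on the next line
                    continue
                items = (pending + ln).split(",")
                pending = ""
                elements[int(items[0])] = tuple(nodes[int(k)] for k in items[1:])
    return elements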
It is interesting to see that two completely different approaches yield almost the same time. Reading the data from disk was only a small fraction of the total time, so it is not the case that this task is I/O bound.
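That can be checked by timing just the raw file reading against the full read_allmsh call, for example with a sketch like the one below (assuming the same test/all.msh file as in the IPython session above):

# Rough check: how much of read_allmsh's time is spent on pure file I/O?
import timeit

ao = __import__("auto-orient")

read_only = timeit.timeit(
    "open('test/all.msh').read().splitlines()", number=10
)
read_and_parse = timeit.timeit(
    "ao.read_allmsh('test/all.msh')", globals=globals(), number=10
)
print(f"reading only: {read_only / 10 * 1000:.1f} ms per run")
print(f"reading and parsing: {read_and_parse / 10 * 1000:.1f} ms per run")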
For comments, please send me an e-mail.
Related articles
- Profiling with pyinstrument
- Profiling Python scripts (5): ent_without_numpy
- Profiling Python scripts (4): vecops.indexate
- Profiling Python scripts (3): stl2ps
- Profiling Python scripts (2): stlinfo