Profiling with pyinstrument
After having used the built-in tracing profiler cProfile for a while now, I came across the statistical (sampling) profiler pyinstrument. A statistical profiler does not influence the runtime of the program as much as a tracing profiler like cProfile does. Additionally, pyinstrument presents its results in a different way than cProfile.
So in this article, I will be profiling my repotool program. This program helps me maintain a mirror of the FreeBSD packages that I use.
Note
The other articles in this series will be listed under “Related Articles” at the bottom of the page.
The repotool program works with the packagesite.yaml file that is retrieved from the package site. For ease of use, the contents of the YAML file are converted to a sqlite database.
The database tables have the following definition:
CREATE TABLE packages (name TEXT, origin TEXT, version TEXT, comment TEXT,
    maintainer TEXT, www TEXT, abi TEXT, arch TEXT, prefix TEXT, sum TEXT,
    flatsize INT, path TEXT, repopath TEXT, licenselogic TEXT, pkgsize INT,
    desc TEXT);
CREATE TABLE licenses (pkgid INT, name txt);
CREATE TABLE deps (pkgid INT, name TEXT, origin TEXT, version TEXT, depid INT);
CREATE TABLE categories (pkgid INT, name txt);
CREATE TABLE shlibs_required (pkgid INT, name txt);
CREATE TABLE shlibs_provided (pkgid INT, name txt);
CREATE TABLE options (pkgid INT, key TEXT, value TEXT);
CREATE TABLE annotations (pkgid INT, key TEXT, value TEXT);
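As an aside, the conversion itself is not complicated. The sketch below shows the idea; it assumes that packagesite.yaml contains one JSON document per package per line, uses an assumed database filename and only fills a handful of the packages columns (the field names are taken from the table definition above), whereas the real repotool of course fills all tables.

import json
import sqlite3

# Sketch only: "packages.db" is an assumed filename, and the tables from the
# schema above are assumed to have been created already.
con = sqlite3.connect("packages.db")
cur = con.cursor()
with open("packagesite.yaml") as f:
    for line in f:
        # Assumption: every line holds one JSON document describing a package.
        pkg = json.loads(line)
        cur.execute(
            "INSERT INTO packages (name, origin, version, comment) VALUES (?, ?, ?, ?)",
            (pkg["name"], pkg["origin"], pkg["version"], pkg["comment"]),
        )
con.commit()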
The most important tables are packages and deps. The latter records dependencies.
Note that these are rowid tables, so each row has a unique, non-NULL, signed 64-bit integer rowid.
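To illustrate, looking up a package and its direct dependencies works via these rowids. A minimal sketch, assuming the database is stored in a file called packages.db:

import sqlite3

con = sqlite3.connect("packages.db")  # assumed filename
cur = con.cursor()

# The implicit rowid of the packages table identifies a package.
cur.execute("SELECT rowid FROM packages WHERE name IS ?", ("paraview",))
pkgid = cur.fetchone()  # a 1-tuple like (1234,), or None if the package is unknown

# In the deps table, pkgid is the rowid of a package and depid the rowid of a
# direct dependency of that package.
direct = cur.execute(
    "SELECT name, depid FROM deps WHERE pkgid IS ?", pkgid
).fetchall()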
The reason I wrote repotool is that I wanted to know all the dependencies of a package, so that they can be downloaded as required. The FreeBSD pkg utility only displays the direct dependencies of a package.
The repotool program works fine, but looking up all dependencies of a package is slow. This is probably because I use a recursive function to look up the dependencies. Such a search is relatively easy to understand and to program, but it is not necessarily efficient.
Here is the original function:
def deps(cur, name):
    """
    Find all the dependencies of a package.

    Arguments:
        cur (Cursor): database cursor
        name (str): name of the package

    Returns: a list of rowids of dependencies.
    """

    @functools.cache
    def depbyrowid(cur, rowid):
        if rowid == -1:
            return set()
        depids = cur.execute(
            "SELECT depid FROM deps WHERE pkgid is ?", rowid
        ).fetchall()
        if not depids:
            return set()
        rv = set(depids)
        for di in depids:
            rv |= depbyrowid(cur, di)
        return rv

    cur.execute("SELECT rowid FROM packages WHERE name IS ?", (name,))
    pkgid = cur.fetchone()
    alldeps = list(depbyrowid(cur, pkgid))
    alldeps.insert(0, pkgid)
    return alldeps
Note that I’ve already used functools.cache to cache values.
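As a refresher, functools.cache memoizes a function based on its (hashable) arguments, so calling depbyrowid twice with the same rowid hits the database only once. A small stand-alone example:

import functools

@functools.cache
def fib(n):
    # Every distinct n is computed only once; later calls are answered from the cache.
    return n if n < 2 else fib(n - 1) + fib(n - 2)

fib(30)                  # computed
fib(30)                  # served from the cache
print(fib.cache_info())  # shows the number of hits and misses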
Let’s profile this program with cProfile:
python -m cProfile -s tottime repotool.py show paraview
The paraview package is looked up in this example because it has on the order of 360 dependencies in total.
Skipping the program output, the most interesting part of the profile is:
   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      722    1.443    0.002    1.443    0.002 {method 'execute' of 'sqlite3.Cursor' objects}
      360    0.679    0.002    0.679    0.002 {method 'fetchall' of 'sqlite3.Cursor' objects}
      362    0.018    0.000    0.018    0.000 {method 'fetchone' of 'sqlite3.Cursor' objects}
    360/1    0.003    0.000    2.099    2.099 repotool-3d2d1b9.py:423(depbyrowid)
      472    0.002    0.000    0.002    0.000 {built-in method posix.stat}
       14    0.001    0.000    0.001    0.000 {method 'read' of '_io.BufferedReader' objects}
        7    0.001    0.000    0.001    0.000 {built-in method _imp.create_dynamic}
       13    0.001    0.000    0.001    0.000 {built-in method marshal.loads}
       66    0.000    0.000    0.002    0.000 <frozen importlib._bootstrap_external>:1514(find_spec)
        1    0.000    0.000    2.144    2.144 repotool-3d2d1b9.py:162(cmd_show)
The total profile is hundreds of lines long. However, everything below this point hardly registers with respect to time spent, so it can be safely ignored. We can also ignore the calls to methods of the sqlite3.Cursor; those are part of the standard library and I’m limiting the scope of this investigation to my own code. :-)
It is clear that depbyrowid uses the most time.
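As an aside, this filtering can also be done after the fact. Saving the profile with python -m cProfile -o repotool.prof repotool.py show paraview makes it possible to restrict the report to my own code with pstats; a minimal sketch (the file name repotool.prof is just a choice):

import pstats

# Load the saved profile, sort by internal time and only print entries whose
# file name matches "repotool".
stats = pstats.Stats("repotool.prof")
stats.sort_stats("tottime")
stats.print_stats("repotool")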
Now for comparison, let’s use pyinstrument:
python -m pyinstrument repotool.py show paraview
The result:
  _     ._   __/__   _ _  _  _ _/_   Recorded: 18:20:31  Samples:  550
 /_//_/// /_\ / //_// / //_'/ //     Duration: 2.130     CPU time: 2.129
/   _/                      v4.6.1

Program: /usr/local/lib/python3.9/site-packages/ \
pyinstrument-4.6.1-py3.9-freebsd-14.0-RELEASE-p3-amd64.egg/pyinstrument/__main__.py \
repotool.py show paraview

2.126 <module>  repotool.py:10
└─ 2.124 main  repotool.py:53
   └─ 2.122 cmd_show  repotool.py:162
      └─ 2.105 deps  repotool.py:412
         └─ 2.095 depbyrowid  repotool.py:423
            └─ 2.089 depbyrowid  repotool.py:423
               ├─ 1.906 depbyrowid  repotool.py:423
               │  ├─ 1.236 depbyrowid  repotool.py:423
               │  │  ├─ 0.675 depbyrowid  repotool.py:423
               │  │  │  ├─ 0.297 depbyrowid  repotool.py:423
               │  │  │  │  ├─ 0.127 Cursor.execute  <built-in>
               │  │  │  │  ├─ 0.111 depbyrowid  repotool.py:423
               │  │  │  │  │  ├─ 0.062 Cursor.execute  <built-in>
               │  │  │  │  │  ├─ 0.026 Cursor.fetchall  <built-in>
               │  │  │  │  │  └─ 0.024 depbyrowid  repotool.py:423
               │  │  │  │  └─ 0.059 Cursor.fetchall  <built-in>
               │  │  │  ├─ 0.252 Cursor.execute  <built-in>
               │  │  │  └─ 0.126 Cursor.fetchall  <built-in>
               │  │  ├─ 0.393 Cursor.execute  <built-in>
               │  │  └─ 0.168 Cursor.fetchall  <built-in>
               │  ├─ 0.452 Cursor.execute  <built-in>
               │  └─ 0.218 Cursor.fetchall  <built-in>
               ├─ 0.118 Cursor.execute  <built-in>
               └─ 0.065 Cursor.fetchall  <built-in>
In the terminal, the output is colored, which helps with interpreting it. Even without colors, it is clear that the recursive calls to depbyrowid take the most time.
The output is a lot more concise than that of cProfile, which makes it much easier to read.
It is not uncommon for a program running under cProfile to slow down significantly. In this case, however, the runtimes under both profilers are not much different from each other, nor from the time the program takes without a profiler.
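pyinstrument can also be used from inside a program, which makes it possible to profile just a part of it. A minimal sketch; the workload function below is a placeholder, not part of repotool:

from pyinstrument import Profiler

def workload():
    # Placeholder; in repotool this would be e.g. the call to deps().
    return sum(i * i for i in range(10**6))

profiler = Profiler()
profiler.start()
workload()
profiler.stop()
print(profiler.output_text(unicode=True, color=True))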
To speed up the operation I did two things:
- Use the database more efficiently by having it return the dependencies of multiple packages at once.
- Replace the recursion with a while loop that keeps running the query as long as new packages are found.
The new code looks like this:
def deps(cur, name):
    """
    Find all the dependencies of a package.

    Arguments:
        cur (Cursor): database cursor
        name (str): name of the package

    Returns: a list of rowids of dependencies.
    """
    cur.execute("SELECT rowid FROM packages WHERE name IS ?", (name,))
    pkgid = cur.fetchone()
    alldeps = []
    newdeps = [pkgid]
    while len(newdeps) >= len(alldeps):
        alldeps += [j for j in newdeps if j not in alldeps]
        newdeps = cur.execute(
            "SELECT DISTINCT depid FROM deps WHERE pkgid IN "
            + "("
            + ", ".join(str(j[0]) for j in alldeps)
            + ")"
        ).fetchall()
    rv = [pkgid]
    rv += [j for j in alldeps if j not in rv]
    return rv
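As an aside, the IN clause above is built by pasting the rowids directly into the query string. Since these values come straight out of the database, that is harmless here, but the same query can also be written with placeholders; a sketch of that variant (cur and alldeps are the names from the function above):

# Alternative for the query above, using "?" placeholders instead of string
# pasting.
placeholders = ", ".join("?" for _ in alldeps)
newdeps = cur.execute(
    "SELECT DISTINCT depid FROM deps WHERE pkgid IN (" + placeholders + ")",
    [j[0] for j in alldeps],
).fetchall()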
When the new function is profiled, the result is:
  _     ._   __/__   _ _  _  _ _/_   Recorded: 19:01:20  Samples:  29
 /_//_/// /_\ / //_// / //_'/ //     Duration: 0.161     CPU time: 0.160
/   _/                      v4.6.1

Program: /usr/local/lib/python3.9/site-packages/\
pyinstrument-4.6.1-py3.9-freebsd-14.0-RELEASE-p3-amd64.egg/pyinstrument/__main__.py \
../repotool.py show paraview

0.158 <module>  repotool.py:10
├─ 0.156 main  repotool.py:52
│  ├─ 0.153 cmd_show  repotool.py:161
│  │  ├─ 0.126 deps  repotool.py:411
│  │  │  ├─ 0.107 Cursor.fetchall  <built-in>
│  │  │  ├─ 0.009 Cursor.execute  <built-in>
│  │  │  ├─ 0.005 Cursor.fetchone  <built-in>
│  │  │  └─ 0.005 <listcomp>  repotool.py:426
│  │  ├─ 0.011 Cursor.fetchone  <built-in>
│  │  ├─ 0.010 Cursor.execute  <built-in>
│  │  ├─ 0.003 <listcomp>  repotool.py:184
│  │  │  └─ 0.002 Cursor.execute  <built-in>
│  │  └─ 0.002 exists  genericpath.py:16
│  │        [2 frames hidden]  genericpath, <built-in>
│  └─ 0.003 check_running  repotool.py:462
│     └─ 0.003 check_output  subprocess.py:377
│           [2 frames hidden]  subprocess
└─ 0.002 <module>  sqlite3/__init__.py:23
      [2 frames hidden]  sqlite3
The total runtime of the program went from a sluggish 2 s to an almost instantaneous 0.15 s. That is an impressive improvement.
Now, approximately 95% of the time spent in deps or cmd_show is spent in methods of the sqlite3.Cursor object, which is not in scope. So there is little left to improve here. And beyond that, the current reaction speed is certainly sufficient.
For comments, please send me an e-mail.
Related articles
- Profiling Python scripts (6): auto-orient
- Profiling Python scripts (5): ent_without_numpy
- Profiling Python scripts (4): vecops.indexate
- Profiling Python scripts (3): stl2ps
- Profiling Python scripts (2): stlinfo