Roland's homepage

My random knot in the Web

Profiling with pyinstrument

After having used the built-in tracing profiler cProfile for a while now, I came across the statistical (sampling) profiler pyinstrument.

A statistical profiler does not influence the runtime of the program as much as a tracing profiler like cProfile does. Additionally, pyinstrument presents its results in a different way than cProfile.

So in this article, I will be profiling my repotool program. This program helps me maintain a mirror of FreeBSD packages that I use.

Note

The other articles in this series will be listed under “Related Articles” at the bottom of the page.

The repotool program works with the packagesite.yaml file that is retrieved from the package site. For ease of use, the contents of that file are converted to an SQLite database.

The database tables have the following definition:

CREATE TABLE packages (name TEXT, origin TEXT,
version TEXT, comment TEXT, maintainer TEXT, www TEXT, abi TEXT, arch TEXT,
prefix TEXT, sum TEXT, flatsize INT, path TEXT, repopath TEXT,
licenselogic TEXT, pkgsize INT, desc TEXT);
CREATE TABLE licenses (pkgid INT, name txt);
CREATE TABLE deps (pkgid INT, name TEXT, origin TEXT, version TEXT, depid INT);
CREATE TABLE categories (pkgid INT, name txt);
CREATE TABLE shlibs_required (pkgid INT, name txt);
CREATE TABLE shlibs_provided (pkgid INT, name txt);
CREATE TABLE options (pkgid INT, key TEXT, value TEXT);
CREATE TABLE annotations (pkgid INT, key TEXT, value TEXT);
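
For illustration, the conversion could look roughly like the sketch below. This is not repotool’s actual code; it assumes that each line of packagesite.yaml is a single JSON object describing one package (which is how the FreeBSD repositories ship it) and that the JSON keys match the column names. Only the packages table is filled here.

import json
import sqlite3

# Columns of the packages table, in order; the JSON keys in
# packagesite.yaml are assumed to use the same names.
COLUMNS = (
    "name", "origin", "version", "comment", "maintainer", "www", "abi",
    "arch", "prefix", "sum", "flatsize", "path", "repopath",
    "licenselogic", "pkgsize", "desc",
)


def fill_packages(yamlpath, dbpath):
    """Hypothetical helper: fill the packages table from packagesite.yaml.

    Assumes the tables from the schema above already exist in the database.
    """
    con = sqlite3.connect(dbpath)
    qmarks = ", ".join("?" * len(COLUMNS))
    with con, open(yamlpath) as f:
        for line in f:
            pkg = json.loads(line)  # one JSON object per line
            con.execute(
                f"INSERT INTO packages VALUES ({qmarks})",
                tuple(pkg.get(c) for c in COLUMNS),
            )
    con.close()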

The most important tables are packages and deps. The latter records dependencies. Note that these are rowid tables, so each row has a unique, non-NULL, signed 64-bit integer rowid.

The reason I wrote repotool is that I wanted to know what all the dependencies of a package are, so that they can be downloaded as required. The FreeBSD pkg utility only displays the direct dependencies of a package.
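
Getting the direct dependencies out of the database is a single query, since pkgid in the deps table refers to the rowid of the depending package in the packages table. A sketch (not repotool’s actual code):

def direct_deps(cur, name):
    # Direct dependencies only: join deps to packages via the rowid of
    # the depending package.
    return cur.execute(
        "SELECT d.name, d.version FROM packages AS p "
        "JOIN deps AS d ON d.pkgid = p.rowid WHERE p.name IS ?",
        (name,),
    ).fetchall()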

The repotool program works fine, but looking up all dependencies of a package is slow. This is probably because I use a recursive function to look up the dependencies. Such a search is relatively easy to understand and to program, but it is not necessarily efficient. Here is the original function.

def deps(cur, name):
    """
    Find all the dependencies of a package.

    Arguments:
        cur (Cursor): database cursor
        name (str): name of the package

    Returns: a list of rowids of dependencies.
    """

    @functools.cache
    def depbyrowid(cur, rowid):
        if rowid == -1:
            return set()
        depids = cur.execute(
            "SELECT depid FROM deps WHERE pkgid is ?", rowid
        ).fetchall()
        if not depids:
            return set()
        rv = set(depids)
        for di in depids:
            rv |= depbyrowid(cur, di)
        return rv

    cur.execute("SELECT rowid FROM packages WHERE name IS ?", (name,))
    pkgid = cur.fetchone()
    alldeps = list(depbyrowid(cur, pkgid))
    alldeps.insert(0, pkgid)
    return alldeps

Note that I’ve already used functools.cache to cache values. Let’s profile this program with cProfile:

python -m cProfile -s tottime repotool.py show paraview

The paraview package is looked up in this example because it has on the order of 360 dependencies in total. Skipping the program output, the most interesting part of the profile is:

Ordered by: internal time

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   722    1.443    0.002    1.443    0.002 {method 'execute' of 'sqlite3.Cursor' objects}
   360    0.679    0.002    0.679    0.002 {method 'fetchall' of 'sqlite3.Cursor' objects}
   362    0.018    0.000    0.018    0.000 {method 'fetchone' of 'sqlite3.Cursor' objects}
 360/1    0.003    0.000    2.099    2.099 repotool-3d2d1b9.py:423(depbyrowid)
   472    0.002    0.000    0.002    0.000 {built-in method posix.stat}
    14    0.001    0.000    0.001    0.000 {method 'read' of '_io.BufferedReader' objects}
     7    0.001    0.000    0.001    0.000 {built-in method _imp.create_dynamic}
    13    0.001    0.000    0.001    0.000 {built-in method marshal.loads}
    66    0.000    0.000    0.002    0.000 <frozen importlib._bootstrap_external>:1514(find_spec)
     1    0.000    0.000    2.144    2.144 repotool-3d2d1b9.py:162(cmd_show)

The total profile is hundreds of lines long. However, everything below this point hardly registers with respect to time spent, so it can be safely ignored.

We can also ignore the calls to methods of the sqlite3.Cursor; those are part of the standard library and I’m limiting the scope of this investigation to my own code. :-)

It is clear that depbyrowid uses the most time. Now for comparison, let’s use pyinstrument:

python -m pyinstrument repotool.py show paraview

The result:

  _     ._   __/__   _ _  _  _ _/_   Recorded: 18:20:31  Samples:  550
 /_//_/// /_\ / //_// / //_'/ //     Duration: 2.130     CPU time: 2.129
/   _/                      v4.6.1

Program: /usr/local/lib/python3.9/site-packages/ \
pyinstrument-4.6.1-py3.9-freebsd-14.0-RELEASE-p3-amd64.egg/pyinstrument/__main__.py \
repotool.py show paraview

2.126 <module>  repotool.py:10
└─ 2.124 main  repotool.py:53
   └─ 2.122 cmd_show  repotool.py:162
      └─ 2.105 deps  repotool.py:412
         └─ 2.095 depbyrowid  repotool.py:423
            └─ 2.089 depbyrowid  repotool.py:423
               ├─ 1.906 depbyrowid  repotool.py:423
               │  ├─ 1.236 depbyrowid  repotool.py:423
               │  │  ├─ 0.675 depbyrowid  repotool.py:423
               │  │  │  ├─ 0.297 depbyrowid  repotool.py:423
               │  │  │  │  ├─ 0.127 Cursor.execute  <built-in>
               │  │  │  │  ├─ 0.111 depbyrowid  repotool.py:423
               │  │  │  │  │  ├─ 0.062 Cursor.execute  <built-in>
               │  │  │  │  │  ├─ 0.026 Cursor.fetchall  <built-in>
               │  │  │  │  │  └─ 0.024 depbyrowid  repotool.py:423
               │  │  │  │  └─ 0.059 Cursor.fetchall  <built-in>
               │  │  │  ├─ 0.252 Cursor.execute  <built-in>
               │  │  │  └─ 0.126 Cursor.fetchall  <built-in>
               │  │  ├─ 0.393 Cursor.execute  <built-in>
               │  │  └─ 0.168 Cursor.fetchall  <built-in>
               │  ├─ 0.452 Cursor.execute  <built-in>
               │  └─ 0.218 Cursor.fetchall  <built-in>
               ├─ 0.118 Cursor.execute  <built-in>
               └─ 0.065 Cursor.fetchall  <built-in>

In the terminal, the output is colored, which helps with interpreting it. Even without colors, it is clear that the recursive calls to depbyrowid take up most of the time.

The output is a lot more concise than that of cProfile. That makes it much easier to read.
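
pyinstrument can also be used from inside a program, which is handy when only part of it needs to be profiled. A minimal sketch based on its documented API (the profiled function is just a placeholder):

from pyinstrument import Profiler

profiler = Profiler()
profiler.start()
some_function()  # placeholder for the code under investigation
profiler.stop()
profiler.print()  # prints the same kind of tree as the command-line run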

It is not uncommon for a program running under cProfile to slow down significantly. In this case, however, the runtimes under both profilers don’t differ much from each other, or from the time the program takes without a profiler.
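
To verify that, one can time an unprofiled run and compare it with the Duration reported by the profilers. A quick sketch:

import subprocess
import time

# Time an unprofiled run for comparison with the profiled runs.
start = time.perf_counter()
subprocess.run(["python", "repotool.py", "show", "paraview"],
               stdout=subprocess.DEVNULL, check=True)
print(f"unprofiled: {time.perf_counter() - start:.3f} s")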

To speed up the operation I did two things.

  1. Use the database more efficiently by having it return the dependencies of multiple packages at once.
  2. Replace recursion with a while loop. Keep running the query as long as new packages are found.

The new code looks like this:

def deps(cur, name):
    """
    Find all the dependencies of a package.

    Arguments:
        cur (Cursor): database cursor
        name (str): name of the package

    Returns: a list of rowids of dependencies.
    """
    cur.execute("SELECT rowid FROM packages WHERE name IS ?", (name,))
    pkgid = cur.fetchone()
    alldeps = []
    newdeps = [pkgid]
    while len(newdeps) >= len(alldeps):
        # Add the rowids that have not been seen before.
        alldeps += [j for j in newdeps if j not in alldeps]
        # Look up the dependencies of all known packages in one query.
        newdeps = cur.execute(
            "SELECT DISTINCT depid FROM deps WHERE pkgid IN "
            + "("
            + ", ".join(str(j[0]) for j in alldeps)
            + ")"
        ).fetchall()
    rv = [pkgid]
    rv += [j for j in alldeps if j not in rv]
    return rv
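
The rowids that deps returns can then be mapped back to package names and repository paths, which is what is needed to actually download the packages. For example (a sketch, not repotool’s reporting code):

rowids = [r[0] for r in deps(cur, "paraview")]
qmarks = ", ".join("?" * len(rowids))
cur.execute(
    f"SELECT name, repopath FROM packages WHERE rowid IN ({qmarks})", rowids
)
for name, repopath in cur.fetchall():
    print(name, repopath)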

When this new deps function is profiled, the result is:

  _     ._   __/__   _ _  _  _ _/_   Recorded: 19:01:20  Samples:  29
 /_//_/// /_\ / //_// / //_'/ //     Duration: 0.161     CPU time: 0.160
/   _/                      v4.6.1

Program: /usr/local/lib/python3.9/site-packages/\
pyinstrument-4.6.1-py3.9-freebsd-14.0-RELEASE-p3-amd64.egg/pyinstrument/__main__.py \
../repotool.py show paraview

0.158 <module>  repotool.py:10
├─ 0.156 main  repotool.py:52
│  ├─ 0.153 cmd_show  repotool.py:161
│  │  ├─ 0.126 deps  repotool.py:411
│  │  │  ├─ 0.107 Cursor.fetchall  <built-in>
│  │  │  ├─ 0.009 Cursor.execute  <built-in>
│  │  │  ├─ 0.005 Cursor.fetchone  <built-in>
│  │  │  └─ 0.005 <listcomp>  repotool.py:426
│  │  ├─ 0.011 Cursor.fetchone  <built-in>
│  │  ├─ 0.010 Cursor.execute  <built-in>
│  │  ├─ 0.003 <listcomp>  repotool.py:184
│  │  │  └─ 0.002 Cursor.execute  <built-in>
│  │  └─ 0.002 exists  genericpath.py:16
│  │        [2 frames hidden]  genericpath, <built-in>
│  └─ 0.003 check_running  repotool.py:462
│     └─ 0.003 check_output  subprocess.py:377
│           [2 frames hidden]  subprocess
└─ 0.002 <module>  sqlite3/__init__.py:23
    [2 frames hidden]  sqlite3

The total runtime of the program went from a sluggish 2 s to an almost instantaneous 0.15 s. That is an impressive improvement.

Now, approximately 95% of the time spent in deps or cmd_show is spent in methods of the sqlite3.Cursor object, which is outside the scope of this investigation; for deps, for instance, (0.107 + 0.009 + 0.005) / 0.126 ≈ 96%. So there is little left to improve here. And beyond that, the current response speed is certainly sufficient.


For comments, please send me an e-mail.


Related articles


←  From python script to executable with cython