Roland's homepage


Profiling Python scripts (2): stlinfo

This is the second in a series of articles about analyzing and improving performance bottlenecks in Python scripts. This article looks at the performance of stlinfo.

Note

The other articles in this series will be listed under “Related Articles” at the bottom of the page.

Profiling stlinfo

The script is profiled as follows to get a baseline for its performance:

python -m cProfile \
-s tottime stlinfo.py \
../testdata-stltools/headrest_mesh.stl | less

The relevant output for a total time >0.01 s is:

     1678973 function calls (1678577 primitive calls) in 0.836 seconds

Ordered by: internal time

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
832945    0.307    0.000    0.466    0.000 stl.py:82(_getbp)
     1    0.096    0.096    0.562    0.562 stl.py:78(<listcomp>)
277649    0.094    0.000    0.094    0.000 {built-in method _struct.unpack}
     1    0.052    0.052    0.836    0.836 stlinfo.py:9(<module>)
   184    0.048    0.000    0.048    0.000 {built-in method builtins.min}
    65    0.047    0.001    0.047    0.001 {built-in method builtins.max}
277650    0.041    0.000    0.041    0.000 {method 'read' of 'mmap.mmap' objects}
     1    0.033    0.033    0.033    0.033 bbox.py:25(<listcomp>)
     1    0.032    0.032    0.032    0.032 bbox.py:22(<listcomp>)
     1    0.032    0.032    0.032    0.032 bbox.py:21(<listcomp>)
278476/278398    0.023    0.000    0.023    0.000 {built-in method builtins.len}
     1    0.016    0.016    0.773    0.773 stlinfo.py:23(main)

It is clear that the function _getbp from stltools/stl.py dominates the time spent. The next two items are connected to it:

  • _getbp is called in stl.py:78(<listcomp>)
  • _struct.unpack does most of the work inside _getbp, as shown below.

Investigating _getbp

The function in question looks like this.

def _getbp(m):
    """
    Generate points from a binary STL file.

    Arguments:
        m: A memory mapped file.

    Yields:
        The vertices as 3-tuple of floats.
    """
    while True:
        v = m.read(50)
        if len(v) != 50:
            break
        p = struct.unpack("<12x9f2x", v)
        yield tuple(p[0:3])
        yield tuple(p[3:6])
        yield tuple(p[6:])

Most of the actual work is done by struct.unpack. A look at the source of struct.py shows that it is a thin wrapper around the _struct module, which is implemented in C. So trying to beat it with a custom converter is not a promising first step.
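For context, the format string "<12x9f2x" describes one 50-byte triangle record of a binary STL file: 12 pad bytes skip the normal vector, nine little-endian floats hold the three vertices, and 2 pad bytes skip the attribute byte count. A small sketch (the record below is synthetic, not from the test file):

```python
import struct

# "<"   little-endian, no alignment
# "12x" skip the 12-byte normal vector (3 floats, unused here)
# "9f"  nine 32-bit floats: three vertices of 3 coordinates each
# "2x"  skip the 2-byte attribute byte count
fmt = "<12x9f2x"
print(struct.calcsize(fmt))  # 50, the size of one triangle record

# A synthetic record: normal (0,0,1), a unit triangle, attribute count 0.
record = struct.pack(
    "<3f9fH",
    0.0, 0.0, 1.0,  # normal vector (skipped by "12x")
    0.0, 0.0, 0.0,  # vertex 1
    1.0, 0.0, 0.0,  # vertex 2
    0.0, 1.0, 0.0,  # vertex 3
    0,              # attribute byte count (skipped by "2x")
)
print(struct.unpack(fmt, record))
# (0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0)
```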

Instead of reading small pieces from the memory-mapped file, let’s read it all at once and use struct.iter_unpack.

def _getbp(m):
    """
    Generate points from a binary STL file.

    Arguments:
        m: A memory mapped file.

    Yields:
        The vertices as 3-tuple of floats.
    """
    for p in struct.iter_unpack("<12x9f2x", m.read()):
        yield tuple(p[0:3])
        yield tuple(p[3:6])
        yield tuple(p[6:])

Profiling again:

      846029 function calls (845633 primitive calls) in 0.629 seconds

Ordered by: internal time

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
832945    0.257    0.000    0.263    0.000 stl.py:82(_getbp)
     1    0.092    0.092    0.355    0.355 stl.py:78(<listcomp>)
     1    0.052    0.052    0.629    0.629 stlinfo.py:9(<module>)
   184    0.048    0.000    0.048    0.000 {built-in method builtins.min}
    65    0.047    0.001    0.047    0.001 {built-in method builtins.max}
     1    0.033    0.033    0.033    0.033 bbox.py:22(<listcomp>)
     1    0.032    0.032    0.032    0.032 bbox.py:25(<listcomp>)
     1    0.032    0.032    0.032    0.032 bbox.py:21(<listcomp>)
     1    0.016    0.016    0.566    0.566 stlinfo.py:23(main)

This has shaved about 25% off the total run time, and 16% off the run time of _getbp. That is a quick win.
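The gain can be reproduced outside the profiler with a small microbenchmark. This is a sketch on synthetic data: the buffer contents and record count are made up, and absolute timings depend on the machine.

```python
import struct
import timeit

# Synthetic stand-in for the STL payload: 10000 zeroed 50-byte
# triangle records.
fmt = "<12x9f2x"
sz = struct.calcsize(fmt)
buffer = bytes(10000 * sz)

def per_record(buf):
    """Unpack one record at a time, as the original _getbp does."""
    out = []
    pos = 0
    while pos + sz <= len(buf):
        out.append(struct.unpack(fmt, buf[pos:pos + sz]))
        pos += sz
    return out

def iterated(buf):
    """Unpack the whole buffer in one call to iter_unpack."""
    return list(struct.iter_unpack(fmt, buf))

# iter_unpack saves one Python-level call and one bytes slice
# per record; the results are identical.
print(timeit.timeit(lambda: per_record(buffer), number=10))
print(timeit.timeit(lambda: iterated(buffer), number=10))
```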

However, iter_unpack raises an error when the size of the buffer is not a multiple of the struct size. To prevent that, the code was changed to:

def _getbp(m):
    """
    Generate points from a binary STL file.

    Arguments:
        m: A memory mapped file.

    Yields:
        The vertices as 3-tuple of floats.
    """
    fmt = "<12x9f2x"
    sz = struct.calcsize(fmt)
    buffer = m.read()
    count = len(buffer) // sz * sz
    buffer = buffer[:count]
    for p in struct.iter_unpack(fmt, buffer):
        yield tuple(p[0:3])
        yield tuple(p[3:6])
        yield tuple(p[6:])

This did not have a significant effect on performance.
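The failure mode and the fix can be shown in isolation. The sketch below uses a hand-made buffer with stray trailing bytes; struct.iter_unpack refuses it outright, while truncating to a whole number of records works.

```python
import struct

fmt = "<12x9f2x"
sz = struct.calcsize(fmt)  # 50 bytes per record
# Two full records plus three stray trailing bytes.
buffer = bytes(2 * sz) + b"\x00\x00\x00"

# iter_unpack requires the buffer length to be an exact multiple
# of the struct size, and raises struct.error otherwise.
try:
    struct.iter_unpack(fmt, buffer)
    truncation_needed = False
except struct.error as e:
    truncation_needed = True
    print("iter_unpack failed:", e)

# Truncating to a whole number of records avoids the error.
count = len(buffer) // sz * sz
points = list(struct.iter_unpack(fmt, buffer[:count]))
print(len(points))  # 2
```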

Lessons learned

  • If possible, load data in memory all at once rather than in small chunks.
  • Tests are worthwhile to validate changes.

For comments, please send me an e-mail.


Related articles


  • ← Profiling Python scripts (1): stl2pov
  • Profiling Python scripts (3): stl2ps →