Roland's homepage

My random knot in the Web

Attempting a conky replacement in Python (part 2)

In part 1 we saw that a simple replacement for conky, generating a statusline for i3, can be written in Python. But since it uses the subprocess module to call external programs, it is pretty CPU intensive.

The question now is whether we can reduce that. To do so, we’re going to use mmap to look at the mailbox, and call sysctlbyname(3) using ctypes to get the remaining system information. Note that sysctl et al. and the names used are specific to FreeBSD.

Using mmap to scan the mailbox

Instead of reading the complete file into a Python object, we can use mmap to map it into memory and search the mapping directly.

In the “mbox” format, a new mail message starts with a blank line followed by “From ”, except at the beginning of the file, where the first message starts with “From ” without a preceding blank line. So we search for “\n\nFrom ” and add 1 to the number of matches found. Messages that have been read carry a “Status: R” header, so counting those and subtracting them from the total gives the number of unread messages. This can be done as follows.

In [1]: import os

In [2]: import mmap

In [3]: def mail(previous):
...:     """
...:     Report unread mail.
...:
...:     Arguments:
...:         previous: a 2-tuple (unread, time) from the previous call, or None
...:
...:     Returns: A new 2-tuple (unread, time) and a string to display.
...:     """
...:     mboxname = '/home/rsmith/Mail/received'
...:     newtime = os.stat(mboxname).st_mtime
...:     if previous is None or newtime > previous[1]:
...:         with open(mboxname) as mbox:
...:             with mmap.mmap(mbox.fileno(), 0, prot=mmap.PROT_READ) as mm:
...:                 start, total = 0, 1
...:                 while True:
...:                     rv = mm.find(b'\n\nFrom ', start)
...:                     if rv == -1:
...:                         break
...:                     else:
...:                         total += 1
...:                         start = rv+7
...:                 start, read = 0, 0
...:                 while True:
...:                     rv = mm.find(b'\nStatus: R', start)
...:                     if rv == -1:
...:                         break
...:                     else:
...:                         read += 1
...:                         start = rv+10
...:         unread = total - read
...:         newdata = (unread, newtime)
...:     else:
...:         unread = previous[0]
...:         newdata = previous
...:     return newdata, f'Mail: {unread}'
...:

In [4]: %timeit mail(None)
78.4 ms ± 42.3 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [5]: mail(None)
Out[5]: ((43, 1562166977.0), 'Mail: 43')

In [8]: p, _ = mail(None)

In [9]: %timeit mail(p)
14.5 µs ± 31.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

The most common case is that the mailbox has not changed since the previous call, so the values passed in via previous are simply returned. This is extremely fast. Doing the real scan still takes around 80 ms, but that is about 4× faster than just reading the file.
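
To check the counting logic without a real mailbox, the scan can be sketched as a standalone function and run against a small temporary file. The helper names count_occurrences and count_unread are made up for this illustration, and the sample messages are invented. Like the code above, it assumes the mailbox is not empty, since mmap refuses to map an empty file.

```python
import mmap
import os
import tempfile


def count_occurrences(buf, needle):
    """Count non-overlapping occurrences of needle in a bytes-like buffer."""
    count, start = 0, 0
    while True:
        pos = buf.find(needle, start)
        if pos == -1:
            return count
        count += 1
        start = pos + len(needle)


def count_unread(path):
    """Return the number of unread messages in a non-empty mbox file."""
    with open(path, 'rb') as mbox:
        with mmap.mmap(mbox.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # The first message has no blank line before it, hence the +1.
            total = 1 + count_occurrences(mm, b'\n\nFrom ')
            read = count_occurrences(mm, b'\nStatus: R')
    return total - read


# Three invented messages, of which only the first has been read.
sample = (
    b'From a@example.com Wed Jul  3 22:00:00 2019\n'
    b'Subject: one\nStatus: RO\n\nfirst body\n'
    b'\nFrom b@example.com Wed Jul  3 22:01:00 2019\n'
    b'Subject: two\n\nsecond body\n'
    b'\nFrom c@example.com Wed Jul  3 22:02:00 2019\n'
    b'Subject: three\n\nthird body\n'
)
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(sample)
unread = count_unread(tmp.name)
os.remove(tmp.name)
print(f'Mail: {unread}')  # prints "Mail: 2"
```

Using access=mmap.ACCESS_READ instead of prot=mmap.PROT_READ is only a portability choice in this sketch; on FreeBSD both give a read-only mapping.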

Ctypes to the rescue

Via the built-in ctypes module we can call functions in libraries. (There are alternatives like e.g. cffi, but for now I’ve used what was built-in.)

In FreeBSD, sysctlbyname(3) can be found in libc. It can be used as follows.

 In [1]: import ctypes

 In [2]: import ctypes.util

 In [3]: libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
 Out[3]: <CDLL 'libc.so.7', handle 80062c000 at 0x809cfde80>

 In [4]: def to_degC(value):
 ...:     """Convert binary sysctl value to degree Centigrade."""
 ...:     return round(int.from_bytes(value, byteorder='little')/10 - 273.15, 1)
 ...:

 In [5]: def sysctlbyname(name, buflen=4, convert=None):
 ...:     """
 ...:     Python wrapper for sysctlbyname(3) on FreeBSD.
 ...:
 ...:     Arguments:
 ...:         name (str): Name of the sysctl to query
 ...:         buflen (int): Length of the data buffer to use.
 ...:         convert: Optional function to convert the data.
 ...:
 ...:     Returns:
 ...:         The requested binary data, converted if desired.
 ...:     """
 ...:     name_in = ctypes.c_char_p(bytes(name, encoding='ascii'))
 ...:     oldlen = ctypes.c_size_t(buflen)
 ...:     oldlenp = ctypes.byref(oldlen)
 ...:     oldp = ctypes.create_string_buffer(buflen)
 ...:     rv = libc.sysctlbyname(name_in, oldp, oldlenp, None, ctypes.c_size_t(0))
 ...:     if rv != 0:
 ...:         errno = ctypes.get_errno()
 ...:         raise ValueError(f'sysctlbyname error: {errno}')
 ...:     if convert:
 ...:         return convert(oldp.raw[:buflen])
 ...:     return oldp.raw[:buflen]
 ...:

 In [6]: sysctlbyname('dev.cpu.0.temperature', convert=to_degC)
 Out[6]: 51.0

 In [7]: %timeit sysctlbyname('dev.cpu.0.temperature', convert=to_degC)
 25.8 µs ± 56.5 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

As can be seen, this is very fast. Where a subprocess call takes around 80 ms, this takes around 25 µs. That’s about 3200× faster! The downside is that these calls return binary data, which we have to decode ourselves.

From calling sysctl -t dev.cpu.0.temperature we know that this sysctl returns an integer. Knowing that FreeBSD on amd64 is a little-endian system, we can convert the value. In the source code of the coretemp driver which produces this sysctl, we see a format code “IK” for this integer. According to sysctl(9) this means the value is in deciKelvins, which we can easily convert to centigrade. This is what to_degC() does.
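
The conversion can be tried without touching a real sysctl. The sketch below packs an invented reading of 3242 deciKelvin the way a little-endian machine stores a 4-byte int, and then decodes it again. The names decikelvin_to_celsius and raw are made up for this example, and the rounding done by to_degC() is left out on purpose.

```python
import struct


def decikelvin_to_celsius(value):
    """Decode a little-endian integer in deciKelvin and convert it to °C."""
    decikelvin = int.from_bytes(value, byteorder='little')
    return decikelvin / 10 - 273.15


# 3242 dK is 324.2 K; this reading is invented for the example.
raw = struct.pack('<i', 3242)
print(raw)
print(decikelvin_to_celsius(raw))
```

Note that int.from_bytes() treats the bytes as an unsigned number by default, which is fine for a temperature in Kelvin.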

Memory

Below, the memory() function is re-implemented by using sysctlbyname(3).

In [5]: def to_int(value):
...:     """Convert binary sysctl value to integer."""
...:     return int.from_bytes(value, byteorder='little')
...:

In [6]: def memory():
...:     """
...:     Report on the RAM usage on FreeBSD.
...:
...:     Returns: a formatted string to display.
...:     """
...:     suffixes = ('page_count', 'free_count', 'inactive_count', 'cache_count')
...:     stats = {
...:         suffix: sysctlbyname(f'vm.stats.vm.v_{suffix}', convert=to_int)
...:         for suffix in suffixes
...:     }
...:     memmax = stats['page_count']
...:     mem = (memmax - stats['free_count'] - stats['inactive_count'] - stats['cache_count'])
...:     used = int(100 * mem / memmax)
...:     return f'RAM: {used}%'
...:

In [7]: memory()
Out[7]: 'RAM: 18%'

In [8]: %timeit memory()
90.8 µs ± 182 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

This re-implementation is about 1000 times faster than calling sysctl(1) in a subprocess!

Network

Looking at the py-freebsd module, I found out that network interface statistics can be gathered by reading “opaque” sysctls under net.link.generic. We can use net.link.generic.system.ifcount to get the number of configured interfaces. Let’s look at this first.

In [21]: sysctlbyname('net.link.generic.system.ifcount', convert=to_int)
Out[21]: 3

That matches the information from ifconfig.

> ifconfig -l
age0 rl0 lo0

The interface data is available from nodes under net.link.generic.ifdata. Looking at the code, it seems that these nodes do not have human-readable names. So we have to implement and use a wrapper for plain sysctl, where the “name” is merely an array of integers.

def sysctl(name, buflen=4, convert=None):
    """
    Python wrapper for sysctl(3) on FreeBSD.

    Arguments:
        name: list or tuple of integers.
        buflen (int): Length of the data buffer to use.
        convert: Optional function to convert the data.

    Returns:
        The requested binary data, converted if desired.
    """
    cnt = len(name)
    mib = ctypes.c_int * cnt
    name_in = mib(*name)
    oldlen = ctypes.c_size_t(buflen)
    oldlenp = ctypes.byref(oldlen)
    oldp = ctypes.create_string_buffer(buflen)
    rv = libc.sysctl(
        ctypes.byref(name_in), ctypes.c_uint(cnt), oldp, oldlenp, None, ctypes.c_size_t(0)
    )
    if rv != 0:
        errno = ctypes.get_errno()
        raise ValueError(f'sysctl error: {errno}')
    if convert:
        return convert(oldp.raw[:buflen])
    return oldp.raw[:buflen]

To get the network device statistics, our array name should be:

CTL_NET, PF_LINK, NETLINK_GENERIC, IFMIB_IFDATA, X, IFDATA_GENERAL

Searching through the FreeBSD header files yields:

  • CTL_NET = 4
  • PF_LINK = 18
  • NETLINK_GENERIC = 0
  • IFMIB_IFDATA = 2
  • X is the number of the interface in this case.
  • IFDATA_GENERAL = 1

Note that using interface number 0 does not work.
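
To keep the magic numbers readable, the name can be built from named constants. This is only a sketch: the function name ifdata_mib is made up, and the constants simply repeat the header values listed above.

```python
import ctypes

# Values as listed above, taken from the FreeBSD header files.
CTL_NET = 4
PF_LINK = 18
NETLINK_GENERIC = 0
IFMIB_IFDATA = 2
IFDATA_GENERAL = 1


def ifdata_mib(ifindex):
    """Return the sysctl name for the general data of interface ifindex."""
    if ifindex < 1:
        raise ValueError('interface numbers start at 1')
    return (CTL_NET, PF_LINK, NETLINK_GENERIC, IFMIB_IFDATA, ifindex, IFDATA_GENERAL)


mib = ifdata_mib(1)
# The same conversion to a C array of int that a sysctl(3) wrapper needs.
c_mib = (ctypes.c_int * len(mib))(*mib)
print(list(c_mib))  # prints [4, 18, 0, 2, 1, 1]
```

The resulting tuple can be passed as the name argument of the sysctl() wrapper shown earlier.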

According to py-freebsd, this sysctl should return a struct ifmibdata. The definition of this structure is found in /usr/include/net/if_mib.h, while the struct if_data is found in /usr/include/net/if.h. The interface name string is 16 bytes long.

struct ifmibdata {
        char    ifmd_name[IFNAMSIZ]; /* name of interface */
        int     ifmd_pcount;    /* number of promiscuous listeners */
        int     ifmd_flags;     /* interface flags */
        int     ifmd_snd_len;   /* instantaneous length of send queue */
        int     ifmd_snd_maxlen; /* maximum length of send queue */
        int     ifmd_snd_drops; /* number of drops in send queue */
        int     ifmd_filler[4]; /* for future expansion */
        struct  if_data ifmd_data; /* generic information and statistics */
};

struct if_data {
        /* generic interface information */
        uint8_t ifi_type;               /* ethernet, tokenring, etc */
        uint8_t ifi_physical;           /* e.g., AUI, Thinnet, 10base-T, etc */
        uint8_t ifi_addrlen;            /* media address length */
        uint8_t ifi_hdrlen;             /* media header length */
        uint8_t ifi_link_state;         /* current link state */
        uint8_t ifi_vhid;               /* carp vhid */
        uint16_t        ifi_datalen;    /* length of this data struct */
        uint32_t        ifi_mtu;        /* maximum transmission unit */
        uint32_t        ifi_metric;     /* routing metric (external only) */
        uint64_t        ifi_baudrate;   /* linespeed */
        /* volatile statistics */
        uint64_t        ifi_ipackets;   /* packets received on interface */
        uint64_t        ifi_ierrors;    /* input errors on interface */
        uint64_t        ifi_opackets;   /* packets sent on interface */
        uint64_t        ifi_oerrors;    /* output errors on interface */
        uint64_t        ifi_collisions; /* collisions on csma interfaces */
        uint64_t        ifi_ibytes;     /* total number of octets received */
        uint64_t        ifi_obytes;     /* total number of octets sent */
        uint64_t        ifi_imcasts;    /* packets received via multicast */
        uint64_t        ifi_omcasts;    /* packets sent via multicast */
        uint64_t        ifi_iqdrops;    /* dropped on input */
        uint64_t        ifi_oqdrops;    /* dropped on output */
        uint64_t        ifi_noproto;    /* destined for unsupported protocol */
        uint64_t        ifi_hwassist;   /* HW offload capabilities, see IFCAP */

        /* Unions are here to make sizes MI. */
        union {                         /* uptime at attach or stat reset */
                time_t          tt;
                uint64_t        ph;
        } __ifi_epoch;
#define ifi_epoch       __ifi_epoch.tt
        union {                         /* time of last administrative change */
                struct timeval  tv;
                struct {
                        uint64_t ph1;
                        uint64_t ph2;
                } ph;
        } __ifi_lastchange;
#define ifi_lastchange  __ifi_lastchange.tv
};

I wrote a trivial C program to determine the size of these structures on my platform, and the offsets of the two fields I want to read, ifi_ibytes and ifi_obytes:

> cc -o structinfo structinfo.c
> ./structinfo
sizeof(struct ifmibdata) == 208
sizeof(struct if_data) == 152
offsetof(ifmd_data) == 56
offsetof(ifi_ibytes) == 64
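offsetof(ifi_obytes) == 72

Adding the offset of ifmd_data to the offsets inside struct if_data gives the positions of the two counters in the returned buffer. Each counter is a uint64_t, so it is 8 bytes long.

In [1]: 56+64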
Out[1]: 120

In [2]: 56+72
Out[2]: 128

In [3]: 128+8
Out[3]: 136
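
As an alternative to a separate C program, the layout can also be mirrored with ctypes structures, which follow the platform's native alignment rules. The sketch below rests on several assumptions: the class names IfData and IfMibData are made up, IFNAMSIZ is taken to be 16 as stated above, the fields after ifi_obytes are lumped together as nine 64-bit placeholders, and a 64-bit platform such as amd64 is assumed.

```python
import ctypes

IFNAMSIZ = 16


class IfData(ctypes.Structure):
    """Mirror of struct if_data; fields after ifi_obytes are lumped together."""
    _fields_ = [
        ('ifi_type', ctypes.c_uint8),
        ('ifi_physical', ctypes.c_uint8),
        ('ifi_addrlen', ctypes.c_uint8),
        ('ifi_hdrlen', ctypes.c_uint8),
        ('ifi_link_state', ctypes.c_uint8),
        ('ifi_vhid', ctypes.c_uint8),
        ('ifi_datalen', ctypes.c_uint16),
        ('ifi_mtu', ctypes.c_uint32),
        ('ifi_metric', ctypes.c_uint32),
        ('ifi_baudrate', ctypes.c_uint64),
        ('ifi_ipackets', ctypes.c_uint64),
        ('ifi_ierrors', ctypes.c_uint64),
        ('ifi_opackets', ctypes.c_uint64),
        ('ifi_oerrors', ctypes.c_uint64),
        ('ifi_collisions', ctypes.c_uint64),
        ('ifi_ibytes', ctypes.c_uint64),
        ('ifi_obytes', ctypes.c_uint64),
        # ifi_imcasts .. ifi_hwassist (6), epoch (1) and lastchange (2).
        ('rest', ctypes.c_uint64 * 9),
    ]


class IfMibData(ctypes.Structure):
    """Mirror of struct ifmibdata."""
    _fields_ = [
        ('ifmd_name', ctypes.c_char * IFNAMSIZ),
        ('ifmd_pcount', ctypes.c_int),
        ('ifmd_flags', ctypes.c_int),
        ('ifmd_snd_len', ctypes.c_int),
        ('ifmd_snd_maxlen', ctypes.c_int),
        ('ifmd_snd_drops', ctypes.c_int),
        ('ifmd_filler', ctypes.c_int * 4),
        ('ifmd_data', IfData),
    ]


# Sizes of both structures, then the buffer offsets of the two counters.
print(ctypes.sizeof(IfMibData), ctypes.sizeof(IfData))
print(IfMibData.ifmd_data.offset + IfData.ifi_ibytes.offset)
print(IfMibData.ifmd_data.offset + IfData.ifi_obytes.offset)
```

With such classes, IfMibData.from_buffer_copy(data) would give access to the fields by name instead of by slicing, at the cost of keeping the Python definitions in sync with the headers.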

Let’s see if that works.

In [10]: import time

In [11]: def network(previous):
...:     """
...:     Report on bytes in/out for the network interfaces.
...:
...:     Arguments:
...:         previous: A dict of {interface: (inbytes, outbytes, time)} or None.
...:
...:     Returns:
...:         A new dict of {interface: (inbytes, outbytes, time)}, and a formatted
...:         string to display.
...:     """
...:     cnt = sysctlbyname('net.link.generic.system.ifcount', convert=to_int)
...:     newdata = {}
...:     items = []
...:     for n in range(1, cnt):
...:         tm = time.time()
...:         data = sysctl([4, 18, 0, 2, n, 1], buflen=208)
...:         name = data[:16].strip(b'\x00').decode('ascii')
...:         ibytes = to_int(data[120:128])
...:         obytes = to_int(data[128:136])
...:         if previous:
...:             dt = tm - previous[name][2]
...:             d_in = int((ibytes - previous[name][0])/dt)
...:             d_out = int((obytes - previous[name][1])/dt)
...:             items.append(f'{name}: {d_in}B/{d_out}B')
...:         else:
...:             items.append(f'{name}: 0B/0B')
...:         newdata[name] = (ibytes, obytes, tm)
...:     return newdata, '  '.join(items)
...:

In [12]: p, _ = network(None)

In [13]: %timeit network(p)
86.6 µs ± 202 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

CPU

The CPU function displays the workload from 0 to 100% and the average temperature of the cores. It is also re-implemented via sysctlbyname. The method for calculating the CPU percentage was inspired by conky.

In [14]: import statistics as stat

In [15]: import struct

In [16]: def to_degC(value):
    ...:     """Convert binary sysctl value to degree Centigrade."""
    ...:     return round(int.from_bytes(value, byteorder='little')/10 - 273.15, 1)
    ...:

In [17]: def cpu(previous):
    ...:     """
    ...:     Report the CPU usage and temperature.
    ...:
    ...:     Argument:
    ...:         previous: A 2-tuple (used, total) from the previous run.
    ...:
    ...:     Returns:
    ...:         A 2-tuple (used, total) and a string to display.
    ...:     """
    ...:     temps = [
    ...:         sysctlbyname(f'dev.cpu.{n}.temperature', convert=to_degC)
    ...:         for n in range(4)
    ...:     ]
    ...:     T = round(stat.mean(temps))
    ...:     resbuf = sysctlbyname('kern.cp_time', buflen=40)
    ...:     states = struct.unpack('5L', resbuf)
    ...:     # According to /usr/include/sys/resource.h, these are:
    ...:     # USER, NICE, SYS, INT, IDLE
    ...:     total = sum(states)
    ...:     used = total - states[-1]
    ...:     if previous:
    ...:         prevused, prevtotal = previous
    ...:         frac = int((used - prevused) / (total - prevtotal) * 100)
    ...:     else:
    ...:         frac = 0
    ...:     return (used, total), f'CPU: {frac}%, {T}°C'
    ...:

In [18]: p, _ = cpu(None)

In [19]: p, cpustr = cpu(p)

In [20]: p
Out[20]: (5810156, 193222344)

In [21]: cpustr
Out[21]: 'CPU: 0%, 53°C'

In [22]: %timeit cpu(p)
248 µs ± 658 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Given that this function performs five sysctl calls (four temperature readings plus kern.cp_time), the time used is acceptable, I think.
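
The percentage calculation itself can be isolated into a small pure function, which makes it easy to try with made-up numbers. This is a sketch rather than the code used in the program: the name cpu_percent and the two samples are invented, and unlike cpu() it returns 0 instead of dividing by zero when no ticks have passed between the samples.

```python
def cpu_percent(prev_states, states):
    """
    Calculate CPU usage between two samples of kern.cp_time.

    Both arguments are 5-tuples of ticks: (user, nice, sys, intr, idle).
    Returns a whole percentage, or 0 if no ticks have passed.
    """
    d_total = sum(states) - sum(prev_states)
    if d_total <= 0:
        return 0
    d_idle = states[-1] - prev_states[-1]
    return 100 * (d_total - d_idle) // d_total


before = (100, 0, 50, 10, 840)   # invented sample: 1000 ticks in total
after = (150, 0, 70, 10, 1770)   # invented sample: 2000 ticks in total
print(cpu_percent(before, after))  # prints 7
print(cpu_percent(after, after))   # prints 0
```

Of the 1000 ticks between the two samples, 930 were idle, so 70 were spent working, which gives 7%.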

Putting it all together

The main function of the program looks like this.

def main():
    """
    Entry point for statusline-i3.py
    """
    netdata, _ = network(None)
    maildata, _ = mail(None)
    cpusage, _ = cpu(None)
    time.sleep(0.1)  # Lest we get divide by zero in cpu()
    while True:
        start = time.time()
        netdata, netstr = network(netdata)
        maildata, mailstr = mail(maildata)
        cpusage, cpustr = cpu(cpusage)
        print(' | '.join([netstr, mailstr, memory(), cpustr, date()]))
        sys.stdout.flush()  # Important to make the line show up!
        end = time.time()
        delta = end - start
        print(f"DEBUG: cycle time = {delta:.3f} s")
        if delta < 1:
            time.sleep(1 - delta)

Let’s see it run.

> python3 statusline-i3.py
age0: 0B/0B  rl0: 0B/0B | Mail: 43 | RAM: 19% | CPU: 0%, 52°C | Wed 2019-07-03 22:53:11
DEBUG: cycle time = 0.001 s
age0: 0B/0B  rl0: 0B/0B | Mail: 43 | RAM: 19% | CPU: 0%, 52°C | Wed 2019-07-03 22:53:12
DEBUG: cycle time = 0.001 s
age0: 0B/0B  rl0: 0B/0B | Mail: 43 | RAM: 19% | CPU: 0%, 52°C | Wed 2019-07-03 22:53:13
DEBUG: cycle time = 0.001 s
age0: 0B/0B  rl0: 0B/0B | Mail: 43 | RAM: 20% | CPU: 28%, 55°C | Wed 2019-07-03 22:53:14
DEBUG: cycle time = 0.001 s
age0: 1285B/772B  rl0: 0B/0B | Mail: 43 | RAM: 22% | CPU: 49%, 58°C | Wed 2019-07-03 22:53:15
DEBUG: cycle time = 0.001 s
age0: 4647B/2161B  rl0: 0B/0B | Mail: 43 | RAM: 24% | CPU: 55%, 58°C | Wed 2019-07-03 22:53:16
DEBUG: cycle time = 0.001 s
age0: 904B/1114B  rl0: 0B/0B | Mail: 43 | RAM: 26% | CPU: 18%, 54°C | Wed 2019-07-03 22:53:17
DEBUG: cycle time = 0.001 s
age0: 4730B/1685B  rl0: 0B/0B | Mail: 43 | RAM: 26% | CPU: 0%, 53°C | Wed 2019-07-03 22:53:18
DEBUG: cycle time = 0.001 s

Conclusion

The cycle time has dropped like a rock compared to the previous version, and CPU usage is now negligible. This rewrite was a total success! The amount of performance improvement was more than I expected, to be honest. As of 2019-07-03, I’m using this program instead of conky.

According to cloc, this program has 134 lines of code, 63 lines of comments and 37 blank lines. That’s not bad.

Writing this program also taught me the basics of using ctypes, which is a good tool to have in the toolbox.

After the above code was written, I made the mailbox location configurable as a command-line parameter. I also changed the internal API to make it easier to vary the output per machine; my laptop, for example, has a battery. The script has been published in my GitHub scripts repo.

