After some initial experiments with sqlite as a replacement for a json data file format (which I did not pursue in the end), I have now started using sqlite in earnest. And I must say that I quite like it.
At $work, we still have to submit written time sheets. The contents of those are then typed into an AS/400 application. (it doesn’t get more legacy than that, unless you’ve still got a punch-card tabulator somewhere) For me however, this method is not very useful. What I want is to answer the question “what do I spend my time on?”
A significant portion of my work is not related to production orders, so it doesn’t get entered into the system. Furthermore, this system does not support a lot of different queries for this information. And since it is a legacy application, there is little drive to make it more useful.
Since my handwriting is pretty bad, I started using an ms-excel spreadsheet formatted like the original time sheet years ago. Not only do the people who have to do the data-entry appreciate not having to decipher my handwriting, I can also analyse the data to check my time usage.
Luckily ms-excel has been using XML-files in a ZIP container for several years now. I’ve unzipped several of these files and used xmllint -format to turn the XML-files inside them in a more or less readable form. There is still some binary gobbledygook inside (printer settings, for one thing) but it’s pretty easy to retrieve the contents of cells. Since I just need to extract data from some columns on each sheet, I decided to use regular expressions in Python to fish out the data that I need. Partially because I couldn’t be bothered to install and learn an XML parser, and partially to show the naysayers that you can retrieve data without parsing the whole thing. :-)
Initially I duplicated that code into every program that wanted to query those ms-excel files. There must be a better way! Enter sqlite. Basically I had several pieces of data for each entry:
- order number
- sub order number
- hours worked
The date items are integers, the hours is a float and the remaining items are text. Technically the order number should be an integer. But if I do work that is not order related I put “XXXX” in the order number. So that’s why I use a text field for it.
So I wrote a Python program that read an ms-excel file containing the time-sheets for a whole year. It collects all the entries and stashes them in an sqlite database with a single table. While I have used PostgreSQL before, it is overkill for a situation like this where you just want to shove some data in a table.
Then I modified the Python query programs to use that database. These programs have become a lot simpler through this change. Because now SQL SELECT statements can be used to return part of the database’s contents. For example, to find the sum of all hours spent on an order, I use the following query.
SELECT sum(hours) FROM timesheet WHERE ordernum = ?
(The ? is a parameter that is filled-in when the query is executed.) Since I always fill in the remarks column with a description of the work done, I can use that to sum the hours spent on non-order tasks where the remark contains a certain word:
SELECT sum(hours) FROM timesheet WHERE ordernum LIKE "%XX%" AND remarks LIKE ?
This has provided me with a lot more insight in what I work on. This in turn helps me with long-term planning.