Roland's homepage

My random knot in the Web

Decoding temperature data logger files

At work, we recently bought an EBI 40 TC-01 6-channel temperature logger. It saves data in a file with the ed3 extension. It comes with an ms-windows program to show the data and export it to CSV and ms-excel.

However, I want to be able to use the data on my FreeBSD workstation. So I have to figure out the data format of the ed3 files.

A first look with the file utility showed me that it is an XML document:

> file Custom00.ed3
Custom00.ed3: XML 1.0 document, ASCII text, with very long lines

The good thing is that XML files are human readable (in principle). Hint: I like to use xmllint (with the --format option, from the libxml2 package) to make XML files actually readable.
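If xmllint is not at hand, Python's standard library can do much the same job; a minimal sketch (the inline XML string stands in for the actual file contents):

```python
import xml.dom.minidom

# Parse an XML string; use minidom.parse("Custom00.ed3") to read a file instead
doc = xml.dom.minidom.parseString("<root><child>data</child></root>")

# toprettyxml re-indents the document so nested tags become readable
print(doc.toprettyxml(indent="  "))
```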

Unfortunately, there is no DOCTYPE, so no DTD. That means that there is no way to find out what the possible values of some of the more opaque tags like <Type> and <Units> are.

The beginning of the document is not particularly interesting, but the information about the channels has some good clues.

    <Name> </Name>
    <DateStart unix="1613747060" longunix="0">19.2.2021 15:4:20</DateStart>

The NoBits tag tells me that each sample is stored as a 16-bit number. My assumption is that this will be in little-endian byte order, since it is meant to be read on an x86 machine.

The CommaShift is interesting. What I assume it means is that e.g. 37.5 is actually stored as 375. The documentation of this device claims a resolution of 0.1 °C, so that matches this interpretation.
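Combining the two assumptions (16-bit little-endian samples, decimal point shifted one place to the right), decoding a single raw sample would look like this; 375 is a made-up example value:

```python
import struct

# 375 as a little-endian unsigned 16-bit integer: 0x0177 -> bytes 77 01
raw = struct.pack("<H", 375)
print(raw.hex())  # 7701

# Undo the CommaShift of 1 by dividing by 10
(value,) = struct.unpack("<H", raw)
print(value / 10)  # 37.5
```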

The data file that I have uses all six channels. So I do not know yet what happens if a channel is not connected. Whenever I get a data file generated with less than six sensors, I will investigate further.

Other tags like HasStatus, Type, CodingType, ChannelType are not clear at this moment. It might be possible to get more information about these by looking in the user interface of the device. At least we should be able to get a list of possible choices. But for the moment it has not been necessary for me to investigate that further.

It seems the UNIX time stamp in the start date is in UTC (or simply without timezone):

Python 3.9.2 (default, Mar  3 2021, 17:31:28)
Type "help", "copyright", "credits" or "license" for more information.
>>> from datetime import datetime
>>> tm = datetime.utcfromtimestamp(1613747060)
>>> print(tm)
2021-02-19 15:04:20
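As an aside, utcfromtimestamp is deprecated in newer Python versions (3.12 and later); a timezone-aware equivalent gives the same instant:

```python
from datetime import datetime, timezone

# Interpret the unix attribute of DateStart as seconds since the epoch, in UTC
tm = datetime.fromtimestamp(1613747060, tz=timezone.utc)
print(tm)  # 2021-02-19 15:04:20+00:00
```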

The actual measurements are grouped into sections, like this:

<CodedData index="1" count="126">

Apparently, this block contains 126 data points.

Luckily, I had a CSV export file with the same data to help me. Here is what the beginning of the same data looks like:

;°C ;°C ;°C ;°C ;°C ;°C
19-2-2021 16:04:20;36.5;36.7;37.9;33.1;35.5;36.2
19-2-2021 16:04:21;36.1;36.7;37.7;33.3;35.6;35.9
19-2-2021 16:04:22;36.3;36.9;37.7;33.4;35.3;35.9

From this we can conclude that a Unit of 1 means degrees Celsius.
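For cross-checking the decoded values later on, the CSV export is easy to parse with the csv module; a sketch, assuming the ';' delimiter shown above (the inline string stands in for the actual export file):

```python
import csv
import io

# A couple of lines from the export, as shown above
text = """19-2-2021 16:04:20;36.5;36.7;37.9;33.1;35.5;36.2
19-2-2021 16:04:21;36.1;36.7;37.7;33.3;35.6;35.9"""

rows = []
for row in csv.reader(io.StringIO(text), delimiter=";"):
    # First column is the timestamp, the rest are the six channel readings
    rows.append((row[0], [float(v) for v in row[1:]]))
print(rows[0][1])  # [36.5, 36.7, 37.9, 33.1, 35.5, 36.2]
```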

An Interval of 16385 apparently means 1-second intervals. The interval value is 2¹⁴+1, which is probably not a coincidence. The manual states that the interval can vary from 0.1 seconds to 24 hours; the Interval value should be able to express that. But at the moment the logic of that numbering escapes me. As time permits, I will try to generate data files with different intervals and see what changes.

There seemed to be strange patterns in the coded data. At first, this led me down the wrong path; I saw patterns that weren’t there because I was only looking at the first couple of data blocks.

So before jumping to conclusions, I wanted to get a look at all the symbols used in the complete dataset. First, the file is read and all newlines are removed from the contents. This makes scanning the data with a regular expression easier.

Then we extract all the coded data with a regular expression and concatenate those data blocks into a single string. Finally, a collections.Counter is built from the data to see which symbols occur in the coded data:

Python 3.9.2 (default, Mar  3 2021, 17:31:28)
>>> with open("Custom00.ed3") as df:
...     lines = [ln.strip() for ln in df]
>>> contents = ''.join(lines)
>>> import re
>>> dre = "<CodedData[^>]+?>(.*?)</CodedData>"
>>> data = re.findall(dre, contents)
>>> alldata = ''.join(data)
>>> from collections import Counter
>>> c = Counter(alldata)
>>> print(c.keys())
dict_keys(['b', 'Q', 'F', 'v', 'A', 'X', 's', 'B', 'S', 'w', 'j', 'W', 'o', 'a', 'k',
'T', 'c', 'x', 'g', 'h', 'u', 'i', 'Z', 'U', 'y', '0', '4', 'l', '8', 't', 'z', 'Y',
'E', 'm', 'n', 'I', 'V', 'p', 'q', 'r', '1', 'M', '2', '3', '5', '6', '7', '9', '+',
'G', 'd', '/', 'C', 'D', 'H', 'J', 'K', 'L', 'e', 'N', 'P', 'O', 'R', 'f'])

The keys of the Counter object represent all symbols found in the encoded data. Looking at it, I saw + and / next to the standard alphanumerical characters. That suggested to me that this data is actually encoded in base64 format.
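That hunch is easy to check programmatically: the set of symbols should be a subset of the standard base64 alphabet. The 'symbols' string below stands in for the keys of the Counter built above:

```python
import string

# The standard base64 alphabet: A-Z, a-z, 0-9, '+' and '/' ('=' is padding)
b64_alphabet = set(string.ascii_letters + string.digits + "+/=")

# 'symbols' stands in for the keys of the Counter built above
symbols = set("QWJhc2U2NA+/")
print(symbols <= b64_alphabet)  # True
```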

Knowing that, and the fact that the data is 16-bit and the decimal point shifted one position to the right, we can try to decode it:

>>> import base64
>>> import struct
>>> binary = base64.b64decode(data[0])
>>> len(binary)/2
126.0
>>> values = [j[0]/10 for j in struct.iter_unpack("<H", binary)]
>>> print(values[:12])
[36.5, 36.7, 37.9, 33.1, 35.5, 36.2, 36.1, 36.7, 37.7, 33.3, 35.6, 35.9]

Each block indeed contains 126 samples. Comparing this result to the contents at the beginning of the CSV file, we see that this decoding is correct. The samples are simply stored sequentially, in groups of 6; one for each channel.
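Since the samples are interleaved per channel, splitting the flat list into six per-channel series is a one-liner with slicing; a sketch using the first decoded values shown above:

```python
# First twelve decoded values: two samples for each of the six channels
values = [36.5, 36.7, 37.9, 33.1, 35.5, 36.2,
          36.1, 36.7, 37.7, 33.3, 35.6, 35.9]

# channels[n] picks every sixth value, starting at offset n
channels = [values[n::6] for n in range(6)]
print(channels[0])  # [36.5, 36.1]
print(channels[2])  # [37.9, 37.7]
```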

For comments, please send me an e-mail.
