Decoding temperature data logger files
At work, we recently bought an EBI 40 TC-01 6-channel temperature logger.
It saves data in a file with the ed3 extension.
It comes with an MS Windows program to show the data and export it to CSV and MS Excel.
However, I want to be able to use the data on my FreeBSD workstation.
So I have to figure out the data format of the ed3 files.
A first look with the file utility showed me that it is an XML document:
> file Custom00.ed3
Custom00.ed3: XML 1.0 document, ASCII text, with very long lines
The good thing is that XML files are human readable (in principle).
Hint: I like to use xmllint (with the --format option, from the libxml2 package) to make XML files actually readable.
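For example (the output file name is arbitrary):
> xmllint --format Custom00.ed3 > Custom00-pretty.xml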
Unfortunately, there is no DOCTYPE, so no DTD.
That means that there is no way to find out what the possible values of some of the more opaque tags like <Type> and <Unit> are.
The beginning of the document is not particularly interesting, but the information about the channels has some good clues.
<Channel>
<Name> </Name>
<Index>1</Index>
<ChannelType>1</ChannelType>
<DataCount>40000</DataCount>
<HasStatus>2</HasStatus>
<Type>11</Type>
<CodingType>0</CodingType>
<TimeFormat>1</TimeFormat>
<Unit>1</Unit>
<NoBits>16</NoBits>
<CommaShift>1</CommaShift>
<Interval>16385</Interval>
<DateStart unix="1613747060" longunix="0">19.2.2021 15:4:20</DateStart>
<MinLimit>0.00</MinLimit>
<MaxLimit>0.00</MaxLimit>
</Channel>
The NoBits tag tells me that each sample is stored as a 16-bit number.
My assumption is that this will be in little-endian byte order, since it is
meant to be read on an x86 machine.
The CommaShift tag is interesting. What I assume it means is that, e.g., 37.5 is actually stored as 375.
The documentation of this device claims a resolution of 0.1 °C, which matches this interpretation.
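Combining these two assumptions, decoding a single sample would look something like this (a sketch with made-up input bytes; 0x0177 is 375):
>>> import struct
>>> raw = struct.unpack("<H", b"\x77\x01")[0]  # 16-bit unsigned little-endian
>>> raw
375
>>> raw / 10**1  # a CommaShift of 1 moves the decimal point one place
37.5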
The data file that I have uses all six channels, so I do not know yet what happens if a channel is not connected. Whenever I get a data file generated with fewer than six sensors, I will investigate further.
Other tags like HasStatus, Type, CodingType, and ChannelType are not clear at this moment.
It might be possible to learn more about these by looking in the user interface of the device.
At least we should be able to get a list of possible choices.
But for the moment it has not been necessary for me to investigate that further.
It seems the UNIX time stamp in the start date is in UTC (or simply without a timezone):
Python 3.9.2 (default, Mar 3 2021, 17:31:28)
Type "help", "copyright", "credits" or "license" for more information.
>>> from datetime import datetime
>>> tm = datetime.utcfromtimestamp(1613747060)
>>> print(tm)
2021-02-19 15:04:20
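(As an aside: utcfromtimestamp is deprecated in newer Python versions; the timezone-aware equivalent gives the same result:)
>>> from datetime import timezone
>>> print(datetime.fromtimestamp(1613747060, tz=timezone.utc))
2021-02-19 15:04:20+00:00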
The actual measurements are grouped into sections, like this:
<CodedData index="1" count="126">
bQFvAXsBSwFjAWoBaQFvAXkBTQFkAWcBawFxAXkBTgFhAWcBagFuAXoBTAFiAWoB
ZgFwAXsBTwFiAWoBaQFxAXoBTwFjAWsBaAFvAXoBUAFhAWcBZgFvAXsBSwFkAWgB
bAFwAXwBTgFjAWgBagFvAXoBTQFiAWkBawFyAX0BUQFhAWgBawFvAXwBSwFiAWkB
aAFvAXwBTQFjAWsBagFxAX0BTgFkAWkBagFuAX0BSgFkAWcBbAFwAXkBUAFjAWkB
aQFwAX4BTwFgAWkBagFuAXsBTgFjAWgBaAFvAX0BTgFkAWkBawFyAXkBTwFiAWkB
agFyAXsBTQFiAWkB</CodedData>
Apparently, this block contains 126 data points.
Luckily, I had a CSV export file with the same data to help me. Here is what the beginning of the same data looks like:
;°C ;°C ;°C ;°C ;°C ;°C
19-2-2021 16:04:20;36.5;36.7;37.9;33.1;35.5;36.2
19-2-2021 16:04:21;36.1;36.7;37.7;33.3;35.6;35.9
19-2-2021 16:04:22;36.3;36.9;37.7;33.4;35.3;35.9
From this we can conclude that a Unit of 1 means degrees Celsius.
Apparently, an Interval of 16385 means 1-second intervals.
That value is 2¹⁴+1, which is probably not a coincidence.
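Looking at the value in hexadecimal makes that structure more visible:
>>> hex(16385)
'0x4001'
Perhaps the high bits encode a time base and the low bits a count, but that is pure guesswork on my part.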
The manual states that the interval can vary from 0.1 seconds to 24 hours; the Interval value should be able to express that.
But at the moment the logic of that numbering escapes me.
As time permits, I will try and generate data files with different intervals
and see what changes.
There seemed to be strange patterns in the coded data. At first, this led me down the wrong path; I saw patterns that weren’t there because I was only looking at the first couple of data blocks.
So before jumping to conclusions, I wanted to get a look at all the symbols used in the complete dataset. First, the file is read and all newlines are removed from the contents. This makes scanning the data with a regular expression easier.
Then we extract all the coded data with a regular expression and concatenate
those data blocks into a single string.
Then a collections.Counter is built from the data to see which symbols occur in the coded data:
Python 3.9.2 (default, Mar 3 2021, 17:31:28)
>>> with open("Custom00.ed3") as df:
... lines = [ln.strip() for ln in df]
...
>>> contents = ''.join(lines)
>>> import re
>>> dre = "<CodedData[^>]+?>(.*?)</CodedData>"
>>> data = re.findall(dre, contents)
>>> alldata = ''.join(data)
>>> from collections import Counter
>>> c = Counter(alldata)
>>> print(c.keys())
dict_keys(['b', 'Q', 'F', 'v', 'A', 'X', 's', 'B', 'S', 'w', 'j', 'W', 'o', 'a', 'k',
'T', 'c', 'x', 'g', 'h', 'u', 'i', 'Z', 'U', 'y', '0', '4', 'l', '8', 't', 'z', 'Y',
'E', 'm', 'n', 'I', 'V', 'p', 'q', 'r', '1', 'M', '2', '3', '5', '6', '7', '9', '+',
'G', 'd', '/', 'C', 'D', 'H', 'J', 'K', 'L', 'e', 'N', 'P', 'O', 'R', 'f'])
The keys of the Counter object represent all symbols found in the encoded data.
Looking at it, I saw + and / next to the standard alphanumeric characters.
That suggested to me that this data is actually encoded in base64 format.
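A quick check, reusing the alldata string from the session above, confirms that all symbols fall within the standard base64 alphabet:
>>> import string
>>> set(alldata) <= set(string.ascii_letters + string.digits + '+/=')
True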
Knowing that, and the fact that the data is 16-bit with the decimal point shifted one position to the right, we can try to decode it:
>>> import base64
>>> import struct
>>> binary = base64.b64decode(data[0])
>>> len(binary)/2
126.0
>>> values = [j[0]/10 for j in struct.iter_unpack("<H", binary)]
>>> print(values[:12])
[36.5, 36.7, 37.9, 33.1, 35.5, 36.2, 36.1, 36.7, 37.7, 33.3, 35.6, 35.9]
Each block indeed contains 126 samples. Comparing this result to the contents at the beginning of the CSV file, we see that this decoding is correct. The samples are simply stored sequentially, in groups of six, one for each channel.
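Putting all the pieces together, a minimal decoder could look like the sketch below. It is based on the findings above; the channel count and CommaShift are hard-coded for this particular file rather than read from the XML, and tags like HasStatus are ignored:

import base64
import re
import struct

def read_ed3(path, channels=6, comma_shift=1):
    """Decode the sample data from an ed3 file into rows of temperatures."""
    with open(path) as f:
        contents = "".join(ln.strip() for ln in f)
    # Extract the base64 payload of all <CodedData> blocks.
    blocks = re.findall("<CodedData[^>]+?>(.*?)</CodedData>", contents)
    binary = b"".join(base64.b64decode(b) for b in blocks)
    # Each sample is an unsigned 16-bit little-endian integer,
    # scaled by 10**comma_shift.
    values = [v[0] / 10**comma_shift for v in struct.iter_unpack("<H", binary)]
    # Samples are stored sequentially, one per channel.
    return [values[i:i + channels] for i in range(0, len(values), channels)]

rows = read_ed3("Custom00.ed3")
print(rows[0])  # should print [36.5, 36.7, 37.9, 33.1, 35.5, 36.2]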
For comments, please send me an e-mail.