At work, we recently bought an EBI 40 TC-01 6-channel temperature logger.
It saves data in a file with the ed3 extension.
It comes with an MS Windows program to show the data and export it to CSV and MS Excel.
However, I want to be able to use the data on my FreeBSD workstation.
So I have to figure out the data format of these ed3 files.
A first look with the file utility showed me that it is an XML document:
> file Custom00.ed3
Custom00.ed3: XML 1.0 document, ASCII text, with very long lines
The good thing is that XML files are human readable (in principle).
Hint: I like to use xmllint (with the --format option, from the libxml2 package) to make XML files actually readable.
Unfortunately, there is no DOCTYPE, so no DTD.
The beginning of the document is not particularly interesting, but the information about the channels has some good clues.
<Channel>
  <Name> </Name>
  <Index>1</Index>
  <ChannelType>1</ChannelType>
  <DataCount>40000</DataCount>
  <HasStatus>2</HasStatus>
  <Type>11</Type>
  <CodingType>0</CodingType>
  <TimeFormat>1</TimeFormat>
  <Unit>1</Unit>
  <NoBits>16</NoBits>
  <CommaShift>1</CommaShift>
  <Interval>16385</Interval>
  <DateStart unix="1613747060" longunix="0">19.2.2021 15:4:20</DateStart>
  <MinLimit>0.00</MinLimit>
  <MaxLimit>0.00</MaxLimit>
</Channel>
The NoBits element tells me that each sample is stored as a 16-bit number.
My assumption is that this will be in little-endian byte order, since it is
meant to be read on an x86 machine.
CommaShift is interesting. What I assume it means is that e.g. 37.5 is
actually stored as 375.
The documentation of this device claims a resolution of 0.1 °C, so that
matches this interpretation.
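As a sketch of this interpretation (the function name decode_sample is mine, and the scaling rule is my assumption, not vendor documentation):

```python
def decode_sample(raw: int, comma_shift: int = 1) -> float:
    """Convert a raw integer sample to a temperature.

    Assumes CommaShift is the number of decimal places that were
    shifted out of the value before it was stored as an integer.
    """
    return raw / 10**comma_shift


print(decode_sample(375))  # 37.5
```

With a CommaShift of 1 this gives exactly the 0.1 °C resolution the documentation claims.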
The data file that I have uses all six channels. So I do not know yet what happens if a channel is not connected. Whenever I get a data file generated with less than six sensors, I will investigate further.
Other properties, like ChannelType and CodingType, are not clear at this moment.
It seems the UNIX time stamp in the start date is in UTC (or simply without timezone):
Python 3.9.2 (default, Mar 3 2021, 17:31:28)
Type "help", "copyright", "credits" or "license" for more information.
>>> from datetime import datetime
>>> tm = datetime.utcfromtimestamp(1613747060)
>>> print(tm)
2021-02-19 15:04:20
The actual measurements are grouped into sections, like this:
<CodedData index="1" count="126">
bQFvAXsBSwFjAWoBaQFvAXkBTQFkAWcBawFxAXkBTgFhAWcBagFuAXoBTAFiAWoB
ZgFwAXsBTwFiAWoBaQFxAXoBTwFjAWsBaAFvAXoBUAFhAWcBZgFvAXsBSwFkAWgB
bAFwAXwBTgFjAWgBagFvAXoBTQFiAWkBawFyAX0BUQFhAWgBawFvAXwBSwFiAWkB
aAFvAXwBTQFjAWsBagFxAX0BTgFkAWkBagFuAX0BSgFkAWcBbAFwAXkBUAFjAWkB
aQFwAX4BTwFgAWkBagFuAXsBTgFjAWgBaAFvAX0BTgFkAWkBawFyAXkBTwFiAWkB
agFyAXsBTQFiAWkB</CodedData>
Apparently, this block contains 126 data points.
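As a sanity check on that count (my own arithmetic, based on the NoBits value above): 126 samples of two bytes each is 252 bytes, and base64 turns every three bytes into four characters, so a full block should be 336 characters; the block above is indeed 336 characters long.

```python
count = 126   # samples per block, from the "count" attribute
nobits = 16   # bits per sample, from the NoBits element
raw_bytes = count * nobits // 8        # 252 bytes of binary data
b64_chars = (raw_bytes + 2) // 3 * 4   # base64 uses 4 characters per 3 bytes
print(raw_bytes, b64_chars)  # 252 336
```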
Luckily, I had a CSV export file with the same data to help me. Here is what the beginning of the same data looks like:
;°C ;°C ;°C ;°C ;°C ;°C
19-2-2021 16:04:20;36.5;36.7;37.9;33.1;35.5;36.2
19-2-2021 16:04:21;36.1;36.7;37.7;33.3;35.6;35.9
19-2-2021 16:04:22;36.3;36.9;37.7;33.4;35.3;35.9
From this we can conclude that a Unit of 1 means degrees Celsius, and that an Interval of 16385 apparently means 1-second intervals.
The interval value is 2¹⁴+1, which is probably not a coincidence.
The manual states that the interval can vary from 0.1 seconds to 24 hours; the Interval value should be able to express that whole range.
But at the moment the logic of that numbering escapes me.
As time permits, I will try and generate data files with different intervals
and see what changes.
There seemed to be strange patterns in the coded data. At first, this led me down the wrong path; I saw patterns that weren’t there because I was only looking at the first couple of data blocks.
So before jumping to conclusions, I wanted to get a look at all the symbols used in the complete dataset. First, the file is read and all newlines are removed from the contents. This makes scanning the data with a regular expression easier.
Then we extract all the coded data with a regular expression and concatenate those data blocks into a single string. A collections.Counter is built from the data to see which symbols occur in the coded data:
Python 3.9.2 (default, Mar 3 2021, 17:31:28)
>>> with open("Custom00.ed3") as df:
...     lines = [ln.strip() for ln in df]
...
>>> contents = ''.join(lines)
>>> import re
>>> dre = "<CodedData[^>]+?>(.*?)</CodedData>"
>>> data = re.findall(dre, contents)
>>> alldata = ''.join(data)
>>> from collections import Counter
>>> c = Counter(alldata)
>>> print(c.keys())
dict_keys(['b', 'Q', 'F', 'v', 'A', 'X', 's', 'B', 'S', 'w', 'j', 'W', 'o', 'a', 'k', 'T', 'c', 'x', 'g', 'h', 'u', 'i', 'Z', 'U', 'y', '0', '4', 'l', '8', 't', 'z', 'Y', 'E', 'm', 'n', 'I', 'V', 'p', 'q', 'r', '1', 'M', '2', '3', '5', '6', '7', '9', '+', 'G', 'd', '/', 'C', 'D', 'H', 'J', 'K', 'L', 'e', 'N', 'P', 'O', 'R', 'f'])
The keys of the Counter object represent all symbols found in the encoded data. Looking at it, I saw + and / next to the standard alphanumerical characters. That suggested to me that this data is actually encoded in base64.
Knowing that, and the fact that the data is 16-bit and the decimal point shifted one position to the right, we can try to decode it:
>>> import base64
>>> import struct
>>> binary = base64.b64decode(data[0])
>>> len(binary)/2
126.0
>>> values = [j[0]/10 for j in struct.iter_unpack("<H", binary)]
>>> print(values[:12])
[36.5, 36.7, 37.9, 33.1, 35.5, 36.2, 36.1, 36.7, 37.7, 33.3, 35.6, 35.9]
Each block indeed contains 126 samples. Comparing this result to the contents at the beginning of the CSV file, we see that this decoding is correct. The samples are simply stored sequentially in groups of six, one for each channel.
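Putting the pieces together, a minimal decoder could look like this (the function read_channels and its defaults are mine; it assumes six channels, a CommaShift of 1, and little-endian unsigned 16-bit samples, as deduced above):

```python
import base64
import re
import struct


def read_channels(path, nchannels=6, comma_shift=1):
    """Return a list of per-channel temperature lists from an ed3 file.

    Assumptions from the analysis above: the samples are base64-encoded
    little-endian unsigned 16-bit integers, interleaved over the
    channels, with the decimal point shifted comma_shift places.
    """
    with open(path) as ed3:
        contents = "".join(ln.strip() for ln in ed3)
    blocks = re.findall("<CodedData[^>]+?>(.*?)</CodedData>", contents)
    binary = b"".join(base64.b64decode(blk) for blk in blocks)
    values = [v[0] / 10**comma_shift for v in struct.iter_unpack("<H", binary)]
    # De-interleave: every nchannels-th sample belongs to the same channel.
    return [values[c::nchannels] for c in range(nchannels)]
```

A proper implementation would read the channel count, CommaShift and NoBits from the XML instead of hard-coding them.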
For comments, please send me an e-mail.