Writing C in 2024
To learn how it works and because I don’t use Java, I decided to port professor Michael Pound’s implementation of the Enigma machine and the fitness and analysis code.
The initial port was done to Python. Unfortunately, this is one instance where Python was rather slow because of the large amount of settings that need to be tried.
So I decided to port it to C instead.
Initial port to Python
The initial port was done to Python, using the same class-based approach as in
Java.
Unfortunately, this is one instance where Python was rather slow.
Mostly when looking for the optimal rotor combinations out of sixty possible
combinations.
And especially when going through the 17576 possible starting positions for
each rotor combination.
Of course I used multiprocessing
to spread out the work over all available
cores.
But apart from that I did not see obvious ways to speed it up by e.g. using numpy
.
Why C and not C++
Some might ask the question why not C++. When C++ first came out, I started using it but over time the increasing complexity made that I grew to dislike it. So I decided that C++ was not for me in my private projects. Other people will see that differently and that is fine.
Getting re-acquainted with C
Since Python has been my go-to language for years, I have not written much C in the last decade or so. But of course I still have my templates for C code and Makefiles.
To get up to speed again I did some reading;
Those were a good heads-up.
Especially the modern way of initializing structures and arrays I found very
useful.
And I like stdint.h
and stdbool.h
.
I’ve also enjoyed Chris Wellons’ blog articles about C coding.
The nature of this program means that it only required statically allocated variables and arrays. Through this, a whole lot of possible error conditions around memory allocation is avoided.
Where Java and Python use classes, one can use structures and functions in C. If you don’t require inheritance that is relatively simple. In the case of the Enigma machine, composition is used instead of inheritance. For example, this is the definition of the Enigma machine in C:
typedef struct {
Rotor rotors[3];
Reflector reflector;
int32_t plugboard[26];
} Enigma;
extern void enigma_rotate(Enigma *m);
extern void enigma_reset(Enigma *m);
extern void enigma_encrypt(Enigma *m, const char in[], char out[],
int32_t outlen);
It would be weird to say that an Enigma is a rotor, reflector and plugboard. It has multiple rotors and a reflector and a plugboard.
Data structures
While Python has several data structures like list and dictionary (hash maps) built-in, C only has arrays.
The analysis code uses bigrams and quadgrams to check whether a decryption is a good fit for English text. These would be naturally expressed as dictionaries, but C doesn’t have those.
Bigrams
The data for the English bigrams, trigrams and quadgrams comes from https://github.com/mikepound/enigma.git.
The beginning of the bigrams file look like this:
TH,-1.449013409 HE,-1.512191427 IN,-1.61390336 ER,-1.688613896 AN,-1.702206436 RE,-1.73181457 ON,-1.754969661 AT,-1.827767223 EN,-1.837361388 ND,-1.868932911 TI,-1.872060209 ES,-1.873091732 OR,-1.893965891 TE,-1.919059941 OF,-1.92997127 ED,-1.932511289 IS,-1.947525422 IT,-1.949514395
So a bigram is a string of two letters, e.g. “AC”, and the value depends on the frequency, with higher (less negative) values signifying more occurrence.
For fast lookup in C, an array is the best candidate. If the bigram values are stored in sorted order, calculating the index into the array is simple. Sorted order means that the values are stored in the sequence AA..AZ, BA..BZ, etc. The letter A has index 0 while Z has index 25. The index value for the bigram is simply 26¹×(index first letter) + 26⁰×(index second letter), where 26⁰ is of course 1.
This index value calculation can be considered a collision-free hash function.
Calculating any offset will take approximately the same amount of time, and accessing the actual value is just a single memory reference. So access time is O(1).
In important lesson here is that in C, a sorted array can be used as a simple hash map!
There are 26² = 676 possible bigrams. The bigrams file only contains 668 bigrams. That means there are 676-668 = 8 invalid bigrams. These will be given the value -9.522878745280337 as in the original code.
The Python script mkbigrams.py
reads the bigrams
file, and emits
C code for a sorted array.
This is of course a prime example where Python shines.
Generating the table only has to be done once, so speed is not all that
relevant.
In any case, generating the bigram array only took around 0.02 s on my PC.
LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
L2 = len(LETTERS)**2
print("// This code has been generated by mkbigrams.py.")
print("// Bigram values are stored in a sorted array.")
print("// The index is 26×first character offset + second character offset.")
print("// Invalid bigrams are represented by -9.522878745280337")
print(f"const float BIGRAMS[{L2}] = {{")
with open("bigrams") as f:
bigramdict = dict(ln.strip().split(",") for ln in f)
index = 0
for j in LETTERS:
for k in LETTERS:
key = j + k
value = bigramdict.get(key, "-9.522878745280337")
if key == "ZZ":
print(f"{value} // [{index}] = {key}")
else:
print(f"{value}, // [{index}] = {key}")
index += 1
print("};")
The C code for creating the bigram score for a text is very simple because of the sorted array of data.
extern const float BIGRAMS[676];
float bigram_score(const char txt[]) {
int32_t n = strnlen(txt, 1024);
float total = 0.0;
for (int32_t j = 0; j < (n - 1); j++) {
total += BIGRAMS[26 * (txt[j] - 65) + (txt[j + 1] - 65)];
}
return total;
}
Quadgrams
A quadgram has four letters, e.g. “ATIO”. The quadgrams file has 42170 quadgrams. The first valid quadgram is AAAA, and the last is ZZYI. The amount of possible quadgrams is 26⁴ = 456976. So only 42170/456976 ≈ 9% of possible quadgrams are valid. Invalid quadgrams are given the value -9.522878745280337, just like with bigrams.
Still, the total array of float
values would only use 26⁴×4/1024
≈ 1785 kiB of memory.
This is a small amount for a PC in 2024!
It is a small price to pay for O(1) lookup.
Here is the code for generating the QUADGRAMS
array C code.
LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
L4 = len(LETTERS)**4
print("// This code has been generated by mkquadgrams.py")
print("// Quadgram values are stored in a sorted array.")
print("// The index is 26⁴×first character offset + 26³×second character offset +")
print("// 26²×third character offset + fourth character offset.")
print("// Invalid quadgrams are represented by -9.522878745280337")
print(f"const float QUADGRAMS[{L4}] = {{")
with open("quadgrams") as f:
quadgramdict = dict(ln.strip().split(",") for ln in f)
for j in LETTERS:
for k in LETTERS:
for m in LETTERS:
for p in LETTERS:
key = j + k + m + p
value = quadgramdict.get(key, "-9.522878745280337")
if key == "ZZZZ":
print(f"{value} // {key}")
else:
print(f"{value}, // {key}")
print("};")
This code runs in approximately 0.4 s on my PC.
And the C code for the lookup is as follows.
extern const float QUADGRAMS[456976];
float quadgram_score(const char txt[]) {
int32_t n = strnlen(txt, 1024);
float total = 0.0;
for (int32_t j = 0; j < (n - 3); j++) {
int32_t idx = 17576 * (txt[j] - 65) + 676 * (txt[j + 1] - 65) +
26 * (txt[j + 2] - 65) + (txt[j + 3] - 65);
total += QUADGRAMS[idx];
}
return total;
}
Time/memory trade off for quadgrams
If the data is too large, we could use a binary search in the sorted array. This is the fastest way to search a sorted array, at O(log₂n). For an array of 42170 quadgrams, O(log₂n) ≈ O(15.4).
Python code for this operation is shown below.
with open("quadgrams.sorted") as f:
lines = f.readlines()
items = list(ln.strip().split(",") for ln in lines
quadkeys = tuple(j[0] for j in items)
quadvalues = tuple(j[1] for j in items)
def find_key(quad):
left = 0
right = len(quadkeys)-1
mid = (left+right)//2
count = 1
while True:
print(f"{count} left={left}, right={right}")
v = quadkeys[mid]
if quad < v:
if quad > quadkeys[mid-1]:
return -1
right = mid
elif quad > v:
if quad < quadkeys[mid+1]:
return -1
left = mid
elif quad == v:
return mid
mid = (left+right)//2
count += 1
Let’s try it:
In [23]: find_key('OUGH') 1 left=0, right=42170 2 left=21085, right=42170 3 left=21085, right=31627 4 left=26356, right=31627 5 left=26356, right=28991 6 left=27673, right=28991 7 left=27673, right=28332 8 left=27673, right=28002 9 left=27837, right=28002 10 left=27837, right=27919 11 left=27878, right=27919 12 left=27898, right=27919 13 left=27898, right=27908 14 left=27903, right=27908 Out[23]: 27905 In [24]: quadkeys[27905] Out[24]: 'OUGH' In [25]: find_key('QQQQ') 1 left=0, right=42170 2 left=21085, right=42170 3 left=21085, right=31627 4 left=26356, right=31627 5 left=28991, right=31627 6 left=28991, right=30309 7 left=29650, right=30309 8 left=29650, right=29979 9 left=29650, right=29814 10 left=29732, right=29814 11 left=29732, right=29773 12 left=29752, right=29773 13 left=29752, right=29762 14 left=29752, right=29757 15 left=29754, right=29757 Out[25]: -1 In [26]: quadkeys[29754], quadkeys[29755], quadkeys[29756], quadkeys[29757] Out[26]: ('QING', 'QQAR', 'QSAR', 'QTIP') In [27]: find_key('AAAA') 1 left=0, right=42170 2 left=0, right=21085 3 left=0, right=10542 4 left=0, right=5271 5 left=0, right=2635 6 left=0, right=1317 7 left=0, right=658 8 left=0, right=329 9 left=0, right=164 10 left=0, right=82 11 left=0, right=41 12 left=0, right=20 13 left=0, right=10 14 left=0, right=5 15 left=0, right=2 16 left=0, right=1 Out[27]: 0
It generally takes 14—16 iterations to find the answer. This agrees with the big-O notation and testing.
If we want to do this in C, we’re not going to compare strings, but integer indexes. These can be calculated as follows:
In [31]: sum((ord(c)-65)*26**n for n, c in enumerate("AAAA"[::-1], start=0))
Out[31]: 0
In [32]: sum((ord(c)-65)*26**n for n, c in enumerate("TION"[::-1], start=0))
Out[32]: 339729
In [33]: sum((ord(c)-65)*26**n for n, c in enumerate("ATIO"[::-1], start=0))
Out[33]: 13066
In [34]: sum((ord(c)-65)*26**n for n, c in enumerate("ZZZZ"[::-1], start=0))
Out[34]: 456975
In [35]: sum((ord(c)-65)*26**n for n, c in enumerate("OUGH"[::-1], start=0))
Out[35]: 259747
In [36]: sum((ord(c)-65)*26**n for n, c in enumerate("QING"[::-1], start=0))
Out[36]: 286968
This approach requires 42170*4/1024 ≈ 165 kiB for the sorted indexes, and
another 165 kiB for the sorted float
values.
Here is the Python code that generates the tables.
LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
with open("quadgrams") as f:
items = [ln.strip().split(",") for ln in f]
# Very important: sort by quadgrams!
items.sort(key=lambda x: x[0])
keys = [j[0] for j in items]
values = [j[1] for j in items]
L = len(keys)
print("// This code has been generated by mkquadgrams2.py")
print("// It contains two arrays, both sorted by the quadgrams")
print("// Quadgram keys are stored in QUADKEYS.")
print("// Quadgram values are stored in QUADVALUES.")
print("// The index is 26⁴×first character offset + 26³×second character offset +")
print("// 26²×third character offset + fourth character offset.")
print("#include <stdint.h>")
print()
print(f"const int32_t QUADKEYS[{L}] = " + "{")
for key in keys[:-1]:
keyval = sum((ord(c)-65)*26**n for n, c in enumerate(key[::-1], start=0))
print(f" {keyval:6d}, // {key}")
# Last key has no “,” after the keyval!
keyval = sum((ord(c)-65)*26**n for n, c in enumerate(keys[-1][::-1], start=0))
print(f" {keyval:6d} // {keys[-1]}")
print("};")
print()
print(f"const float QUADVALUES[{L}] = " + "{")
for value in values[:-1]:
print(f" {value},")
print(f" {values[-1]}")
print("};")
The time needed to generate these tables is around 0.2 s on my machine.
The lookup algorithm translated into C is shown below.
// Binary lookup in a *sorted* array of int32_t.
// Returns the index of “value”, or -1 if “value” is not in the array.
int32_t array_lookup(int32_t value, const int32_t array[], int32_t arraylen) {
int32_t left = 0, right = arraylen - 1;
int32_t mid = (left + right) / 2, midval = 0;
while (true) {
midval = array[mid];
if (value == midval) {
return mid;
}
if (value < midval) {
if (value > array[mid - 1]) {
// Value is not present in the array!
return -1;
}
right = mid;
} else if (value > midval) {
if (value < array[mid + 1]) {
// Value is not present in the array!
return -1;
}
left = mid;
}
mid = (left + right) / 2;
}
// Should not be reached.
return -1;
}
The fitness function based on this is shown below.
extern const int32_t QUADKEYS[42171];
extern const float QUADVALUES[42171];
float quadgram_score2(const char txt[]) {
int32_t n = strnlen(txt, 1024);
float total = 0.0;
for (int32_t j = 0; j < (n - 3); j++) {
int32_t val = 17576 * (txt[j] - 65) + 676 * (txt[j + 1] - 65) +
26 * (txt[j + 2] - 65) + (txt[j + 3] - 65);
// 26³ = 17576, 26² = 676
int32_t idx = array_lookup(val, QUADKEYS, 42171);
if (idx != -1) {
total += QUADVALUES[idx];
}
}
return total;
}
This solution uses approximately 18.5% of the amount of memory for a single table. As a consequence, the look-up time is theoretically O(15) instead of O(1). In practice, it was around 11× slower than a direct lookup. So in this case the first solution was used. But this is a valuable technique for datasets that would otherwise not fit in memory.
Using all available cores
It was a pleasant surprise how relatively easy it was to run code on
multiple cores using OpenMP.
Of course I had to read up on how it works.
In the end, all it required in this case was two #pragma
statements and
a compiler flag.
From my perspective, parallelizing a loop in C using OpenMP is easier than
using multiprocessing
in Python.
It was used in testing all the rotor combinations and start positions. Each of these calculations in independent of the others, so it is a good candidate for parallelization. This reduced the runtime by 3/4, which is the best possible reduction on my machine with four cores.
The compiler in the FreeBSD base system (clang 18) supports it out of the box.
The gcc13
compiler from ports also works.
However, when compiled with identical flags, the runtime of the gcc OpenMP code is
double that of the clang code, even though both where using four threads
according to ps
.
Since I use the compiler from the FreeBSD base system by default, I’ve not
investigated this further.
Lessons learned
- The vim editor has very good support for C coding.
- Where possible, use statically or automatically allocated arrays and structures.
- Sorted C arrays can be used as simple dictionaries.
- Using a
struct
and functions is fine for where object orientation is a good fit. Composition can be used instead of inheritance. - OpenMP is easy to use from C and works well in suitable cases.
- As opposed to the old
indent
,clang-format
is a decent solution for getting a consistent formatting of modern C code. I use “google” style but disabled sorting of include files. - Python is very well suited to generating static data in the form of C structures and arrays.
All in all, it was a much more pleasurable experience than I anticipated.
For comments, please send me an e-mail.