# A1: Enigma
**Soft deadline:** Thursday, 09/10/15, 11:59 pm
**Hard deadline:** Saturday, 09/12/15, 11:59 pm
*This assignment is to be done as individuals, not with partners nor with teams.
Please review the [course policies on academic integrity][syllabus].*
In this assignment, you will develop a software replica of
the Enigma encryption machine used by the German military in
World War II. Efforts to break the Enigma cipher led to the
development of the first electronic computers at Bletchley
Park, England: the Bombe (designed by Alan Turing) and its
successor, Colossus (designed by Tommy Flowers). Using
computers, the Allies were eventually able to break the
Enigma code, giving them an intelligence edge that changed
the balance of the war.
For a tutorial on how the Enigma machine works, read the
[Enigma Cipher Machine][sale] website constructed by Tony
Sale, original curator of the Bletchley Park Museum. There
is a [worked example][sale-ex] on that site that you should read;
also be sure to read the Technical Note
at the end of that example. And to
experiment with a virtual Enigma machine, see the [Enigma
Machine Simulator][sim].
## Introduction to the Enigma
The Enigma machine implemented a *substitution cipher*, which
encrypts a message by substituting one character for
another. Such ciphers go back at least as far as Julius
Caesar, who used a simple substitution cipher to encrypt
military orders. The Caesar cipher specified that each
letter in the alphabet would be encrypted using the letter
three positions later in the alphabet. For example, 'A'
would be encrypted as 'D', 'B' would be encrypted as 'E',
'C' would be encrypted as 'F', and so on. The code wraps
around at the end of the alphabet, so 'X', 'Y' and 'Z' would
be encrypted as 'A', 'B', and 'C', respectively. Although
the Caesar cipher was effective in its time (when very few
people could read at all), its simple pattern of encrypting
letters seems pretty obvious today.
A *keyed* substitution cipher is more effective. The *key* is
a secret presumably known only to the message sender and
receiver; it defines the mapping between a *plaintext* letter
and a *ciphertext* letter. We can write the key as a
26-character string, in which the first character is the
encryption of 'A', the second character is the encryption of
'B', and so on. The key for the Caesar cipher would thus be
"DEFGHIJKLMNOPQRSTUVWXYZABC". Of course, there are many
other possible keys: since there are 26! (roughly 4 \* 10^26)
different arrangements of the 26 letters in the alphabet,
there are 26! different keys and so 26! different keyed
substitution ciphers. Nonetheless, this simple type of
encryption is vulnerable to statistical attacks, as anyone
who has solved cryptogram puzzles can attest.
To implement a substitution cipher, the Enigma machine
utilized interchangeable *rotors* with internal circuitry that
mapped each letter of the alphabet to a different letter.
When placed side-by-side on a *spindle*, three rotors would
define an even more complex substitution pattern, with the
wires inside each rotor connecting to its neighboring
rotors. When the operator typed a key on the Enigma keyboard,
an electrical current would flow through the wires in the
rotors, forming a closed circuit and lighting up the letter
specified by the rotor substitution pattern.
What made the Enigma cipher especially difficult to break was
that the rotors moved before each letter was encoded,
essentially changing the substitution pattern each time. In
particular, the rightmost rotor shifted one position on
every encoding, and the other rotors shifted conditionally,
based on the position of *notches* on the rotors: when a
rotor shifted past a notch, not only did that rotor shift,
but also the rotor to its left. Thus, the letter 'E' might
get mapped to a 'Q' the first time it was encountered, but
to a completely different letter each successive time.
Enigma machines also used a *reflector*, which was inserted at
the leftmost side of the spindle. A keypress caused current
to flow right-to-left through the rotors, through the
reflector, then back through the rotors, left-to-right. The
reflector's substitution pattern caused the entire circuit
to be symmetric, meaning that if 'E' mapped to 'Q' in a
particular machine state, then 'Q' would likewise be mapped
to 'E' in that same state. Thus, the same rotor settings
would work to encrypt and decrypt a message. (It turns out
this was a major cryptographic weakness in the design of
Enigma.)
The original Enigma machines used at the beginning of World
War II had three rotors, known by the Roman-numeral
designations I, II, and III, which could be placed on the
spindle in any order (giving 3! = 6 possible combinations).
In addition, each rotor was installed by the operator with a
starting letter showing in a window (giving 26 possible
orientations for each rotor). And there was a single
standard reflector, designated B. Thus, the total number of
possible initial settings for an Enigma machine was 6 \* 26 \*
26 \* 26 = 105,456.
(Note: we are omitting two more Enigma mechanisms, the *ring
setting* and *plugboard*. For aficionados of the Enigma, you
may assume the ring setting is always 'A' for each rotor,
and that no wires are installed in the plugboard.)
**Exercise:** Download the
[Paper Enigma Machine][paper] and follow the directions to
construct a working model out of paper. Using this paper
machine, you should be able to encrypt and decrypt messages
using the same algorithm as an Enigma machine. To test your
understanding, set the initial configuration to I-F, II-U,
and III-N. That is, rotor I should be placed leftmost with
the letter F on its left side aligning with the rotor
setting line; rotor II should be in the middle with U at the
rotor setting line, and rotor III should be rightmost with N
at the rotor setting line. Then decrypt the ciphertext
YNOXQ; it should produce the plaintext OCAML.
[sale]: http://www.codesandciphers.org.uk/enigma/
[sale-ex]: http://www.codesandciphers.org.uk/enigma/example1.htm
[sim]: http://startpad.googlecode.com/hg/labs/js/enigma/enigma-sim.html
[paper]: http://mckoss.com/Crypto/Enigma.htm
## Objectives
* Gain familiarity with basic OCaml language features, including
built-in data types, lists, functions, and printing.
* Practice writing programs in the functional style using immutable data.
* Improve skills in computational thinking
by learning about and modeling a mildly complex, real-world computational artifact.
## Recommended reading
* [Lectures and recitations 1–4][web]
* [RWO chapters 1–3][rwo]
* The [CS 3110 Style Guide][style]
* The policies about assignments in the [course syllabus][syllabus]
[web]: http://www.cs.cornell.edu/Courses/cs3110/2015fa/
[rwo]: https://realworldocaml.org/v1/en/html/index.html
[style]: http://www.cs.cornell.edu/Courses/cs3110/2015fa/handouts/style.html
[syllabus]: http://www.cs.cornell.edu/Courses/cs3110/2015fa/syllabus.php
## Requirements
Your task in this assignment is to simulate the operation of
an Enigma machine.
* Your solution must provide the functions described below,
with exactly those names and types.
* Your solution must correctly implement the Enigma cipher,
including the spindle, rotors, reflector, notches, and
starting positions of the Enigma machine. A good way to
check your cipher implementation is to compare its
input/output behavior with the [Enigma Machine
Simulator][sim] and [Paper Enigma Machine][paper].
* Your code must be written with good style and be well documented.
## Required functions
You are required to adhere to these names and types. Solutions
that do not adhere to them will be penalized. Recall that OCaml 4.02
is in a transient state of converting `string` to an immutable data type.
Currently, the types `string` and `bytes` are synonyms. So anywhere
we write `string` in the code below, it's okay if your implementation
actually produces/uses `bytes`.
```
(* [cipher refl rotors notches starts s] computes the Enigma cipher, where
* - [refl] is the wiring of the reflector,
* - [rotors] is a list of the rotors (which must contain at least
* one element), as they are installed from left to right on
* the spindle,
* - [notches] is a list of where the notch is on each rotor,
* where the nth character of [notches] is the location of
* the notch on the nth rotor in [rotors],
* - [starts] is a list of the starting character for each rotor
* as positioned on the spindle, where the nth character of
* [starts] is the starting character for the nth rotor in
* [rotors].
* - [s] is the string to be ciphered.
*)
val cipher : string -> string list -> char list -> char list
-> string -> string
(* [simulate] takes the same inputs as [cipher] but prints
* a simulation of the Enigma machine at each step of the
* computation.
*)
val simulate : string -> string list -> char list -> char list
-> string -> unit
```
The functionality of your `simulate` function is entirely up
to you. It need not be fancy, but it should
communicate how each character is routed through the
circuit, and how the rotors step. It is to your benefit to
design the output of `simulate` well, because it will be
used in awarding partial credit in case your `cipher`
implementation is incorrect.
**Message formatting:**
The design of Enigma limits it to the character set 'A'-'Z'.
Lower-case letters and punctuation are not supported. For
this assignment, spaces should be preserved, such that a
plaintext ATTACK AT DAWN might be encrypted as TXLHKV PL
FCCB. Numbers will have to be spelled out.
**Wiring specifications:**
The wiring of a rotor is specified as a 26-character string.
The standard rotor I from the 1930 Enigma I, for example,
would be "EKMFLGDQVZNTOWYHXUSPAIBRCJ". That means, in
starting position 'A', when current is flowing right to
left, the rotor would map 'A' to 'E', 'B' to 'K', 'C' to
'M', and so on. Of course, that map changes as the rotor
steps. Two other standard rotors are rotor II, which is
wired "AJDKSIRUXBLHWTMCQGZNPYFVOE", and rotor III, which is
wired "BDFHJLCPRTXVZNYEIWGAKMUSQO".
Likewise, the wiring of a reflector is specified as a
26-character string. The standard reflector B would be
"YRUHQSLDPXNGOKMIEBFZCWVJAT". That means that 'A' is mapped
to 'Y', 'B' is mapped to 'R', 'C' is mapped to 'U', and so
on. That map does not change.
You can read about the [wiring of other historical rotors
and reflectors][wiring]. Do not assume that rotors can only
be from that list, though; we might test your simulator with
shiny new rotors.
[wiring]: http://www.cryptomuseum.com/crypto/enigma/wiring.htm
**Stepping:** The intuition of stepping is that the rotors behave mostly like the
numbers on an odometer: the rightmost completes a full revolution, and
in so doing, causes the next rotor to its left to take a single step, and so forth.
But the [notch stepping system on the Enigma][step] is a little more complicated:
*Rule 1:* The rightmost rotor always takes a single step just before
enciphering each character.
*Rule 2:* Just before every character is enciphered, if any rotor
*except the leftmost* is positioned at its notch, then that rotor and the
rotor to its left steps.
*Rule 3:* No rotor steps more than once per character enciphered, even
if the above rules could be construed as suggesting that a rotor would step twice.
Here is an example: Suppose the installed rotors are III-II-I, starting
in position KDO. Then as each character is enciphered, they would step
as follows: KDO, KDP, KDQ, KER, LFS, LFT, LFU, ...
Here is another example: Suppose the installed rotors are III-II-I, starting
in position VDP. Then as each character is enciphered, they would step
as follows: VDP, VDQ, VER, WFS, WFT, ...
[CLARIFICATION 09/03/15]
Those examples assume that the notches on the historical
rotors are as follows: I-Q, II-E, III-V. You can see that
those letters are where the notches are marked on the [Paper
Enigma Machine][paper]. (Technically, those letters are
actually the *turnovers* of the rotors—the turnover is
the letter that appears as the rotor setting when the
physical notch is reached. But, for simplicity, we dispense
in this assignment with the distinction between notches and
turnovers.)
[CLARIFICATION 09/06/15]
Spaces in the message do not cause the machine to step.
Some Enigma machines were built with different stepping rules; in case of
discrepancy, the rules above are in effect for this assignment.
Note that the [Paper Enigma Machine][paper] in its step 1 under "Operation"
is somewhat vague about Rule 2, because the Paper Enigma Machine does not say what
to do if there does not exist a "rotor to the left".
[step]: http://users.telenet.be/d.rijmenants/en/enigmatech.htm#steppingmechanism
## What we provide
In the release code on the course website you will find these files:
* A template file `enigma.ml` with some function stubs.
* A template file `a1.txt` for submitting your written feedback.
## What to turn in
Submit files with these names on [CMS][cms]:
* `enigma.ml`, containing your solution code.
* `a1.txt`, containing your written feedback.
[cms]: https://cms.csuglab.cornell.edu/
## Grading issues
* **Names and types:** You are required to adhere to the names and
types of the required functions. If your code does not
compile against the provided ~~public test file~~ [CORRECTION 09/03/15] test cases
below, your solution will receive minimal credit.
* **Code style:** Refer to the [CS 3110 style guide][style].
Ugly code that is functionally correct will nonetheless be penalized.
Take extra time to think and find elegant solutions.
* **Late submissions:** Carefully review the course policies on
submission and late assignments. Verify before the deadline on CMS
that you have submitted the correct version.
* **Environment:** Your solution must function properly in the
[3110 virtual machine][vm], which is the official grading
environment for the course.
[vm]: http://www.cs.cornell.edu/Courses/cs3110/2015fa/vm.html
## Prohibited OCaml features
You may not use imperative data structures, including
refs, arrays, and mutable fields. Strings are allowed, but
the deprecated mutable functions on them are not.
You have not seen these features in lecture or recitation,
so we doubt you'll be tempted. Your solutions may not
require linking any additional libraries.
## Testing
We encourage you to get into the habit of testing your implementation
early and often. In later assignments, we will ask you to turn in
test suites. But for this first assignment, we won't collect
your tests. **You should nonetheless test.**
Here are a few good test cases to get you started:
```
let id = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
let rotorI = "EKMFLGDQVZNTOWYHXUSPAIBRCJ"
let rotorII = "AJDKSIRUXBLHWTMCQGZNPYFVOE"
let rotorIII = "BDFHJLCPRTXVZNYEIWGAKMUSQO"
let reflB = "YRUHQSLDPXNGOKMIEBFZCWVJAT"
let () = assert ((cipher id [id] ['Q'] ['M'] "Q") = "Q")
let () = assert ((cipher reflB [rotorIII] ['Q'] ['Z'] "B") = "D")
let () = assert ((cipher id [id] ['Y'] ['Z'] "OCAML") = "OCAML")
let () = assert ((cipher reflB [rotorI; rotorII; rotorIII]
['Q'; 'E'; 'V'] ['F'; 'U'; 'N']
"OCAML") = "YNOXQ")
```
[CORRECTION 09/06/15]
The order of notches in the final test case above was incorrect with respect
to the historical rotors, though that happened to have no effect on the
input/output behavior of the test case.
The [assertion construct][assert]
`let () = assert e` evaluates expression `e`.
If it evaluates to `false`, an exception is raised; otherwise,
OCaml continues processing the rest of the program.
[assert]: http://caml.inria.fr/pub/docs/manual-ocaml/extn.html#s%3Aassert
You might want to begin collecting your test cases in a file,
so that you can conveniently run an entire set of tests.
## Hints
* Before designing or implementing any code, use the [Paper Enigma Machine][paper]
to teach yourself how the Enigma cipher works. Make sure you can decrypt
YNOXQ to OCAML as discussed in the introduction to this assignment.
* There are useful library functions in the `String` and `Char` modules.
* Think about implementing your solution in stages. Here is one possible
progression:
- First build something that works for messages that contain only a
single character, with no stepping, with a single rotor that
implements the wiring "ABCDEFGHIJKLMNOPQRSTUVWXYZ".
- Then improve your solution to something that works for
other rotor wirings.
- Then to multiple rotors.
- Then to multiple characters. This is the hardest part, because it
requires getting all the corner cases with stepping to work correctly.
Remember, stepping always occurs just before a character is enciphered.
## Karma
*Karma problems* are a way of going above and beyond the
call of 3110 duty. Solving them is not required, nor does it
earn any explicit extra credit. But the course staff will
keep track of the karma problems that you solve throughout
the semester. The number of orange camels, below, is an
indicator of how hard the karma problem might be.
Be mindful that you still need to provide the required
functions, above, so make sure your implementation of any
karma functionality does not violate the
specifications/names/types of those functions.
**Enigmatic:**
For additional fun, learn about and implement some of the following:
* Ring settings (Ringstellung)
* The plugboard (Steckerbrett)
* Multiple notches on a rotor (naval rotors VI, VII, and VIII)
* The rewirable reflector (Umkehrwalze D)
**Sims:**
Implement a fancier simulation using
an [OCaml text interface library][text].
[text]: http://caml.inria.fr/cgi-bin/hump.en.cgi?sort=0&browse=64
**Codebreaker:**
Implement a tool that breaks Enigma encryptions without having the key.
A brute force attack against the simple three-rotor machine used in most
of the assignment is probably not that hard. Doing it for a five-rotor
machine with plugboard, however, could be an interesting challenge.
* * *
**Acknowledgement:** Adapted from Prof. David Reed (Creighton University).