A1: Enigma

# A1: Enigma **Soft deadline:** Thursday, 09/10/15, 11:59 pm **Hard deadline:** Saturday, 09/12/15, 11:59 pm *This assignment is to be done as individuals, not with partners nor with teams. Please review the [course policies on academic integrity][syllabus].* In this assignment, you will develop a software replica of the Enigma encryption machine used by the German military in World War II. Efforts to break the Enigma cipher led to the development of the first electronic computers at Bletchley Park, England: the Bombe (designed by Alan Turing) and its successor, Colossus (designed by Tommy Flowers). Using computers, the Allies were eventually able to break the Enigma code, giving them an intelligence edge that changed the balance of the war. For a tutorial on how the Enigma machine works, read the [Enigma Cipher Machine][sale] website constructed by Tony Sale, original curator of the Bletchley Park Museum. There is a [worked example][sale-ex] on that site that you should read; also be sure to read the Technical Note at the end of that example. And to experiment with a virtual Enigma machine, see the [Enigma Machine Simulator][sim]. ## Introduction to the Enigma The Enigma machine implemented a *substitution cipher*, which encrypts a message by substituting one character for another. Such ciphers go back at least as far as Julius Caesar, who used a simple substitution cipher to encrypt military orders. The Caesar cipher specified that each letter in the alphabet would be encrypted using the letter three positions later in the alphabet. For example, 'A' would be encrypted as 'D', 'B' would be encrypted as 'E', 'C' would be encrypted as 'F', and so on. The code wraps around at the end of the alphabet, so 'X', 'Y' and 'Z' would be encrypted as 'A', 'B', and 'C', respectively. Although the Caesar cipher was effective in its time (when very few people could read at all), its simple pattern of encrypting letters seems pretty obvious today. A *keyed* substitution cipher is more effective. The *key* is a secret presumably known only to the message sender and receiver; it defines the mapping between a *plaintext* letter and a *ciphertext* letter. We can write the key as a 26-character string, in which the first character is the encryption of 'A', the second character is the encryption of 'B', and so on. The key for the Caesar cipher would thus be "DEFGHIJKLMNOPQRSTUVWXYZABC". Of course, there are many other possible keys: since there are 26! (roughly 4 \* 10^26) different arrangements of the 26 letters in the alphabet, there are 26! different keys and so 26! different keyed substitution ciphers. Nonetheless, this simple type of encryption is vulnerable to statistical attacks, as anyone who has solved cryptogram puzzles can attest. To implement a substitution cipher, the Enigma machine utilized interchangeable *rotors* with internal circuitry that mapped each letter of the alphabet to a different letter. When placed side-by-side on a *spindle*, three rotors would define an even more complex substitution pattern, with the wires inside each rotor connecting to its neighboring rotors. When the operator typed a key on the Enigma keyboard, an electrical current would flow through the wires in the rotors, forming a closed circuit and lighting up the letter specified by the rotor substitution pattern. What made the Enigma cipher especially difficult to break was that the rotors moved before each letter was encoded, essentially changing the substitution pattern each time. In particular, the rightmost rotor shifted one position on every encoding, and the other rotors shifted conditionally, based on the position of *notches* on the rotors: when a rotor shifted past a notch, not only did that rotor shift, but also the rotor to its left. Thus, the letter 'E' might get mapped to a 'Q' the first time it was encountered, but to a completely different letter each successive time. Enigma machines also used a *reflector*, which was inserted at the leftmost side of the spindle. A keypress caused current to flow right-to-left through the rotors, through the reflector, then back through the rotors, left-to-right. The reflector's substitution pattern caused the entire circuit to be symmetric, meaning that if 'E' mapped to 'Q' in a particular machine state, then 'Q' would likewise be mapped to 'E' in that same state. Thus, the same rotor settings would work to encrypt and decrypt a message. (It turns out this was a major cryptographic weakness in the design of Enigma.) The original Enigma machines used at the beginning of World War II had three rotors, known by the Roman-numeral designations I, II, and III, which could be placed on the spindle in any order (giving 3! = 6 possible combinations). In addition, each rotor was installed by the operator with a starting letter showing in a window (giving 26 possible orientations for each rotor). And there was a single standard reflector, designated B. Thus, the total number of possible initial settings for an Enigma machine was 6 \* 26 \* 26 \* 26 = 105,456. (Note: we are omitting two more Enigma mechanisms, the *ring setting* and *plugboard*. For aficionados of the Enigma, you may assume the ring setting is always 'A' for each rotor, and that no wires are installed in the plugboard.) **Exercise:** Download the [Paper Enigma Machine][paper] and follow the directions to construct a working model out of paper. Using this paper machine, you should be able to encrypt and decrypt messages using the same algorithm as an Enigma machine. To test your understanding, set the initial configuration to I-F, II-U, and III-N. That is, rotor I should be placed leftmost with the letter F on its left side aligning with the rotor setting line; rotor II should be in the middle with U at the rotor setting line, and rotor III should be rightmost with N at the rotor setting line. Then decrypt the ciphertext YNOXQ; it should produce the plaintext OCAML. [sale]: http://www.codesandciphers.org.uk/enigma/ [sale-ex]: http://www.codesandciphers.org.uk/enigma/example1.htm [sim]: http://startpad.googlecode.com/hg/labs/js/enigma/enigma-sim.html [paper]: http://mckoss.com/Crypto/Enigma.htm ## Objectives * Gain familiarity with basic OCaml language features, including built-in data types, lists, functions, and printing. * Practice writing programs in the functional style using immutable data. * Improve skills in computational thinking by learning about and modeling a mildly complex, real-world computational artifact. ## Recommended reading * [Lectures and recitations 1–4][web] * [RWO chapters 1–3][rwo] * The [CS 3110 Style Guide][style] * The policies about assignments in the [course syllabus][syllabus] [web]: http://www.cs.cornell.edu/Courses/cs3110/2015fa/ [rwo]: https://realworldocaml.org/v1/en/html/index.html [style]: http://www.cs.cornell.edu/Courses/cs3110/2015fa/handouts/style.html [syllabus]: http://www.cs.cornell.edu/Courses/cs3110/2015fa/syllabus.php ## Requirements Your task in this assignment is to simulate the operation of an Enigma machine. * Your solution must provide the functions described below, with exactly those names and types. * Your solution must correctly implement the Enigma cipher, including the spindle, rotors, reflector, notches, and starting positions of the Enigma machine. A good way to check your cipher implementation is to compare its input/output behavior with the [Enigma Machine Simulator][sim] and [Paper Enigma Machine][paper]. * Your code must be written with good style and be well documented. ## Required functions You are required to adhere to these names and types. Solutions that do not adhere to them will be penalized. Recall that OCaml 4.02 is in a transient state of converting `string` to an immutable data type. Currently, the types `string` and `bytes` are synonyms. So anywhere we write `string` in the code below, it's okay if your implementation actually produces/uses `bytes`. ``` (* [cipher refl rotors notches starts s] computes the Enigma cipher, where * - [refl] is the wiring of the reflector, * - [rotors] is a list of the rotors (which must contain at least * one element), as they are installed from left to right on * the spindle, * - [notches] is a list of where the notch is on each rotor, * where the nth character of [notches] is the location of * the notch on the nth rotor in [rotors], * - [starts] is a list of the starting character for each rotor * as positioned on the spindle, where the nth character of * [starts] is the starting character for the nth rotor in * [rotors]. * - [s] is the string to be ciphered. *) val cipher : string -> string list -> char list -> char list -> string -> string (* [simulate] takes the same inputs as [cipher] but prints * a simulation of the Enigma machine at each step of the * computation. *) val simulate : string -> string list -> char list -> char list -> string -> unit ``` The functionality of your `simulate` function is entirely up to you. It need not be fancy, but it should communicate how each character is routed through the circuit, and how the rotors step. It is to your benefit to design the output of `simulate` well, because it will be used in awarding partial credit in case your `cipher` implementation is incorrect. **Message formatting:** The design of Enigma limits it to the character set 'A'-'Z'. Lower-case letters and punctuation are not supported. For this assignment, spaces should be preserved, such that a plaintext ATTACK AT DAWN might be encrypted as TXLHKV PL FCCB. Numbers will have to be spelled out. **Wiring specifications:** The wiring of a rotor is specified as a 26-character string. The standard rotor I from the 1930 Enigma I, for example, would be "EKMFLGDQVZNTOWYHXUSPAIBRCJ". That means, in starting position 'A', when current is flowing right to left, the rotor would map 'A' to 'E', 'B' to 'K', 'C' to 'M', and so on. Of course, that map changes as the rotor steps. Two other standard rotors are rotor II, which is wired "AJDKSIRUXBLHWTMCQGZNPYFVOE", and rotor III, which is wired "BDFHJLCPRTXVZNYEIWGAKMUSQO". Likewise, the wiring of a reflector is specified as a 26-character string. The standard reflector B would be "YRUHQSLDPXNGOKMIEBFZCWVJAT". That means that 'A' is mapped to 'Y', 'B' is mapped to 'R', 'C' is mapped to 'U', and so on. That map does not change. You can read about the [wiring of other historical rotors and reflectors][wiring]. Do not assume that rotors can only be from that list, though; we might test your simulator with shiny new rotors. [wiring]: http://www.cryptomuseum.com/crypto/enigma/wiring.htm **Stepping:** The intuition of stepping is that the rotors behave mostly like the numbers on an odometer: the rightmost completes a full revolution, and in so doing, causes the next rotor to its left to take a single step, and so forth. But the [notch stepping system on the Enigma][step] is a little more complicated: *Rule 1:* The rightmost rotor always takes a single step just before enciphering each character. *Rule 2:* Just before every character is enciphered, if any rotor *except the leftmost* is positioned at its notch, then that rotor and the rotor to its left steps. *Rule 3:* No rotor steps more than once per character enciphered, even if the above rules could be construed as suggesting that a rotor would step twice. Here is an example: Suppose the installed rotors are III-II-I, starting in position KDO. Then as each character is enciphered, they would step as follows: KDO, KDP, KDQ, KER, LFS, LFT, LFU, ... Here is another example: Suppose the installed rotors are III-II-I, starting in position VDP. Then as each character is enciphered, they would step as follows: VDP, VDQ, VER, WFS, WFT, ... [CLARIFICATION 09/03/15] Those examples assume that the notches on the historical rotors are as follows: I-Q, II-E, III-V. You can see that those letters are where the notches are marked on the [Paper Enigma Machine][paper]. (Technically, those letters are actually the *turnovers* of the rotors—the turnover is the letter that appears as the rotor setting when the physical notch is reached. But, for simplicity, we dispense in this assignment with the distinction between notches and turnovers.) [CLARIFICATION 09/06/15] Spaces in the message do not cause the machine to step. Some Enigma machines were built with different stepping rules; in case of discrepancy, the rules above are in effect for this assignment. Note that the [Paper Enigma Machine][paper] in its step 1 under "Operation" is somewhat vague about Rule 2, because the Paper Enigma Machine does not say what to do if there does not exist a "rotor to the left". [step]: http://users.telenet.be/d.rijmenants/en/enigmatech.htm#steppingmechanism ## What we provide In the release code on the course website you will find these files: * A template file `enigma.ml` with some function stubs. * A template file `a1.txt` for submitting your written feedback. ## What to turn in Submit files with these names on [CMS][cms]: * `enigma.ml`, containing your solution code. * `a1.txt`, containing your written feedback. [cms]: https://cms.csuglab.cornell.edu/ ## Grading issues * **Names and types:** You are required to adhere to the names and types of the required functions. If your code does not compile against the provided ~~public test file~~ [CORRECTION 09/03/15] test cases below, your solution will receive minimal credit. * **Code style:** Refer to the [CS 3110 style guide][style]. Ugly code that is functionally correct will nonetheless be penalized. Take extra time to think and find elegant solutions. * **Late submissions:** Carefully review the course policies on submission and late assignments. Verify before the deadline on CMS that you have submitted the correct version. * **Environment:** Your solution must function properly in the [3110 virtual machine][vm], which is the official grading environment for the course. [vm]: http://www.cs.cornell.edu/Courses/cs3110/2015fa/vm.html ## Prohibited OCaml features You may not use imperative data structures, including refs, arrays, and mutable fields. Strings are allowed, but the deprecated mutable functions on them are not. You have not seen these features in lecture or recitation, so we doubt you'll be tempted. Your solutions may not require linking any additional libraries. ## Testing We encourage you to get into the habit of testing your implementation early and often. In later assignments, we will ask you to turn in test suites. But for this first assignment, we won't collect your tests. **You should nonetheless test.** Here are a few good test cases to get you started: ``` let id = "ABCDEFGHIJKLMNOPQRSTUVWXYZ" let rotorI = "EKMFLGDQVZNTOWYHXUSPAIBRCJ" let rotorII = "AJDKSIRUXBLHWTMCQGZNPYFVOE" let rotorIII = "BDFHJLCPRTXVZNYEIWGAKMUSQO" let reflB = "YRUHQSLDPXNGOKMIEBFZCWVJAT" let () = assert ((cipher id [id] ['Q'] ['M'] "Q") = "Q") let () = assert ((cipher reflB [rotorIII] ['Q'] ['Z'] "B") = "D") let () = assert ((cipher id [id] ['Y'] ['Z'] "OCAML") = "OCAML") let () = assert ((cipher reflB [rotorI; rotorII; rotorIII] ['Q'; 'E'; 'V'] ['F'; 'U'; 'N'] "OCAML") = "YNOXQ") ``` [CORRECTION 09/06/15] The order of notches in the final test case above was incorrect with respect to the historical rotors, though that happened to have no effect on the input/output behavior of the test case. The [assertion construct][assert] `let () = assert e` evaluates expression `e`. If it evaluates to `false`, an exception is raised; otherwise, OCaml continues processing the rest of the program. [assert]: http://caml.inria.fr/pub/docs/manual-ocaml/extn.html#s%3Aassert You might want to begin collecting your test cases in a file, so that you can conveniently run an entire set of tests. ## Hints * Before designing or implementing any code, use the [Paper Enigma Machine][paper] to teach yourself how the Enigma cipher works. Make sure you can decrypt YNOXQ to OCAML as discussed in the introduction to this assignment. * There are useful library functions in the `String` and `Char` modules. * Think about implementing your solution in stages. Here is one possible progression: - First build something that works for messages that contain only a single character, with no stepping, with a single rotor that implements the wiring "ABCDEFGHIJKLMNOPQRSTUVWXYZ". - Then improve your solution to something that works for other rotor wirings. - Then to multiple rotors. - Then to multiple characters. This is the hardest part, because it requires getting all the corner cases with stepping to work correctly. Remember, stepping always occurs just before a character is enciphered. ## Karma *Karma problems* are a way of going above and beyond the call of 3110 duty. Solving them is not required, nor does it earn any explicit extra credit. But the course staff will keep track of the karma problems that you solve throughout the semester. The number of orange camels, below, is an indicator of how hard the karma problem might be. Be mindful that you still need to provide the required functions, above, so make sure your implementation of any karma functionality does not violate the specifications/names/types of those functions. <img src="camel_orange.png" height="40" width="40"> <img src="camel_black.png" height="40" width="40"> <img src="camel_black.png" height="40" width="40"> **Enigmatic:** For additional fun, learn about and implement some of the following: * Ring settings (Ringstellung) * The plugboard (Steckerbrett) * Multiple notches on a rotor (naval rotors VI, VII, and VIII) * The rewirable reflector (Umkehrwalze D) <img src="camel_orange.png" height="40" width="40"> <img src="camel_orange.png" height="40" width="40"> <img src="camel_black.png" height="40" width="40"> **Sims:** Implement a fancier simulation using an [OCaml text interface library][text]. [text]: http://caml.inria.fr/cgi-bin/hump.en.cgi?sort=0&browse=64 <img src="camel_orange.png" height="40" width="40"> <img src="camel_orange.png" height="40" width="40"> <img src="camel_orange.png" height="40" width="40"> **Codebreaker:** Implement a tool that breaks Enigma encryptions without having the key. A brute force attack against the simple three-rotor machine used in most of the assignment is probably not that hard. Doing it for a five-rotor machine with plugboard, however, could be an interesting challenge. * * * **Acknowledgement:** Adapted from Prof. David Reed (Creighton University).