CS 5430: Block Cipher Modes and Asymmetric-key Encryption

# Block Cipher Modes and Asymmetric-key Encryption

We saw two limitations of block ciphers:

* The message size is required to be exactly the block length.
* Every pair of principals must share a key, leading to an \\(O(n^2)\\) key distribution problem.

We'll now address those limitations.

# Block cipher modes

If block ciphers work only on fixed-length blocks, how can we send messages longer than the block length? A *block cipher mode* is an algorithm that uses a fixed-length block cipher to send an arbitrary-length message.

**Strawman idea:** chunk the message into blocks and encrypt each block individually. Ciphertext block number \\(i\\), written \\(c_i\\), is thus Enc(\\(m_i\\); k), where \\(m_i\\) is plaintext block number \\(i\\). This algorithm is called *electronic codebook mode* (ECB).

ECB is a bad idea that unfortunately gets invented over and over again, especially by students of crypto. Why is it bad? Because any two plaintext blocks that are identical will also be identical in the ciphertext. (Wikipedia has a nice [graphical illustration][ecb] of how ECB fails to provide confidentiality.) **Do not use ECB.** Unfortunately, it is still the default in some crypto libraries, but you don't have to settle for the default.

[ecb]: https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation#Electronic_Codebook_.28ECB.29

One good block cipher mode is *cipher block chaining* (CBC). With CBC, every ciphertext block depends on **all** previous ciphertext blocks, which avoids the intra-message repetition problems we observed with ECB. Another good block cipher mode is *counter mode* (CTR), which incorporates an ever-increasing counter into the encryption of each block, so identical plaintext blocks within a message do not produce identical ciphertext blocks.

## Nonces

But what about inter-message repetitions? That is, suppose the same message is sent twice, encrypted under the same key. Neither block chaining nor counters will conceal that fact.
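To make both repetition problems concrete, here is a minimal sketch using Java's standard JCA (`javax.crypto`), assuming AES-128 as the block cipher. The class and method names are illustrative, not a standard API. CBC in Java also takes an initialization vector (IV), discussed next; this sketch deliberately reuses the same all-zero IV, so nothing varies between the two encryptions. The first method shows ECB's intra-message leak; the second shows that even CBC reveals a repeated message under those conditions.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Arrays;

// Illustrative demo class; names are hypothetical, not a standard API.
public class RepetitionLeakDemo {
    // Generate a fresh 128-bit AES key.
    static SecretKey newAesKey() throws GeneralSecurityException {
        KeyGenerator gen = KeyGenerator.getInstance("AES");
        gen.init(128);
        return gen.generateKey();
    }

    // Intra-message leak: ECB maps two identical 16-byte plaintext blocks
    // to two identical ciphertext blocks.
    static boolean ecbRepeatsBlocks(SecretKey key) throws GeneralSecurityException {
        byte[] twoBlocks = "YELLOW SUBMARINEYELLOW SUBMARINE".getBytes(StandardCharsets.US_ASCII);
        Cipher ecb = Cipher.getInstance("AES/ECB/PKCS5Padding");
        ecb.init(Cipher.ENCRYPT_MODE, key);
        byte[] c = ecb.doFinal(twoBlocks);
        return Arrays.equals(Arrays.copyOfRange(c, 0, 16), Arrays.copyOfRange(c, 16, 32));
    }

    // Inter-message leak: CBC under the same key and a REUSED IV (deliberately bad)
    // turns the same message into the same ciphertext every time.
    static boolean cbcReusedIvRepeatsMessages(SecretKey key) throws GeneralSecurityException {
        byte[] reusedIv = new byte[16]; // all zeros and never changed: do not do this
        byte[] m = "attack at dawn".getBytes(StandardCharsets.US_ASCII);
        Cipher cbc = Cipher.getInstance("AES/CBC/PKCS5Padding");
        cbc.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(reusedIv));
        byte[] first = cbc.doFinal(m);
        cbc.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(reusedIv));
        byte[] second = cbc.doFinal(m);
        return Arrays.equals(first, second);
    }

    public static void main(String[] args) throws GeneralSecurityException {
        SecretKey key = newAesKey();
        System.out.println("ECB repeats identical blocks: " + ecbRepeatsBlocks(key));
        System.out.println("CBC with reused IV repeats identical messages: " + cbcReusedIvRepeatsMessages(key));
    }
}
```

Both lines print `true`: the adversary learns about repetition without ever learning the key.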
So CBC and CTR both make use of an additional parameter (besides the message and the key): a *nonce*, that is, a <u>n</u>umber used <u>once</u>. The nonce influences the encryption of every block, thus avoiding inter-message repetition problems. CBC has a special name for the nonce, calling it the *initialization vector* (IV).

Nonces show up a lot in cryptography. A nonce must always be

- **Unique,** meaning that it has never been used before in the lifetime of the system. (A synonym for "unique" is "fresh".)

Nonces may also be

- **Unpredictable,** meaning that it isn't possible to predict the next nonce, even given knowledge of all the nonces that have been used so far in the lifetime of a system.

Nonces can come from several sources:

- Keeping a **counter** is simple to implement, but requires principals to keep state. Counters are always unique, unless the counter overflows (which can happen!). Counters are highly predictable.
- Using a **clock** value amounts to keeping a counter that tracks time. If the clock can roll back, its value might not be unique. Clocks are highly predictable.
- Generating **random** nonces with a random number generator (RNG) yields unpredictability, if a good RNG is used. Such nonces are not necessarily unique if random numbers are drawn from a small space. But by making the space large enough, the probability of *collisions* can be made very low.

## Generating random numbers

How can software generate good, cryptographically strong random numbers? Frankly, it's a black art. Many libraries provide cryptographically weak random numbers designed to pass certain statistical tests, rather than to be unpredictable by an attacker.

**Example: Java.** In Java, use `java.security.SecureRandom`. Do not use `java.util.Random`, and do not use `Math.random()`, which itself uses `java.util.Random`. These latter two are [predictable][random-attacks].
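Following that advice, here is a minimal sketch of drawing a random nonce in Java, assuming AES-128 in CBC mode with a 16-byte IV taken from `SecureRandom`. For CBC the IV should be unpredictable as well as unique, which is why it comes from `SecureRandom` rather than a counter or clock. Prepending the IV to the ciphertext is one common convention (the receiver needs the IV to decrypt), not the only one; the class and method names are illustrative.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;

// Illustrative demo class; names are hypothetical, not a standard API.
public class FreshIvDemo {
    // One shared cryptographically strong RNG; never java.util.Random here.
    private static final SecureRandom RNG = new SecureRandom();

    // Draw a fresh 16-byte (128-bit) IV; the space is large, so collisions are very unlikely.
    static byte[] freshIv() {
        byte[] iv = new byte[16];
        RNG.nextBytes(iv);
        return iv;
    }

    // Encrypt with AES-CBC under a fresh IV; return IV || ciphertext.
    static byte[] encrypt(SecretKey key, byte[] plaintext) throws GeneralSecurityException {
        byte[] iv = freshIv();
        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));
        byte[] ct = cipher.doFinal(plaintext);
        byte[] out = new byte[iv.length + ct.length];
        System.arraycopy(iv, 0, out, 0, iv.length);
        System.arraycopy(ct, 0, out, iv.length, ct.length);
        return out;
    }

    // Split off the 16-byte IV, then decrypt the rest.
    static byte[] decrypt(SecretKey key, byte[] ivAndCiphertext) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        cipher.init(Cipher.DECRYPT_MODE, key, new IvParameterSpec(ivAndCiphertext, 0, 16));
        return cipher.doFinal(ivAndCiphertext, 16, ivAndCiphertext.length - 16);
    }

    public static void main(String[] args) throws GeneralSecurityException {
        KeyGenerator gen = KeyGenerator.getInstance("AES");
        gen.init(128);
        SecretKey key = gen.generateKey();
        byte[] msg = "attack at dawn".getBytes(StandardCharsets.US_ASCII);
        byte[] c1 = encrypt(key, msg);
        byte[] c2 = encrypt(key, msg);
        System.out.println("Same message, different ciphertexts? " + !Arrays.equals(c1, c2));
        System.out.println("Round trip ok? " + Arrays.equals(msg, decrypt(key, c1)));
    }
}
```

Sending the same message twice now yields two unrelated-looking ciphertexts, which is exactly the inter-message property a nonce is meant to provide.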
[random-attacks]: http://www.google.com/#q=attacks+on+java.util.Random

## Padding

When using a block cipher mode, what should the mode do if the plaintext message length is not exactly a multiple of the block cipher's block length? Should the mode pad out the last block with 0's? No: if the plaintext itself ends with zero bytes, there's no way to unambiguously remove the padding.

**PKCS5 padding:** Suppose B is the number of bytes that need to be added to the final plaintext block to fill it out completely, where B is between 1 and the block length. Then pad with B copies of the byte representing integer B. Because the last byte of the padded message always says how many padding bytes to strip, the padding can be removed unambiguously. In the worst case, when the final plaintext block is already full, B equals the block length, so an entire extra block is added to the message. (For block lengths other than 8 bytes, the same scheme is usually called PKCS7 padding.)

# Asymmetric-key encryption

There's a big problem with the encryption schemes we've examined so far: the shared keys have to be distributed. For each pair of principals who want to communicate, a key needs to be shared. If there are \\(n\\) principals, that's \\(O(n^2)\\) keys. That sharing costs time and money.

This problem motivated the invention of another kind of encryption scheme: *asymmetric-key* or *public-key* cryptography. RSA is the most famous example. The name "asymmetric" comes from the fact that different keys are used for encryption vs. decryption. In *symmetric* schemes like AES, the same key is used both for encryption and decryption.

In an asymmetric-key cryptosystem, every principal has its own *key pair*, comprising a

- **public key,** which is published for the world to see, and a
- **private key,** which is kept secret and never shared with anyone.

*N.B. Our usually fastidious terminology breaks down here. Some people call symmetric schemes "secret-key schemes", even though in both symmetric and asymmetric schemes there is a key that is kept secret. And "private" key here doesn't necessarily mean that the key is personally-identifying information.*

With asymmetric-key schemes, key distribution becomes much easier. We need only to publish a "phonebook" of public keys, which contains just \\(O(n)\\) keys.
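A quick count makes the difference concrete, assuming the symmetric setting needs one shared key per unordered pair of principals:

\\[ \binom{n}{2} = \frac{n(n-1)}{2} \text{ shared keys, versus } n \text{ public keys.} \\]

With \\(n = 1000\\) principals, that is 499,500 shared keys versus 1,000 public keys.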
Thus we reduce from a quadratic problem to a linear problem. To send a message, a principal looks up the public key of the recipient in that phonebook and encrypts using it:

```
0. B: (K_B, k_B) = Gen(len)
1. A: c = Enc(m; K_B)
2. A -> B: c
3. B: m = Dec(c; k_B)
```

Note how we write upper-case K for public keys and lower-case k for private keys.

## Big integers

Asymmetric encryption schemes are usually implemented in terms of really big integers, not the byte arrays that symmetric schemes use. The integers used for asymmetric encryption are far too big to fit in a standard `int` data type.

This use of big integers might seem like a minor implementation detail, but it's important. The minor reason it's important is that the largest value you can encrypt is always bounded by the key size (for RSA, the plaintext must be smaller than the modulus). The major reason it's important is that computation on big integers is much slower than computation on byte arrays. How much slower? In Java, anywhere from one to three orders of magnitude, based on simple experiments we did comparing AES to RSA in a past semester of this course.

## Padding for asymmetric encryption

Since asymmetric schemes use big integers, not byte arrays, padding works differently than for symmetric encryption. With RSA, the common practice is to use a padding function called OAEP: optimal asymmetric encryption padding. Despite its name, OAEP does much more than padding. It randomizes encryption, so that encrypting the same message twice yields different ciphertexts, and it defends against chosen-ciphertext attacks that break plain ("textbook") RSA.

## Block modes for asymmetric encryption

Since asymmetric encryption limits the maximum size of the plaintext, you might think that we should use block modes with asymmetric encryption algorithms to encrypt arbitrary-length messages. In fact, this can be done: you could use CBC or CTR. (You still should not use ECB, for the same reason as before.) In practice, though, block modes don't get used with asymmetric encryption, because encrypting many blocks with an asymmetric scheme would be really slow.
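Here is a minimal sketch of the four-step protocol above using Java's built-in RSA with OAEP, assuming a 2048-bit modulus and the standard JCA transformation name `RSA/ECB/OAEPWithSHA-256AndMGF1Padding`. Despite the "ECB" in that name, the default SunJCE RSA cipher processes a single block per call and rejects longer input, which illustrates the plaintext-size bound. The bound used below, k - 2h - 2 bytes for a k-byte modulus and an h-byte hash, comes from the OAEP specification (PKCS #1, RFC 8017). The helper names `gen`, `enc`, and `dec` are hypothetical and merely mirror the notation Gen, Enc, Dec.

```java
import javax.crypto.Cipher;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.util.Arrays;

// Illustrative demo class; helper names are hypothetical, not a standard API.
public class RsaOaepDemo {
    // JCA transformation for RSA with OAEP; despite "ECB", one block per doFinal.
    static final String RSA_OAEP = "RSA/ECB/OAEPWithSHA-256AndMGF1Padding";

    // Gen(len): generate a key pair (K_B, k_B) with a modulus of the given bit length.
    static KeyPair gen(int bits) throws GeneralSecurityException {
        KeyPairGenerator kpg = KeyPairGenerator.getInstance("RSA");
        kpg.initialize(bits);
        return kpg.generateKeyPair();
    }

    // Enc(m; K_B): encrypt under the recipient's public key.
    static byte[] enc(byte[] m, PublicKey pub) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance(RSA_OAEP);
        cipher.init(Cipher.ENCRYPT_MODE, pub);
        return cipher.doFinal(m);
    }

    // Dec(c; k_B): decrypt with the recipient's private key.
    static byte[] dec(byte[] c, PrivateKey priv) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance(RSA_OAEP);
        cipher.init(Cipher.DECRYPT_MODE, priv);
        return cipher.doFinal(c);
    }

    // OAEP plaintext bound: modulus bytes - 2 * hash bytes - 2.
    static int maxPlaintextBytes(int modulusBits, int hashBytes) {
        return modulusBits / 8 - 2 * hashBytes - 2;
    }

    public static void main(String[] args) throws GeneralSecurityException {
        KeyPair b = gen(2048);
        byte[] m = "meet me at noon".getBytes(StandardCharsets.UTF_8);
        byte[] c1 = enc(m, b.getPublic());
        byte[] c2 = enc(m, b.getPublic());
        System.out.println("Ciphertext length: " + c1.length + " bytes"); // 256 for a 2048-bit modulus
        System.out.println("OAEP randomizes? " + !Arrays.equals(c1, c2));
        System.out.println("Round trip ok? " + Arrays.equals(m, dec(c1, b.getPrivate())));

        int max = maxPlaintextBytes(2048, 32);
        enc(new byte[max], b.getPublic()); // fits in one block
        try {
            enc(new byte[max + 1], b.getPublic());
            System.out.println("Oversized plaintext accepted?!");
        } catch (GeneralSecurityException e) {
            System.out.println("Plaintext of " + (max + 1) + " bytes rejected: bounded by key size");
        }
    }
}
```

For a 2048-bit key with SHA-256, the bound works out to 256 - 64 - 2 = 190 bytes, far too small for bulk data.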
Rather than using a block mode, the typical practice is to combine asymmetric and symmetric encryption, as discussed next.

## Hybrid encryption

To efficiently encrypt a long message, we use a mash-up of asymmetric and symmetric encryption called *hybrid encryption*. Hybrid encryption uses a symmetric encryption scheme (Gen_S, Enc_S, Dec_S) and an asymmetric scheme (Gen_A, Enc_A, Dec_A), as well as a block cipher mode if necessary.

```
0. B: (K_B, k_B) = Gen_A(len_A)
1. A: k_s = Gen_S(len_S)
      c1 = Enc_A(k_s; K_B)
      c2 = Enc_S(m; k_s)   // using a block cipher mode
2. A -> B: c1, c2
3. B: k_s = Dec_A(c1; k_B)
      m = Dec_S(c2; k_s)
```

Key k_s is an example of a *session key*: a key that is used for a limited time and then discarded. If the session key is later compromised, only the messages it protected are vulnerable, unlike if a long-term symmetric key were used. The session key in hybrid encryption is valid only for one encryption from Alice to Bob; it shouldn't be reused for future encryptions from Alice to Bob, and Bob shouldn't use it to encrypt messages to Alice.

## Exercises

1. Do some research to find out how CBC mode encryption works. (See, e.g., section 7.2.2 of the *Handbook of Applied Cryptography*, or section 4.3 of *Cryptography Engineering*.) Then answer the following questions:
    - The initialization vector for CBC does not need to be secret; in fact, it is necessarily revealed to the Dolev–Yao adversary. Why does this not compromise the secrecy of the plaintext?
    - CBC mode is *self-synchronizing*. What does that mean?
2. Distinguish between a *true random number generator* (TRNG), a *pseudorandom number generator* (PRNG), and a *cryptographically strong/secure PRNG*.
3. Choose a PRNG, perhaps Fortuna or `/dev/random`, that endeavors to provide cryptographic strength. Do some research to find out how it works internally: How is it initialized? How does it collect entropy? How does it produce output? How does it handle rebooting?
4. Both RSA and Elgamal encryption are *malleable*, in that they are *homomorphic*. Learn what that means, and identify what the homomorphism is in each scheme.
5. An RSA public key is actually a pair of an encryption exponent *e* and a modulus *n*. Although every principal is supposed to have its own unique modulus, it sometimes happens that many principals will have the same exponent. Find out the reasons why, and the arguments against it.
6. What is *elliptic curve cryptography*, and what are its benefits?
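The hybrid encryption protocol from the *Hybrid encryption* section can be sketched in Java as follows, assuming RSA-2048 with OAEP for (Gen_A, Enc_A, Dec_A) and AES-128 in CBC mode with a random IV for (Gen_S, Enc_S, Dec_S). The `Sent` class and the method names are illustrative, not a standard API, and carrying the IV alongside c2 is one common convention rather than part of the protocol above.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.SecureRandom;
import java.util.Arrays;

// Illustrative demo class; names are hypothetical, not a standard API.
public class HybridDemo {
    static final String RSA_OAEP = "RSA/ECB/OAEPWithSHA-256AndMGF1Padding";
    static final String AES_CBC = "AES/CBC/PKCS5Padding";
    static final SecureRandom RNG = new SecureRandom();

    // What A sends to B in step 2.
    static final class Sent {
        final byte[] c1; // Enc_A(k_s; K_B)
        final byte[] iv; // fresh CBC IV, sent in the clear
        final byte[] c2; // Enc_S(m; k_s)
        Sent(byte[] c1, byte[] iv, byte[] c2) { this.c1 = c1; this.iv = iv; this.c2 = c2; }
    }

    // Step 1 (A): fresh session key, encrypt it under K_B, encrypt m under it.
    static Sent encrypt(byte[] m, PublicKey bPub) throws GeneralSecurityException {
        KeyGenerator gen = KeyGenerator.getInstance("AES");
        gen.init(128);
        SecretKey ks = gen.generateKey();                  // k_s = Gen_S(len_S)

        Cipher rsa = Cipher.getInstance(RSA_OAEP);
        rsa.init(Cipher.ENCRYPT_MODE, bPub);
        byte[] c1 = rsa.doFinal(ks.getEncoded());          // c1 = Enc_A(k_s; K_B)

        byte[] iv = new byte[16];
        RNG.nextBytes(iv);
        Cipher aes = Cipher.getInstance(AES_CBC);
        aes.init(Cipher.ENCRYPT_MODE, ks, new IvParameterSpec(iv));
        byte[] c2 = aes.doFinal(m);                        // c2 = Enc_S(m; k_s)
        return new Sent(c1, iv, c2);                       // k_s is not stored or reused
    }

    // Step 3 (B): recover k_s with k_B, then decrypt the message.
    static byte[] decrypt(Sent s, PrivateKey bPriv) throws GeneralSecurityException {
        Cipher rsa = Cipher.getInstance(RSA_OAEP);
        rsa.init(Cipher.DECRYPT_MODE, bPriv);
        SecretKey ks = new SecretKeySpec(rsa.doFinal(s.c1), "AES"); // k_s = Dec_A(c1; k_B)

        Cipher aes = Cipher.getInstance(AES_CBC);
        aes.init(Cipher.DECRYPT_MODE, ks, new IvParameterSpec(s.iv));
        return aes.doFinal(s.c2);                                    // m = Dec_S(c2; k_s)
    }

    public static void main(String[] args) throws GeneralSecurityException {
        KeyPairGenerator kpg = KeyPairGenerator.getInstance("RSA");
        kpg.initialize(2048);
        KeyPair b = kpg.generateKeyPair();                 // Step 0: (K_B, k_B) = Gen_A(len_A)

        byte[] m = new byte[10_000];                       // far longer than one RSA block
        Arrays.fill(m, (byte) 'x');
        Sent s = encrypt(m, b.getPublic());                // Step 2: A -> B: c1, c2
        System.out.println("Recovered? " + Arrays.equals(m, decrypt(s, b.getPrivate())));
    }
}
```

Only the 16-byte session key goes through the slow asymmetric operation; the long message is handled by AES. Because `encrypt` generates a new k_s on every call and never keeps it, each session key protects exactly one encryption from Alice to Bob.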