CS212 Problem Sets
Problem Set 2 - Spring 1999
RSA Encryption

Issued: Wednesday, February 10
Due: Friday, March 19, 2pm


As you can see, this problem set is quite lengthy and it may take you some time to familiarize yourself with the mathematical concepts that will be use throughout the problem set. However, we want to make it clear at the beginning that you are not responsible for all the math details. So don't feel nervous if you find yourself completely lost about the math. We only expect a brief understanding so that you know what's going on.

For this assignment, you can work in pairs. If you choose to work in pairs, you must only submit one copy of your solution using the e-mail system. You should know how to do that by now.

If you have trouble with the course software or are generally stuck, come to our office hours or consulting hours. Alternatively, post a question to the newsgroup cornell.class.cs212. As a last resort, send email to cs212@cs.cornell.edu.


Note: There are two written exercises in this problem set. They should be written as text and included in the file that you are working on as a multi-line comment. Then, you will submit it along with the programming exercises.
For example

  #|       CS212 - Problem set #1
           Written by Eli Barzilay

  ;;; Question #1 - Induction

  We should show that 1^3 + 2^3 + 3^3 + ... + n^3 = (1+2+3+...+n)^2

  Proof:  Blah blah blah blah blah blah blah blah blah blah blah blah
    blah blah blah blah blah blah blah blah blah blah blah blah blah
    blah blah blah.
    Therefore blah blah blah blah blah, QED.

  ;;; Question #2 - Substitution Model
  ...

  |#

  ;;; Programming exercise #1:
  (define (RSA-decrypt-list ...)
    ...)
  ...

Written Exercises

1. Induction

Prove that 13 + 23 + 33 + ... + n3 = (1+2+3+...+n)2 by induction.

Use x^y as the notation for the exponent operator (xy), as shown in the example above.

2. Functions as parameters, substitution model

Consider the following function:

  (define foo
    (lambda (f g)
      (lambda (n)
        ((if (even? n) f g) n))))

  (define bar
    (foo (lambda (x) (/ x 2))
         (lambda (x) (+ 1 (* 3 x)))))

Using the substitution model, show what happens when

   (bar 3)

is evaluated.


Public Key Cryptography

Public key encryption and digital signatures play an important part in achieving private communication in a world that relies increasingly upon digital information. The fact that there are fast algorithms for exponentiation and for testing prime numbers lies at the root of RSA, a popular method for implementing public key encryption. In this problem set you will implement a version of the RSA system. By doing so, you will gain experience with some algorithms that, although simple, have a deep mathematical foundation and great practical importance.

Cryptographic systems typically use keys for encryption and decryption. An encryption key is used to convert the original message (the plaintext) to coded form (the ciphertext). A corresponding decryption key is used to convert the ciphertext back to the original plaintext.

In traditional cryptographic systems, the same key is used for both encryption and decryption. We call that a symmetric or private key cryptographic system. Two parties can exchange coded messages only if they share a secret key. Since anyone who learned that key would be able to decode the messages, keys must be carefully guarded and transmitted only under tight security.

Diffie and Hellman (1976) discovered a new approach to encryption and decryption: public key cryptography. In this asymmetric approach, the encryption and decryption keys are different but interrelated. The two keys are specially picked so that knowing the encryption key cannot help you much to find the decryption key. Thus you can publish your encryption key, say on the web, and anyone who wants to send you secret messages can use it to encode a message to send to you. You do not have to worry about key security at all, because even if everyone in the world knew your encryption key, no one could decrypt messages sent to you without knowing your decryption key, which you keep private to yourself.

A popular method for implementing this scheme is due to Rivest, Shamir, and Adelman and is known as the RSA system.

How RSA works

RSA does not work with characters, but with integers. The ASCII standard representation of a single character is a 7-bit integer. We will represent each block of two characters as a 14-bit integer obtained by concatenating the ASCII representations of the two characters. Code have been provided to convert character strings to lists of 14-bit integers and vice versa.

In the RSA scheme, you select two large prime numbers p and q. Recall that a prime number is a positive integer with no divisors other than itself and 1. You then define

  n = pq
  m = (p-1)(q-1)

The significance of these definitions is that

(Notation: [a=b] mod m means that a mod m = b mod m, where a mod m is the remainder obtained when dividing a by m using ordinary integer division; equivalently, a-b is divisible by m. For example, [17 = 22] mod 5.)

Now you pick a number e < m relatively prime to m, i.e., such that e and m have no factors in common except 1; then compute a number d such that [de = 1] mod m, i.e. de = km+1 for some k. We'll see how to do this below. Your public key, which you can advertise to the world, is the pair (n,e). Your private key is (n,d). Anyone who wants to send you a secret message s (represented by an integer) encrypts it by computing E(s), where

  E(s) = se mod n

That is, if the plaintext is represented by the number s, then the ciphertext E(s) is obtained by raising s to the power e, then taking the remainder modulo n. The decryption process is exactly the same, except that d is used instead of e:

  D(s) = sd mod n

Why does that work? In fact, the functions E(s) and D(s) are inverses.

  D(E(s)) = (se)d mod n
          = sde mod n
          = s1+km mod n 
          = s(sm)k mod n 
          = s(1)k mod n 
          = s mod n 
          = s

The integer s representing the plaintext must be less than n. That's why we break the message up into character chunks. Also, this only works if s is relatively prime to n, i.e., has no factors in common with n other than 1. If n is the product of two large primes, then all but negligibly few messages s < n satisfy this property. If by some freak chance s and n turned out not to be relatively prime, then the code would be broken; but the chances of this happening by accident are insignificantly small. It is easy to compute d efficiently from e and m. We show how below. Thus, to use the RSA system:

  1. pick large primes p and q;
  2. compute n = pq and m = (p-1)(q-1);
  3. choose e and use this to compute d such that [de = 1] mod m;
  4. publish the pair (n,e) as your public key, but keep d secret to yourself

How secure is RSA? At present, the only known way to obtain d from e and n is to factor n into its prime factors p and q, then compute m and proceed as above. But no one knows how to factor large integers efficiently, despite centuries of effort by number theorists. Using known methods, factoring n = pq where p and q are 1000-digit primes would require years on today's fastest supercomputers. Until someone comes up with an efficient way to factor, or discovers some other way to compute d from e and n, the system is reasonably secure for all practical purposes.

Digital Signatures

Public key encryption can be used to solve another problem of secure communication. Suppose you wish to send a message by electronic mail and sign it so that the recipient can be sure that it really came from you. What is required is some scheme for signing a message in a way that cannot be forged. This is called a digital signature.

This can be done as follows. Apply a compression function (also called a hash function) that transforms the message to a single, relatively small number h. In general, there will be many messages that produce the same hash value. Now transform the hash value h using your private key to give D(h). This is your digital signature, which you transmit along with the message. Anyone who receives the message can authenticate the signature by transforming it using your public key, giving E(D(h)) = h, and checking that this gives the same result as applying the compression function to the message.

Anyone who wanted to forge a message claiming to be from you must produce the number D(h). Anyone can compute the hash value h of the message, since the compression function is public. But as long as you are the only one who knows your private key, only you can produce D(h).

We can combine the two techniques as follows. Suppose Hillary wants to send Bill a message that only Bill can read, and sign the message so that Bill knows it could only be from her. She encrypts the message using Bill's public key. Then she signs the encrypted result using her own private key using the signature method just described. When Bill receives the message, he first uses Hillary's public key to authenticate the signature, then decrypts the message using his own private key.

Bill can be sure that only someone with Hillary's private key could have sent the message. Hillary can be sure that only someone with Bill's private key can read the message. This is accomplished without exchanging any secret information between Bill and Hillary. It's this capacity for achieving secure communication without having to worry about exchanging secret keys that makes public key cryptography such an important technique.

Digital Time Stamping

One problem that needs to be solved before electronic documents can be considered legally binding is the need to verify the time at which the document was created. This is called a digital time stamp. It is much like using a public notary, but in fact can be easier, cheaper, and more secure. Typically, a digital time stamper should be a neutral party such as the government (hmmm...). One means of digital time stamping is to have the owner of the document calculate the hash value of the document and send this to the time-stamper. The time-stamper then uses its own private key to encrypt a message containing that hash value and the time that the request was received. This is the time-stamped message. The time-stamper then returns the time-stamped message to the sender, who can append it to the document. If at a later date someone challenges the date when the document was written, they can decrypt the time stamp using the public key of the time-stamper and verify that the hash value of the document matches the hash value included in the time stamp. The security of the system rests in the security of the time-stamper's private key; a person who gains access to the private key of the time-stamper would be able to forge another date.

Added security can be built in by getting a document time-stamped by multiple time-stamping authorities. A person would then have to obtain the private key of all the authorities to be able to forge a date. This feature would probably be implemented by a hierarchy of time-stampers; that is, when one sends a request for a time-stamp, the time-stamper would include a time-stamp from another authority, and so on. This chain can only be forged if the forger know the private keys of all the authorities on the chain. Digital time-stamping also has the added advantage that the sender does not need to reveal the whole document to the time-stamper, as one does with a public notary, because only the hash value is sent. Finally, altering the document would be very difficult, since the time-stamp would be invalid for any document that did not have the same hash value.

To illustrate a use for digital time-stamping, let's continue with the example of Bill and Hillary. After exchanging a series of torrid secret email messages for a couple of hours, Hillary and Bill decide to get married the next day. Hillary wants to have a prenuptial agreement because she makes much more money than Bill. Hillary's lawyer, who is vacationing in Europe, cannot give them a physical copy of the prenuptial agreement to sign, so she emails them an online version. After they both read the document, they sign it using the method discussed in the previous section. They calculate the hash value of the document with signatures and have it time-stamped. A few years later when they file for divorce, Bill claims that Hillary got access to his private key and forged the document after they got married. Hillary produces the time-stamp, proving that the prenuptial agreement was signed before they were married. Bill's only recourse now is to try to claim that Hillary knew his digital signature back before they were married.

Implementing RSA

We'll assume that an RSA key is represented as a list of two integers, the modulus and the exponent (the list (n e) for a public key and (n d) for a private key). We have provided a very simple abstraction for this structure. Keys are implemented as two-element lists. We also defined creator and accessor functions for the new type, make-key, key-modulus and key-exponent. The creator make-key is a function that takes two integers, modulus and an exponent, and returns a list with the given modulus and exponent. The accessor key-modulus is a function that takes a key pair and returns the modulus, and similarly for key-exponent. Note that it is not a good idea to rely on the internal implementation of these objects. Here is how this looks like:

  (define key-modulus  first)
  (define key-exponent second)
  (define (make-key modulus exponent)
    (list modulus exponent))

The basic RSA transformation is then

  (define (RSA-transform number key)
    (expmod number (key-exponent key) (key-modulus key)))

which implements both the encryption and decryption operations, depending on whether the key is public or private.

To generate RSA keys, we first need a way to generate primes. The most straightforward way is to pick a random number in some desired range and start testing successive numbers from there until we find a prime. The following procedure starts searching at a randomly chosen integer between smallest and (+ smallest range):

  (define (choose-prime smallest range)
    (let ((start (+ smallest (rand range))))
      (search-for-prime
        (if (even? start)
          (+ start 1)
          start))))

  (define (search-for-prime guess)
    (if (fast-prime? guess)
      guess
      (search-for-prime (+ guess 2))))

The test for primality is the Fermat test. This is a probabilistic method, which with high probability determines whether or not n is prime. It is based on Fermat's Theorem, which states that if n is prime, then all numbers a in the range 0 < a < n satisfy [an-1 = 1] mod n; and if n is not prime, then at most half do (with a relatively small set of exceptions called Carmichael numbers, which are composite numbers that look prime as far as Fermat's test is concerned. There are so few of these we won't worry about hitting one).

  (define (fermat-test n a)
    (= (expmod a n n) a))

  (define (fast-prime? n)
    (if (> n 7)
      (and (fermat-test n 2) (fermat-test n 3)
           (fermat-test n 5) (fermat-test n 7))
      (or (= n 2) (= n 3) (= n 5) (= n 7))))

Now we can generate a public RSA key and matching private key. We'll represent this as a list of two keys, along with functions make-key-pair, public-key and private-key to create and access parts of a pair of keys:

  (define public-key  first)
  (define private-key second)
  (define (make-key-pair public private)
    (list public private))

The following procedure generates an RSA key pair. It picks primes p and q that are in the range from 2r to 2*2r so that n = pq will be in the range 22r to 22r+2. In order to encode two characters per number, we need primes larger than 214. In this problem set, we're using small values of n (218 to 220) because we want you to play around with cracking an RSA system. By starting with larger random numbers, you can use the same method to produce a system that really is secure.

  (define (generate-RSA-key-pair r)
    (let* ((size (expt 2 r))
           (p (choose-prime size size))
           (q (choose-prime size size)))
      (if (= p q)       ; check that we haven't chosen the same prime twice
        (generate-RSA-key-pair r) ; VERY unlikely
        (let* ((n (* p q))
               (m (* (- p 1) (- q 1)))
               (z (select-keys m))
               (e (first z))
               (d (second z)))
          (make-key-pair (make-key n e) (make-key n d))))))

Computing the Keys

The public and private keys e and d must satisfy:

[de = 1] mod m

i.e., e and d must be multiplicative inverses mod m. One can show that a solution to this equation exists if and only if the greatest common divisor (gcd) of e and m is 1. We can pick e randomly in the range 0 < e < m, then calculate the gcd of e and m using the Euclidean algorithm. If the gcd is not 1, we pick again. The number d falls out as a by-product of this computation.

The Euclidean algorithm is based on the fact that for any positive integers k and m, there is a unique quotient q and remainder r such that


  k = qm + r
  0 <= r < m

Moreover, the gcd of k and m, call it g, is the same as the gcd of m and r. This gives us a recursive means of calculating g and integers s and t such that sk + tm = g: recursively compute u and v such that um + vr = g; then observe


  g = um + vr
    = um + v(k - qm)
    = vk + (u - vq)m

so we can take s = v and t = u - vq.

  (define (Euclid m n)
    (if (zero? n)
      (list 1 0 m)
      (let* ((q (floor (/ m n)))
             (r (modulo m n))
             (z (Euclid n r))
             (u (first z))
             (v (second z))
             (g (third z)))
        (list v (- u (* q v)) g))))

Thus the call (euclid e m) can be used to test whether the gcd of e and m is 1. If so, it also returns s and t such that se + tm = 1. The desired multiplicative inverse d of e is s mod m (we have to reduce mod m because the s returned by euclid might be negative).

Encrypting and Decrypting Messages

Finally, to use RSA, we need some standard way to transform between strings of characters and numbers. We have provided procedures that convert a string to a list of numbers and vice versa.

The code for the problem set includes the conversion procedures string->intlist and intlist->string that convert between character strings and lists of integers. An integer between 0 and 214 encodes 2 successive characters from the message.

  (string->intlist "This is a string.")
    => (13396 14825 13472 4211 4193 14963 13554 13294 46)
  (intlist->string '(13396 14825 13472 4211 4193 14963 13554 13294 46))
    => "This is a string."

The code for these two procedures is included with the problem set code, but you are not responsible for it. You may want to look at it if you are interested in how character strings can be manipulated in Swindle.

To encrypt a message, we transform the message into a list of numbers and convert the list of numbers using the RSA process:

  (define (RSA-encrypt string key)
    (RSA-encrypt-list (string->intlist string) key))

You might guess that the right way to encode the list of numbers would be to encode each number in the list separately. But this doesn't work well (it makes it much easier to crack the code). Instead, we encrypt the last number in the list, add that to the second-to-last number (modulo n) and encrypt the result, add that to the third-to-last number and encrypt the result, and so on, so that each number in the resulting encrypted list will depend upon all the numbers after it in the list.

Suppose that the unencrypted message is of the form

A B C

and let [x] represent the RSA-transform of x. Then what all of this means is that the encryption of this message is

[A+[B+[C]]] [B+[C]] [C]

This is implemented as follows:

  (define (RSA-encrypt-list intlist key)
    (if (null? intlist)
      '()
      (let* ((encrypted-tail
              (RSA-encrypt-list (tail intlist) key))
             (encrypted-head
              (RSA-transform (if (null? encrypted-tail)
                               (head intlist)
                               (modulo (+ (head intlist) (head encrypted-tail))
                                       (key-modulus key)))
                             key)))
        (cons encrypted-head encrypted-tail))))

To decrypt the message, you have to remove the boxes and do some relatively simple arithmetic. We'll leave it to you to implement the analogous RSA-decrypt-list procedure that does this. Using that, we have:

  (define (RSA-decrypt intlist key)
    (intlist->string (RSA-decrypt-list intlist key)))

Finally, to generate digital signatures for encrypted messages, we need a standard compression function. In this problem set, we'll simply add the integers modulo 218. (In practice, people use more complicated compression schemes than this.)

  (define (compress intlist)
    (letrec ((add-loop
              (lambda (l)
                (if (null? l)
                  0
                  (+ (head l) (add-loop (tail l)))))))
      (modulo (add-loop intlist) (expt 2 18))))

Programming Exercises

In you program, you should load ps2.ss library into DrSwindle a different method: all you have to do is add this statement as the first line in your file: "(load ps2)".

To test the code, evaluate

  (define result1
    (RSA-encrypt "This is a test message." test-public-key1))

Then result1 should be the list

  (748975 486423 936496 574177 690867 66704
   610366 793797 270230 709111 792539 85020)

We have generated sample RSA key pairs test-key-pair1 and test-key-pair2 for you to test your code with. Keep in mind that punctuation and case in the string matters.


Programming Exercise 1

The code is missing one of the procedures required to decrypt messages, RSA-decrypt-list. Implement this procedure. It should take as arguments a list of integers to decode and a decoding key and return a list of integers, undoing the transformation implemented by RSA-encrypt-list. (Hint: This procedure is similar in form to RSA-encrypt-list. If you find yourself doing something much more complicated, then you are barking up the wrong tree. Ask for help if necessary.)

To test your procedure, try

  (RSA-decrypt-list result1 test-private-key1)

You should obtain the result

  (13396 14825 13472 4211 4193 13044 14963 13984 14821 12531 13031 46)

If that works, then you should be able to evaluate

  (RSA-decrypt result1 test-private-key1)

to obtain the original test message.

For this problem, turn in a listing of your procedure, the sample encryption and decryption of the test message, and a sample encryption and decryption (using test-key-pair1 and test-key-pair2) of some messages of your choice. Remember that samples are to be pasted in the code as comments.


Programming Exercise 2

In this exercise, you will implement the function for signing and encrypting messages described above. Part A asks you to create a simple data type interface to store signed messages. In part B, you will write a function to encrypt and sign a message, and also write a function to reverse that process (i.e., to authenticate a signed message and decrypt it).

Part A

The first step in adding signed messages is to create the data type for signed messages. For this question, you must use higher-order functions to create signed messages objects. You need to write the functions make-signed-message, signed-message-message, and signed-message-signature to fulfill the following specification:

  (signed-message-message (make-signed-message msg sig)) ==> msg
  (signed-message-signature (make-signed-message msg sig)) ==> sig

The implementation of make-signed-message should be a function taking a message (which is a list) and a signature (which is just an integer) and returning a signed-message object which is actually a function. Additionally, signed-message-message and signed-message-signature should take as input a signed-message.

Your function make-signed-message can be implemented in any way that you please, provided you can use it to write signed-message-message and signed-message-signature to satisfy the specification, and it must return a function object - so when you evaluate such an object, DrSwindle should respond with "#<procedure>".

For this problem, you should hand in your listings for the three signed-message functions, as well as output showing that your functions satisfy the above specification.

Now that you have defined signed messages, evaluate:

  (define secret-message
    (make-signed-message secret secret-signature))

to create the secret message. secret and secret-signature are both defined in the ps2.ss file. Later on you will be cracking the RSA system and figuring out what the message is.

Part B

Define encrypt-and-sign, a procedure that takes as arguments a message to be encrypted and signed, the sender's private key, and the recipient's public key. The procedure should encrypt the message, compute a digital signature for it, and combine these to produce a signed message.

As a test, try

  (define result
    (encrypt-and-sign "Test message from user 1 to user 2"
      test-private-key1
      test-public-key2))

You should obtain a signed message whose message part is

  (458337 7904 323936 12133 215071 196939 545879 155675 21350
   145763 372648 243075 469305 414209 479082 489696 200033)

and whose signature part is 383602.

Now implement the inverse transformation authenticate-and-decrypt, which takes as arguments the received signed message, the sender's public key, and the recipient's private key. If the signature is authentic, the procedure should produce the decrypted message. If the signature is not authentic, the procedure should indicate this. Test your procedures by trying

  (authenticate-and-decrypt result test-public-key1 test-private-key2)

to recover the original message. For this problem turn in a listing of your procedures together with a demonstration that they work. Don't forget to demonstrate that they catch fraudulent signatures. You can use the two test keys for this.


Programming Exercise 3

For the next exercise we will need the procedure smallest-divisor, which given an integer n finds the smallest integer that divides n with no remainder. You should write a procedure that finds the smallest divisor by trying all the integers that could divide n. You can save time by checking various cases. For example, what is the largest integer that needs to be checked? What other numbers do not need to be considered (e.g., n is not of the form 6k+1 or 6k-1)?

The procedure smallest-divisor should be completely stand-alone; that is to say, it should take n as an input parameter and not assume anything about n except that it is a positive integer.

For this problem turn in a listing of your smallest-divisor procedure and some examples of it working.

Also, write a procedure prime-decompose that returns the list of prime factors given positive integer n. The list may contain repeated prime factors. For example, (prime-decompose 12) will give (2 2 3) and (prime-decompose 17) will give (17). You may find smallest-divisor useful. Be efficient. Make sure your code terminates as soon as possible. Show the prime factorization of 21682369144645089107 (representable by 65 bits). (Hint: we assure you that it contains at most four factors only and it shouldn't take a very long time to factorize it (6 sec. on a PentiumII-300). Also, think about how should your code decide when to quit factoring once you have some factor(s)?)


Programming Exercise 4

You now have a basic implementation of an RSA cryptosystem, complete with facilities for encryption, decryption, digital signatures and signature authentication, and generation of new keys. Since we have used such small primes to generate keys, you should also be able to crack the system. In order to crack an RSA system, recall you must factor the modulus n into its component prime factors p and q. You can do this using the procedure smallest-divisor. Note that you only need to find one prime divisor p; the other divisor is q = n/p. Write a procedure crack-RSA that, given a public key, returns the associated private key. Test your procedure using the pairs test-key-pair1 and test-key-pair2 to show that it generates the correct private keys, given the public keys. (Note: cracking these numbers is really fast.)

For this problem turn in a listing of your procedure, together with demonstrations that it works.


Programming Exercise 5

Use your crack-RSA and authenticate-and-decrypt procedures to determine who sent the signed message in secret-message, who the message was intended for and what the content of the message is. The message is in "secret", the signature is in "secret-signature" and the public keys for various possible senders and receivers are at the end of the file ps2.ss (greg, david, eli, tony...).

For this problem turn in the results of using these procedures to decrypt the message.


Source: W. Diffie and M. Hellman, "New directions in cryptography," IEEE Transactions on Information Theory, IT-22:6, 1976, pp. 644-654.


Real Life

In real life, we seldom do encryption/decryption this way because RSA, as well as other secure public-key system, are quite slow compared to many private-key crypto-systems. So how can we get the best of both worlds? Softwares like PGP actually use a public key algorithm to encrypt a session-key that is sent with each message. That session-key is actually the key for a private key system. The session-key is small (compared to a general message), and so you save a lot of time by using a fast algorithm for a larger part but still get the roughly same level of security. Another difference between our system and reality is that signatures are usually embedded in the encrypted message, i.e. we sign the original message, then we encrypt the signed message.

Let's take a look at message signing. How should we choose a good algorithm to compute the signature? Or we could ask what quality makes a signature computing algorithm good? Should it reveal any information about the original message? A nice property would be that the signature changes half (on average) of its bits if one bit in the original message flipped. However, we didn't implement that in our system.

It may seem crazy to factorize such a big number in part 3. However, that number is only 65 bit long. What do we use in real life? Numbers like 1024 and 2048 are very common. How long does it take to factorize such a number by brute-force? Each bit roughly doubles the time required... And if you are very interested in this kind of brute-force code breaking, try visit distributed.net and see their RC5-64 project.