CS 312 Problem Set 4
Stream Operations

Issued: March 5, 2004
Due: March 18, 11:59pm

March 05 - Code for LZ encode (exercise 6) is now provided.
March 15 - Bugfix in RSA now included in the source distribution.

Instructions

You will do this problem set by modifying the source files in ps4.zip and submitting the program that results. The program that you submit must compile without any warnings. Programs that do not compile or compile with warnings will receive an automatic zero.

The goal of this problem set is to use the modules you implemented in Problem Set 3 to build some useful stream processing code. We will continue to expect you to maintain the same high standards of documentation that you used on PS3.

For this assignment you will be working with a partner; you may use the code of either partner in doing this problem set (but not the code of anyone else).

If you make design decisions which are not obvious, please justify these these in a coment.

This problem set is divided into four parts (A–D) , each of them involving a different data abstraction and each of them containing subparts.

Part A: Streams

Finite streams are the lazy version of lists. Lists are useful data structures, but lists are limited by the fact that they are created eagerly, meaning all computation to produce a list must be performed before the list can be used. In contrast, streams are lazy: they only generate elements as those elements are needed. They are very useful for describing input.

Abstractly, a stream is a (possibly infinite) list of elements. An operation next extracts the first element of the stream and also gives the remainder of the stream. It returns NONE if the stream has no more elements.

  (* An 'a stream is a sequence of values of type 'a. It may be
   * empty ([]), finite ([a1,...,an]), or infinite in length
   * ([a1, a2, ...]. *)
  type 'a stream
  val empty : 'a stream

  (* next(s) = SOME of possible first element of s and s with first
     element removed, or NONE if the stream is empty. *)
  val next : 'a stream -> ('a * 'a stream) option

Streams are like lazy lists. To make streams lazy, the representation type must be able to delay computation. This is achieved by defining a Cons as a function that is able to produce the rest of the list (stream) on demand:

  datatype 'a stream = Cons of unit -> ('a * 'a stream) option

The implementation of the two operations above is now trivial. A creator operation make is also defined in the stream signature:

(* make f i is a stream created using generator f and initial seed i.
   It is a sequence of values [a1,a2,...] where
   SOME (an, bn) = f(b_n-1) and b0 = i. If for any bn, f(bn) = NONE,
   then an is the last element in the stream. *)
  val make : ('b -> ('a * 'b) option) * 'b -> 'a stream

For example, we can create a stream containing all zeros: make (fn() => SOME(0,()), ())

Exercise 1: Stream Warmup

To do: Implement the following functions in the file streams.sml. Some of these have poor specifications. Please fix and formalize any specs that need it.

a) take

  (* take s n = list containing first n elements in s and s with the same
   * elements removed.  If s does not contain n elements then take s n
   * is (l, se) where l is the list of all elements in s and se is the
   * empty stream *)
  val take : 'a stream * int -> 'a list * 'a stream

b) takeAll

  (* takeAll s is a list containing all elements in s. *)
  val takeAll: 'a stream -> 'a list

c) append

  (* append s s' is a stream s'', such that s'' contains all elements in
   * s followed by all elements in s' *)
  val append : 'a stream -> 'a stream -> 'a stream

d) mapChunk

  (* mapChunk f s applies f to s to get a tuple with a stream, s', and a list.
   * The list is turned into a stream, s'', then mapChunk f s' is added
   * to the end of s'' *)
  val mapChunk : ('a stream -> 'a stream * 'b list) -> 'a stream -> 'b stream

d) foldChunk

 (* foldChunk f i x is like map chunk, but the value of type 'c is used
  * as an to pass state between sequential calls to f *)
  val foldChunk : ('a stream * 'c -> 'a stream * 'b list * 'c) -> 'c ->
    'a stream -> 'b stream

Exercise 2: Look at the Functional Input Streams

Although some operations, such a reading from a disk, are fundamentally imperative, they can be wrapped in code to make them look functional. This is useful, as it allows us to use a stream based on a stateful generation function in stream functions which expect functional arguments. Here is a function functionalize which makes a stateful function look like a functional stream:

  (* functionalize f is a stream containing
   * {f(), (f(); f()), (f(); f(); f()) ...}
   * f is only evaluated once per element generation even
   * if there are multiple copies of the same stream value *)
   val functionalize: (unit -> 'a option) -> 'a Stream.stream

For Example:

  - val f = let val i=ref 0 in fn() => (i:= !i + 1; (SOME (!i ,()))) end;
  val f = fn : unit -> (int * unit) option
  - val s = Stream.make (f,());
  val s = - : int Stream.stream
  - Stream.next s;
  val it = SOME (1,-) : (int * int Stream.stream) option
  - Stream.next s;
  val it = SOME (2,-) : (int * int Stream.stream) option
  - Stream.next s;
  val it = SOME (3,-) : (int * int Stream.stream) option
  - Stream.next (#2 (valOf it));
  val it = SOME (4,-) : (int * int Stream.stream) option

But:

  - val f' = let val i=ref 0 in fn() => (i:= !i + 1; (SOME (!i ,()))) end;
  val f' = fn : unit -> (int * unit) option
  - val s' = (Functionalize.functionalize f');
  val s' = - : (int * unit) Stream.stream
  - Stream.next s';
  val it = SOME ((1,()),-) : ((int * unit) * (int * unit) Stream.stream) option
  - Stream.next s';
  val it = SOME ((1,()),-) : ((int * unit) * (int * unit) Stream.stream) option
  - Stream.next s';
  val it = SOME ((1,()),-) : ((int * unit) * (int * unit) Stream.stream) option
  - Stream.next (#2 (valOf it));
  val it = SOME ((2,()),-) : ((int * unit) * (int * unit) Stream.stream) option

To do: Read and understand the function functionalize in impstream.sml. You will find it useful. Hint: This is not a trick question. You don't need to hand anything in.

Exercise 3: Sample streams

To familarize you with streams, you will implement some sample streams defined in samplestreams.sig. Hint: use mapChunk to filter 'a stream into 'b stream.

To do: Edit the samplestreams.sml file and implement the associated signature in samplestreams.sig.

Part B: The RSA Cipher

In a world that relies increasingly upon digital information, public key encryption and digital signatures play an important part in achieving private communication. RSA is a popular public-key encryption system. It is based on the fact that there are fast algorithms for exponentiation and for testing prime numbers—but no known fast algorithms for factoring large numbers. In this problem set you will implement a version of the RSA system. The algorithms will you implement have both a deep mathematical foundation and practical importance.

Cryptographic systems typically use keys for encryption and decryption. An encryption key is used to convert the original message (the plaintext) to coded form (the ciphertext). A corresponding decryption key is used to convert the ciphertext back to the original plaintext.

In traditional cryptographic systems, the same key is used for both encryption and decryption. Two parties can exchange coded messages only if they share a secret key. Since anyone who learned that key would be able to decode the messages, keys must be carefully guarded and transmitted only under tight security: for example, couriers handcuffed to locked, tamper-resistant briefcases!

Diffie and Hellman (1976) discovered a new approach to encryption and decryption: public key cryptography. In this approach, the encryption and decryption keys are different. Knowing the encryption key cannot help you find the decryption key. Thus you can publish your encryption key on the web, and anyone who wants to send you a secret message can use it to encode a message to send to you. You do not have to worry about key security at all, for even if everyone in the world knew your encryption key, no one could decrypt messages sent to you without knowing your decryption key, which you keep private to yourself.

The RSA system, due to Rivest, Shamir, and Adelman, is the best-known public-key cryptosystem; the security of your web browsing probably depends on it.

How RSA Works

RSA does not work with characters, but with integers. The standard ASCII representation of a single character is a 7-bit integer. We will represent each block of two characters as a 14-bit integer obtained by concatenating the ASCII representations of the two characters. We have provided code to convert character strings to lists of 14-bit integers and vice versa.

Suppose you want to obtain public and private keys in the RSA system. You select two very large prime numbers p and q. (Recall that a prime number is a positive integer greater than 1 with no divisors other than itself and 1). You then compute numbers n and m:

n = pq
m = (p−1)(q−1)

It turns out that

With very few exceptions, for almost all numbers e less than n, [em = 1] mod n

no one knows how to compute m, p, or q efficiently, even knowing n.

(Notation: [a=b] mod n means that (a mod n) = (b mod n), where a mod n is the remainder obtained when dividing a by n using ordinary integer division. Equivalently, [a=b] mod n if a−b is divisible by n. For example, [17 = 32] mod 5.)

Now you pick a number e < m relatively prime to m; that is, such that e and m have no factors in common except 1. The significance of relative primality is that e is relatively prime to m iff e is invertible mod m; that is, iff there exists a d such that [de = 1] mod m. Moreover, it is possible to compute d from e and m using Euclid's algorithm, described below. Your public key, which you can advertise to the world, is the pair (n,e). Your private key is (n,d).

Anyone who wants to send you a secret message s (represented by an integer) encrypts it by computing E(s), where

    E(s) = s^e mod n

That is, if the plaintext is represented by the number s which is less than n, the ciphertext E(s) is obtained by raising s to the power e, then taking the remainder modulo n.

The decryption process is exactly the same, except that d is used instead of e:

    D(s) = s^d mod n

The operations E and D are inverses:

    D(E(s))    = (s^e)^dmod n
                    = s^de mod n
                    = s^1+km mod n
                    = s(s^m)^k mod n
                    = s^(1)k mod n
                    = s mod n
                    = s

The integer s representing the plaintext must be less than n. That's why we break the message up into chunks. Also, this only works if s is relatively prime to n: it has no factors in common with n other than 1. If n is the product of two large primes, then all but negligibly few messages s < n satisfy this property. If by some freak chance s and n turned out not to be relatively prime, then the code would be broken; but the chances of this happening by accident are insignificantly small. In summary, to use the RSA system:

    1. pick large primes p and q;
    2. compute n = pq and m = (p−1)(q−1);
    3. choose e relatively prime to m and use this to compute d such that [de = 1] mod m;
    4. publish the pair (n,e) as your public key, but keep d, p, and q secret.

How secure is RSA? At present, the only known way to obtain d from e and n is to factor n into its prime factors p and q, then compute m and proceed as above. But no one knows how to factor large integers efficiently, despite centuries of effort by number theorists. (More precisely, no one has admittedly publicly to knowing how to do it!) Using known methods, factoring n = pq where p and q are 1000-digit primes would require decades on today's fastest supercomputers. Until someone comes up with an efficient way to factor numbers, or discovers some other way to compute d from e and n, the cryptosystem appears to be secure.

Excercise 4: Implement RSA

You have been given a partial implementation of the RSA algorithm in RSA-support.sml. You must add the functions specified below.

  (*
   * transformBlock(n, m, k) s is (s',l) where l is a byte list of length m
   * containg the first n bytes of s transformed by RSA with
   * key k.  S' is the remainder of the list.
   *)
  fun transformBlock (n: int, m: int, k:key) (s: W.word S.stream):
                                  W.word S.stream * W.word list =

  (* encryptBlock k s is (s', l') where s' is s with one
   * block of bytes removed and
   *   1) (s' is not empty) or (s' is empty and s contains blocksize elements),
   *      in this case l'  is RSA cipher text corresponding to the first
   *      blocksize elements in s. or,
   *   2) s' is empty, 0 < elements is s < blocksize, in which case
   *      l' is l''@[x] where l'' is RSA cipher text corresponding
   *      to a list of blocksize containing the elements of s followed
   *      by 0x00's.
   *   3) s' is empty and s is empty, in which case l' = []
   *      *)
  fun encryptBlock (k: key):
                    W.word S.stream -> W.word S.stream * W.word list =

To do: Complete encryptBlock and transform in RSA-support.sml.

Excercise 5: Digital Signatures

Public key encryption can be used to solve another problem of secure communication. Suppose you wish to send a message by electronic mail and sign it so that the recipient can be sure that it really came from you. What is required is some scheme for signing a message in a way that cannot be forged. This is called a digital signature.

Suppose you want to sign a digital message M in a way that convinces anyone else that the message came from you. This can be achieved by providing a signature that could only have been sent by someone with your private key. This signature is simply D(M).
Anyone who receives M and D(M) can verify that the signature corresponding to the message by applying E to it, obtaining E(D(M)) = M. Only someone in possession of the private key d could have constructed a signature with this property. If someone wanted to forge a signature to a bogus message M, they would have to compute D(M), but assuming that RSA is secure, this cannot be done without the private key d.

We can combine the two techniques as follows. Suppose Amy wants to send Bob a message that only Bob can read, and sign the message so that Bob knows it could only be from her. She encrypts the message using Bob public key. Then she signs the encrypted result using her own private key using the signature method just described. When Bob receives the message, he first uses Amy's public key to authenticate the signature, then decrypts the message using his own private key.

Bob can be sure that only someone with Amy's private key could have sent the message. Amy can be sure that only someone with Bob's private key can read the message. This is accomplished without exchanging any secret information between Bob and Amy. It's this capacity for achieving secure communication without having to worry about exchanging secret keys that makes public key cryptography such an important technique.

In real implementations of digital signatures, the whole message M is not signed; instead, the result of applying a secure one-way hash function is signed. An example of such a function is MD5.

To Do: Implement the functions encryptAndSign and unsignAndDecrypt in rsa-support.sml.

Part C: The LZ Compression Algorithm

Exercise 6: LZ Compression

A working implementation of LZ compression has been given to you, based on the Tries module that you implemented in PS3. The compression function is declared as follows:

val encode : BinIO.elem Stream.stream -> bool Stream.stream

The Lempel-Ziv compression algorithm is implemented as a functor that takes in a schema. The schema has four elements:

Starting Code Width: The number of bits taken up by the original codes when starting the compression or decompression.
Maximum Code Width: The maximum number of bits a code can have. When LZ needs a code wider than this, the algorithm should react accordingly.
Clear Dictionary: If true, the dictionary is cleared whenever the number of codes grows too large.

To do: Read and understand the function encode' and encode in lz-fn.sml Hint: This is not a trick question. You don't need to hand anything in.

Exercise 7: Resizable Arrays

As we have seen in lecture, arrays have the benefit of random access, meaning looking up a particular element takes constant time. On the flip side, arrays have the limiting property of being static in length. Java programmers get around this by using the utility library's Vector class, which allows new elements to be added. You will implement a similar imperative data abstraction, specified in the file resize_array.sig.

You will implement your resizable arrays on top of the existing SML type Array.array, an array that is not resizable. The trick is to include in the representation a reference to an Array.array that in general has some extra unused elements ready to accommodate calls to insert. When the underlying array runs out of extra elements, a new array is created that is longer than the original array and all the information is copied from one array to the other. You should grow the array in such a way that the amortized insertion time is constant.

The 'a array type and empty function have already been written for you. You will need to provide definitions for the remaining functions. You should leverage the functions in the Array structure as much as possible.

To do: Edit the file resizable-array.sml, including your implementations for all unimplemented functions.

Exercise 8: Decompression Dictionary

By now, you have seen enough data structures that you can decide for yourself exactly what data structure to use. Your decompression algorithm will rely on a structure with the following signature:

signature LZ_DICTIONARY = sig
 (* An "lzd" is a dictionary used to map codes to the
  strings they represent during the processing of a
  Lempel-Ziv-encoded stream.
  *)
  type lzd
  type code = int

 (* empty() creates and returns a dictionary *)
  val empty : unit -> lzd

 (* insert(d,c,s) inserts a mapping from c to s.
  * Returns true on successful insert, false if no changes were made *)
  val insert : lzd * code * Word8.word list -> bool

 (* lookup(d,c) is the string that d maps c to.
  * Checks: d has a mapping for c. *)
  val lookup : lzd * code -> Word8.word list

 (* size(d) is the number of mappings in d. *)
  val size : lzd -> int
end

You must devise an asymptotically efficient data structure for this problem and implement it in dictionary.sml. Each of the operations of this data structure should take O(1) amortized time; that is, any sequence of n operations should take O(n) time. You may find it to be the case that a data structure already exists, or you may find it necessary to combine several data structures. As a strategy for choosing your data structure, you should identify which functions above should have the best performance, and optimize for them.

To do: Edit the dictionary.sml file, providing a full implementation of the data structure you use for decompression.

Exercise 9: LZ Decompression

A compressed file isn't very useful if you can't decompress it, so now you get to implement decompression. Since the decompression algorithm must stay in synch with the compression algorithm, both must observe the schema. As expected, decompression is the exact opposite of compression, and has the following type:

val decode : bool Stream.stream -> BinIO.elem Stream.stream

Now that you have an appropriate dictionary abstraction, it is time to write the decode function that inverts the action of encode. Pseudo-code for the decompression algorithm is given in the handout, but as a word of caution, now is not the time to throw away your functional programming skills. Your style will be assessed carefully in your code for decode. Try to break it into separate pieces, much like the encode function.

We are providing a set of suggested test cases. They are located in the testcases.zip file. Compression and decompression should work on all of the test cases. The FW structure, located in frameworks.sml, has an instance of the LZ structure. You should use this to test different values in the schema and try different compression techniques.

HAND IN THE FOLLOWING: Edit the file lz-fn.sml, implementing the decode function.

Exercise 10: Decompression Challenge

We are giving you three mystery files that may have been encrypted or compressed using one of the following compression schemas:

(S:8, M:24, C:false)
(S:8, M:14, C:true)
(S:8, M:12, C:true)
(S:8, M:12, C:false)

One of the files has been compressed, one has been encrypted, and one has been compressed and then encrypted. The encrypted files can be decrypted with private key key.

All the files have been given the ".312" extension. Your job is to extract each of these files and determine what they are. They are all considered cross-platform, in that you should be able to open them using widely available programs. For each file, tell us what the correct file extension is and a brief (like one sentence) description of the contents. They could be image files, compressed text, or any other cross-platform file. Good luck!

To do: Edit the file exercise13.txt, providing the information requested above.

Part D: Verification

Exercise 11: Specification and Inductive Proof

Consider the following function definitions:

fun f(m,n) = if n = 1 then m else f(m*n, n-1)

(* fact(n) is n!
 * Requires: n >= 1.*)
fun fact(n) = f(1,n)

The function f has no specification. Give a specification that is strong enough to let you prove that the function fact is implemented correctly.
Using the specification you wrote in part 1, prove that fact satisfies its specification. This should be easy.
Using the techniques given in class, prove by induction that f satisfies the specification you wrote in part 1. This should be harder.

To do: turn in the solution to parts 1–3 in the file written.txt.

CS 312 Problem Set 4 Stream Operations

Issued: March 5, 2004 Due: March 18, 11:59pm