You will do this problem set by modifying the source files in ps4.zip and submitting the program that results. The program that you submit must compile without any warnings. Programs that do not compile or compile with warnings will receive an automatic zero.
The goal of this problem set is to use the modules you implemented in Problem Set 3 to build some useful stream processing code. We will continue to expect you to maintain the same high standards of documentation that you used on PS3.
For this assignment you will be working with a partner; you may use the code of either partner in doing this problem set (but not the code of anyone else).
If you make design decisions which are not obvious, please justify these these in a coment.
This problem set is divided into four parts (A–D) , each of them involving a different data abstraction and each of them containing subparts.
Finite streams are the lazy version of lists. Lists are useful data structures, but lists are limited by the fact that they are created eagerly, meaning all computation to produce a list must be performed before the list can be used. In contrast, streams are lazy: they only generate elements as those elements are needed. They are very useful for describing input.
Abstractly, a stream is a (possibly infinite) list of elements. An operationnext extracts the first
element of the stream and also gives the remainder of the
stream. It returns NONE if the stream has no more elements.
(* An 'a stream is a sequence of values of type 'a. It may be * empty ([]), finite ([a1,...,an]), or infinite in length * ([a1, a2, ...]. *) type 'a stream val empty : 'a stream (* next(s) = SOME of possible first element of s and s with first element removed, or NONE if the stream is empty. *) val next : 'a stream -> ('a * 'a stream) option
Streams are like lazy lists. To make streams lazy, the representation type must be able to delay computation. This is achieved by defining a Cons as a function that is able to produce the rest of the list (stream) on demand:
datatype 'a stream = Cons of unit -> ('a * 'a stream) option
The implementation of the two operations above is now trivial. A creator
operation make is also defined in the stream signature:
(* make f i is a stream created using generator f and initial seed i. It is a sequence of values [a1,a2,...] where SOME (an, bn) = f(b_n-1) and b0 = i. If for any bn, f(bn) = NONE, then an is the last element in the stream. *) val make : ('b -> ('a * 'b) option) * 'b -> 'a stream
For example, we can create a stream containing all zeros: make (fn()
=> SOME(0,()), ())
To do: Implement the following functions in the file streams.sml. Some of these have poor specifications. Please fix and formalize any specs that need it.
(* take s n = list containing first n elements in s and s with the same * elements removed. If s does not contain n elements then take s n * is (l, se) where l is the list of all elements in s and se is the * empty stream *)valtake : 'a stream * int -> 'a list * 'a stream
(* takeAll s is a list containing all elements in s. *)valtakeAll: 'a stream -> 'a list
(* append s s' is a stream s'', such that s'' contains all elements in * s followed by all elements in s' *)valappend : 'a stream -> 'a stream -> 'a stream
(* mapChunk f s applies f to s to get a tuple with a stream, s', and a list. * The list is turned into a stream, s'', then mapChunk f s' is added * to the end of s'' *)valmapChunk : ('a stream -> 'a stream * 'b list) -> 'a stream -> 'b stream
(* foldChunk f i x is like map chunk, but the value of type 'c is used * as an to pass state between sequential calls to f *)valfoldChunk : ('a stream * 'c -> 'a stream * 'b list * 'c) -> 'c -> 'a stream -> 'b stream
functionalize
which makes a stateful function look like a functional stream:
(* functionalize f is a stream containing * {f(), (f(); f()), (f(); f(); f()) ...} * f is only evaluated once per element generation even * if there are multiple copies of the same stream value *)valfunctionalize: (unit -> 'a option) -> 'a Stream.stream
For Example:
- val f = let val i=ref 0 in fn() => (i:= !i + 1; (SOME (!i ,()))) end; val f = fn : unit -> (int * unit) option - val s = Stream.make (f,()); val s = - : int Stream.stream - Stream.next s; val it = SOME (1,-) : (int * int Stream.stream) option - Stream.next s; val it = SOME (2,-) : (int * int Stream.stream) option - Stream.next s; val it = SOME (3,-) : (int * int Stream.stream) option - Stream.next (#2 (valOf it)); val it = SOME (4,-) : (int * int Stream.stream) option
But:
- val f' = let val i=ref 0 in fn() => (i:= !i + 1; (SOME (!i ,()))) end; val f' = fn : unit -> (int * unit) option - val s' = (Functionalize.functionalize f'); val s' = - : (int * unit) Stream.stream - Stream.next s'; val it = SOME ((1,()),-) : ((int * unit) * (int * unit) Stream.stream) option - Stream.next s'; val it = SOME ((1,()),-) : ((int * unit) * (int * unit) Stream.stream) option - Stream.next s'; val it = SOME ((1,()),-) : ((int * unit) * (int * unit) Stream.stream) option - Stream.next (#2 (valOf it)); val it = SOME ((2,()),-) : ((int * unit) * (int * unit) Stream.stream) option
To do: Read and understand the function functionalize in impstream.sml. You
will find it useful. Hint:
This is not a trick question. You don't need to hand anything in.
To familarize you with streams, you will implement some
sample streams defined in samplestreams.sig. Hint: use mapChunk
to filter 'a stream into 'b stream.
To do: Edit the samplestreams.sml file and implement the associated
signature in samplestreams.sig.
In a world that relies increasingly upon digital information, public key
encryption and digital signatures play an important part in achieving private
communication. RSA is a popular public-key encryption system. It is based on the
fact that there are fast algorithms for exponentiation and for testing prime
numbers—but no known fast algorithms for factoring large numbers. In this
problem set you will implement a version of the RSA system. The algorithms will
you implement have both a deep mathematical foundation and practical importance.
Cryptographic systems typically use keys for encryption and decryption. An
encryption key is used to convert the original message (the plaintext) to coded
form (the ciphertext). A corresponding decryption key is used to convert the
ciphertext back to the original plaintext.
In traditional cryptographic systems, the same key is used for both encryption
and decryption. Two parties can exchange coded messages only if they share a
secret key. Since anyone who learned that key would be able to decode the
messages, keys must be carefully guarded and transmitted only under tight
security: for example, couriers handcuffed to locked, tamper-resistant
briefcases!
Diffie and Hellman (1976) discovered a new approach to encryption and
decryption: public key cryptography. In this approach, the encryption and
decryption keys are different. Knowing the encryption key cannot help you find
the decryption key. Thus you can publish your encryption key on the web, and
anyone who wants to send you a secret message can use it to encode a message to
send to you. You do not have to worry about key security at all, for even if
everyone in the world knew your encryption key, no one could decrypt messages
sent to you without knowing your decryption key, which you keep private to
yourself.
The RSA system, due to Rivest, Shamir, and Adelman, is the best-known public-key
cryptosystem; the security of your web browsing probably depends on it.
RSA does not work with characters, but with integers. The standard ASCII
representation of a single character is a 7-bit integer. We will represent each
block of two characters as a 14-bit integer obtained by concatenating the ASCII
representations of the two characters. We have provided code to convert
character strings to lists of 14-bit integers and vice versa.
Suppose you want to obtain public and private keys in the RSA system. You select
two very large prime numbers p and q. (Recall that a prime number is a positive
integer greater than 1 with no divisors other than itself and 1). You then
compute numbers n and m:
n = pq
m = (p−1)(q−1)
It turns out that
- With very few exceptions, for almost all numbers e less than n, [em = 1] mod n
- no one knows how to compute m, p, or q efficiently, even knowing n.
(Notation: [a=b] mod n means that (a mod
n) = (b mod n), where a mod
n is the
remainder obtained when dividing a by n using ordinary integer division.
Equivalently, [a=b] mod n if a−b is divisible by n. For example, [17 = 32]
mod 5.)
Now you pick a number e < m relatively prime to m; that is, such that e and m
have no factors in common except 1. The significance of relative primality is
that e is relatively prime to m iff
e is invertible mod m;
that is, iff there exists a d such that [de = 1] mod
m.
Moreover, it is possible to compute d from e and m using Euclid's algorithm,
described below. Your public key, which you can advertise to the world, is the
pair (n,e). Your private key is (n,d).
Anyone who wants to send you a secret message s (represented by an integer)
encrypts it by computing E(s), where
E(s) = se mod
n
That is, if the plaintext is represented by the number s
which is less than n, the
ciphertext E(s) is obtained by raising s to the power e,
then taking the remainder modulo n.
The decryption process is exactly the same, except that d is used instead
of e:
D(s) = sd mod n
The operations E and D are inverses:
D(E(s)) = (se)d
mod n
= sde mod n
= s1+km mod n
= s(sm)k mod n
= s(1)k mod n
= s mod n
= s
The integer s representing the plaintext must be less than
n. That's why we
break the message up into chunks. Also, this only works if s is relatively prime
to n: it has no factors in common with n other than 1. If n is the product of
two large primes, then all but negligibly few messages s <
n satisfy this
property. If by some freak chance s and n turned out not to be relatively prime,
then the code would be broken; but the chances of this happening by accident are
insignificantly small. In summary, to use the RSA system:
1. pick large primes p and q;
2. compute n = pq and m =
(p−1)(q−1);
3. choose e relatively prime to
m and use this to compute d
such that [de = 1] mod m;
4. publish the pair (n,e) as your public key,
but keep d, p,
and q secret.
How secure is RSA? At present, the only known way to obtain d
from e and n is to
factor n into its prime factors p and
q, then compute m and proceed as above.
But no one knows how to factor large integers efficiently, despite centuries of
effort by number theorists. (More precisely, no one has admittedly publicly to
knowing how to do it!) Using known methods, factoring n = pq where p and
q are
1000-digit primes would require decades on today's fastest supercomputers. Until
someone comes up with an efficient way to factor numbers, or discovers some
other way to compute d from e and
n, the cryptosystem appears to be secure.
You have been given a partial implementation of the RSA algorithm in RSA-support.sml.
You must add the functions specified below.
(* * transformBlock(n, m, k) s is (s',l) where l is a byte list of length m * containg the first n bytes of s transformed by RSA with * key k. S' is the remainder of the list. *) fun transformBlock (n: int, m: int, k:key) (s: W.word S.stream): W.word S.stream * W.word list =
(* encryptBlock k s is (s', l') where s' is s with one * block of bytes removed and * 1) (s' is not empty) or (s' is empty and s contains blocksize elements), * in this case l' is RSA cipher text corresponding to the first * blocksize elements in s. or, * 2) s' is empty, 0 < elements is s < blocksize, in which case * l' is l''@[x] where l'' is RSA cipher text corresponding * to a list of blocksize containing the elements of s followed * by 0x00's. * 3) s' is empty and s is empty, in which case l' = [] * *) fun encryptBlock (k: key): W.word S.stream -> W.word S.stream * W.word list =
To do: Complete encryptBlock and transform in RSA-support.sml.
Public key encryption can be used to solve another problem of secure communication. Suppose you wish to send a message by electronic mail and sign it so that the recipient can be sure that it really came from you. What is required is some scheme for signing a message in a way that cannot be forged. This is called a digital signature.
Suppose you want to sign a digital message M
in a way that convinces
anyone else that the message came from you. This can be achieved by providing a
signature that could only have been sent by someone with your private key. This
signature is simply D(M).
Anyone who receives M and D(M) can verify that the signature
corresponding to the message by applying E
to it, obtaining E(D(M)) =
M. Only someone in possession of the private key d could have
constructed a signature with this property. If someone wanted to forge a
signature to a bogus message M, they would have to compute D(M),
but assuming that RSA is secure, this cannot be done without the private key d.
We can combine the two techniques as follows. Suppose Amy wants to send Bob a message that only Bob can read, and sign the message so that Bob knows it could only be from her. She encrypts the message using Bob public key. Then she signs the encrypted result using her own private key using the signature method just described. When Bob receives the message, he first uses Amy's public key to authenticate the signature, then decrypts the message using his own private key.
Bob can be sure that only someone with Amy's private key could have sent the message. Amy can be sure that only someone with Bob's private key can read the message. This is accomplished without exchanging any secret information between Bob and Amy. It's this capacity for achieving secure communication without having to worry about exchanging secret keys that makes public key cryptography such an important technique.
In real implementations of digital signatures, the whole message M is not signed; instead, the result of applying a secure one-way hash function is signed. An example of such a function is MD5.
To Do: Implement the functions encryptAndSign and unsignAndDecrypt in
rsa-support.sml.
A working implementation of LZ compression has been given to you, based on
the Tries module that you implemented in PS3. The compression
function is declared as follows:
val encode : BinIO.elem Stream.stream -> bool Stream.stream
The Lempel-Ziv compression algorithm is implemented as a functor that takes in a schema. The schema has four elements:
To do: Read and understand the function encode' and
encode in lz-fn.sml Hint: This is not a trick
question. You don't need to hand anything in.
As we have seen in lecture, arrays have the benefit of random access, meaning
looking up a particular element takes constant time. On the flip side,
arrays have the limiting property of being static in length. Java programmers
get around this by using the utility library's Vector class, which
allows new elements to be added.
You will implement a similar imperative data abstraction, specified in the file
resize_array.sig.
You will implement your resizable arrays on top of the existing SML type Array.array,
an array that is not resizable. The trick is to include in the
representation a reference to an Array.array that in general has
some extra unused elements ready to accommodate calls to insert.
When the underlying array runs out of extra elements, a new array is created
that is longer than the original array and all the information is copied from
one array to the other. You should grow the array in such a way
that the amortized insertion time is constant.
The 'a array type and empty function
have already been written for you. You will need to provide definitions for the
remaining functions. You should leverage the functions in the Array
structure as much as possible.
To do: Edit the file resizable-array.sml,
including your implementations for all unimplemented functions.
By now, you have seen enough data structures that you can decide for yourself exactly what data structure to use. Your decompression algorithm will rely on a structure with the following signature:
signature LZ_DICTIONARY = sig (* An "lzd" is a dictionary used to map codes to the strings they represent during the processing of a Lempel-Ziv-encoded stream. *) type lzd type code = int (* empty() creates and returns a dictionary *) val empty : unit -> lzd (* insert(d,c,s) inserts a mapping from c to s. * Returns true on successful insert, false if no changes were made *) val insert : lzd * code * Word8.word list -> bool (* lookup(d,c) is the string that d maps c to. * Checks: d has a mapping for c. *) val lookup : lzd * code -> Word8.word list (* size(d) is the number of mappings in d. *) val size : lzd -> int end
You must devise an asymptotically efficient data structure for this
problem and implement it in dictionary.sml. Each of the operations
of this data structure should take O(1)
amortized time; that is, any sequence of n
operations should take O(n)
time. You may find it to be the case that a data structure already exists, or
you may find it necessary to combine several data structures. As a strategy for
choosing your data structure, you should identify which functions above should
have the best performance, and optimize for them.
To do: Edit the dictionary.sml file,
providing a full implementation of the data structure you use for decompression.
A compressed file isn't very useful if you can't decompress it, so now you get to implement decompression. Since the decompression algorithm must stay in synch with the compression algorithm, both must observe the schema. As expected, decompression is the exact opposite of compression, and has the following type:
val decode : bool Stream.stream -> BinIO.elem Stream.stream
Now that you have an appropriate dictionary abstraction, it is time to write
the decode function that inverts the action of encode.
Pseudo-code for the decompression algorithm is given in the handout,
but as a word of caution, now is not the time to throw away your functional
programming skills. Your style will be assessed carefully in your code for decode.
Try to break it into separate pieces, much like the encode
function.
We are providing a set of suggested test cases. They are located in the
testcases.zip file. Compression and
decompression should work on all of the test cases. The FW
structure, located in frameworks.sml, has an instance of the LZ
structure. You should use this to test different values in the schema and try
different compression techniques.
HAND IN THE FOLLOWING: Edit the file lz-fn.sml,
implementing the decode function.
We are giving you three mystery files that may have been encrypted or compressed using one of the following compression schemas:
false)true)true)false)One of the files has been compressed, one has been encrypted, and one has
been compressed and then encrypted. The encrypted files can be decrypted with private key key.
All the files have been given the ".312" extension. Your job is to extract each of these files and determine what they are. They are all considered cross-platform, in that you should be able to open them using widely available programs. For each file, tell us what the correct file extension is and a brief (like one sentence) description of the contents. They could be image files, compressed text, or any other cross-platform file. Good luck!
To do: Edit the file exercise13.txt,
providing the information requested above.
Consider the following function definitions:
fun f(m,n) = if n = 1 then m else f(m*n, n-1) (* fact(n) is n! * Requires: n >= 1.*) fun fact(n) = f(1,n)
The function f has no specification. Give a specification
that is strong enough to let you prove that the function fact
is implemented correctly.
Using the specification you wrote in part 1, prove that fact
satisfies its specification. This should be easy.
Using the techniques given in class, prove by induction that f satisfies the specification you
wrote in part 1. This should be harder.
To do: turn in the solution to parts 1–3 in the file written.txt.