hashmap.sig and have your HashMap
implement this signature: HashMap(HB: HASH_BIND) :> HASHMAP. This problem set has 4 parts. In parts 1 and 2 you'll implement advanced data structures. In Part 3 you'll learn about the RSA cryptosystem and implement algorithms in this system. Part 4 is a written problem.
It is required that you work with one partner for this assignment. We ask you to use CVS for concurrent code development within your team (see the end of the assignment).
As before, you will write your solutions in the appropriate files and submit the files separately on CMS. To get you started, we are providing a zip file containing several files. You can edit the necessary files, and submit your solution using CMS when you have finished. Non-compiles and warnings will receive an automatic zero.
[Download the zip file ps4.zip]
Recall that an AVL tree is a binary search tree where all nodes have the property that the height difference between their children (of balance factor) is -1, 0, or 1. In class you have seen an implementation of AVL trees that keeps information about height in each node. It is, in fact, possible to have an implementation where each node contains the balancing factor instead. Since balancing factors can be represented using only two bits, this approach can be more space-efficient.
In this part of the assignment you will implement a simple map abstraction using AVL
trees with balance factors. The AVL tree map is parameterized on a signature
ORD_BIND containing types key and value,
plus a function that compares two keys. The datatype definition of tree maps is
as follows:
datatype bal = LEFT | BAL | RIGHT datatype bind = key * value datatype avl = Nil | Node of bind * bal * avl * avl type map = avl
Each node of the tree contains a binding of an integer key to a value, plus
the balance factor of the current node. The balance factor indicates whether the
current node is skewed to the right (RIGHT), to the left (LEFT),
or if it is balanced (BAL).
repOk function that checks whether a tree is a valid AVL
tree map. For full score, your implementation must be O(n) in the number of
nodes of the tree.put that inserts a new
binding in the tree. The code places the new binding as
a leaf, and then walks the tree up, re-balancing nodes along the way. In the absence of explicit height information, the rebalancing process becomes
more difficult: one must keep track of whether insertion has caused sub-trees to
increase in height. The helper function put'(t,e) inserts an element e into an AVL
tree t, and returns
a pair consisting of the resulting AVL tree t' and a flag that indicates whether t'
is taller than t. Please look at the code and understand how the recursive
insertion process works.
balance(s,n) rebalances node n after insertion. All of the subtrees of
n are AVL trees, but the current node n contains the old balance factor. The
first parameter s indicates the side where the new binding has been inserted.You task is to fill in the code for
balance. This requires a detailed case analysis. For each case you must
determine if rotations are needed, how many rotations need, what are the
resulting balance factors, and whether the resulting tree increases in height as
a result of the insertion. The use of pattern matching can greatly simplify your
code (to 1-2 lines per case). Your implementation of balance must
take constant time.
map.sig, that is functions get, fold, and
size. The
fold function will traverse the nodes of the tree using in-order traversal.To do: turn in the solution in the file avlmap.sml.
signature HASH_BIND = sig type key type value val hash : key * int -> int val equals : key * key -> bool end
The hash function hash(k,n) returns a hash code between 0 and
n-1. Any implementation of the above signature must be such that if
equals(k,k')=true then hash(k,n) = hash(k',n) for any value
of n. This condition is required for correctness, but is not checked by the
program.
The hash table will be implemented in the standard manner, as an array of buckets where each bucket contians a list of key-value elements. In addition, your implementation must support dynamic rehashing. When a new hash map is created, one must specify the initial number of buckets and the maximum load factor of the hash table. When the load factor reaches the maximum value, the number of buckets must be doubled, and the table must be rehashed. Your task is to do the following:
new(n,f) creates a hash table with n
buckets and with maximum load factor f. Function put(m,k,v)
inserts value v for key k into the map m; if m already contains a binding for k,
the old value is replaced. Function get(m,k) returns an
option that contains the binding of k in m. Function delete(t,k) removes
the binding of k in m.val new : int * real -> map val put : map * key * value -> unit val get : map * key -> value option val delete : map * key -> unit
size returns the
number of data elements in the hash table. Function buckets returns
the number of buckets in the hash table. Function fold performs the
fold operation over all the values in the table, starting from bucket zero,
and traversing each bucket from left to right.val size : map -> int val buckets : map -> int val fold : (key * value * 'a -> 'a) -> 'a -> map -> 'a
To do: submit your solution in the file hashmap.sml.
In a world that relies increasingly upon digital information, public key
encryption and digital signatures play an important part in achieving private
communication. RSA is a popular public-key encryption system. It is based on the
fact that there are fast algorithms for exponentiation and for testing prime
numbersbut no known fast algorithms for factoring large numbers. In this
problem set you will implement a version of the RSA system. The algorithms will
you implement have both a deep mathematical foundation and practical importance.
Cryptographic systems typically use keys for encryption and decryption. An
encryption key is used to convert the original message (the plaintext) to coded
form (the ciphertext). A corresponding decryption key is used to convert the
ciphertext back to the original plaintext. In traditional cryptographic systems,
the same key is used for both encryption and decryption. Two parties can
exchange coded messages only if they share a secret key. Since anyone who
learned that key would be able to decode the messages, keys must be carefully
guarded and transmitted only under tight security: for example, couriers
handcuffed to locked, tamper-resistant briefcases!
Diffie and Hellman (1976) discovered a new approach to encryption and
decryption: public key cryptography. In this approach, the encryption and
decryption keys are different. Knowing the encryption key cannot help you find
the decryption key. Thus you can publish your encryption key on the web, and
anyone who wants to send you a secret message can use it to encode a message to
send to you. You do not have to worry about key security at all, for even if
everyone in the world knew your encryption key, no one could decrypt messages
sent to you without knowing your decryption key, which you keep private to
yourself.
The RSA system, due to Rivest, Shamir, and Adelman, is the best-known public-key
cryptosystem; the security of your web browsing probably depends on it.
RSA does not work with characters, but with integers. That is, it encrypts numbers. To encrypt a piece of text, just combine the ASCII codes of several characters into a big number, and then encrypt the resulting number. A big number is an integer of arbitrary magnitude (i.e., is not restricted to 32 bits or 64 bits).
Suppose you want to obtain public and private keys in the RSA system. You select two very large prime numbers p and q. (Recall that a prime number is a positive integer greater than 1 with no divisors other than itself and 1). You then compute numbers n and m:
n = pq
m = (p−1)(q−1)
It turns out that no one knows how to compute m, p, or q efficiently, knowing n.
(Notation: [a=b] mod n means
that (a mod n) = (b mod n),
where a mod n is the remainder obtained
when dividing a by n
using ordinary integer division. Equivalently, [a=b]
mod n if a−b is divisible
by n. For example, [17 = 32]
mod 5.)
Now you pick a number e < m relatively
prime to m; that is, such that
e and m have
no factors in common except 1. The significance of
relative primality is that e is relatively
prime to m iff e
is invertible mod m; that is, iff there exists
a d such that [de =
1] mod m. Moreover, it is possible to compute
d from e and
m using Euclid's algorithm, described below. Then:
Anyone who wants to send you a secret message s
(represented by an integer) encrypts it by computing a number
E(s).
The cyphertext E(s) is obtained by raising
s to the power e,
then taking the remainder modulo n :
E(s) = se mod n
The decryption process is exactly the same, except that
d is used instead of e:
D(s) = sd mod
n
The operations E and
D
are inverses:
D(E(s)) = (se)d
mod n
=
sde mod n
=
s1+km mod n
=
s(sm)k mod n
=
s(1)k mod n
=
s mod n
=
s
The integer s representing the plaintext must
be less than n. That's why we break the message
up into chunks. In the above reasoning, the equality [sm=1]
mod n holds only if s is
relatively prime to n (using a theorem due to
Euler and Fermat). Otherwise, if s divides
n, that is, s is either
p or q, then
D(E(s))
= sde mod n = s, trivially.
Therefore, it is always the case that D(E(s))
= s.
In summary, to use the RSA system:
1. pick large primes
p and q;
2. compute
n = pq and
m = (p−1)(q−1);
3. choose
e relatively prime to m
and use this to compute d such that
[de = 1] mod m;
4. publish the pair
(n,e) as your
public key, but keep d, p,
and q secret.
How secure is RSA? At present, the only known way to obtain
d from e and
n is to factor n
into its prime factors p and
q, then compute m and proceed as above. But despite centuries of effort by number theorists,
factoring large integers efficiently is still an open problem. Until someone
comes up with an efficient way to factor numbers, or discovers some other way to
compute d from e
and n, the cryptosystem appears to be
secure for large numbers n. Although several
"better" factorization algorithms have been proposed, the problem remains
computationally intractable for numbers that are large enough. For example, it
is infeasible with today's best supercomputers and best algorithms known to date
to factor an arbitrary 1024-bit number. In fact, the
RSA Factoring
Challenge offers money prizes for factoring large numbers. To date, the
largest number from this challenge that has been successfully factored had 640
bits (or 193 decimal digits). It took about 5 months for several tens of
machines working in parallel to factor this number.
In the file rsa.sml you have been given supporting code for
implementing the RSA algorithm. The code relies on the implementation of
big integers and random numbers from the
SML/NJ library.
Note that this library is not loaded by default! Using the SML Compilation
Manager, the library is loaded by the line smlnj-lib.cm in the
sources.cm file (check the comments in that file if you have
trouble loading the library).
The given code provides a variety of functions, including functions to generate prime numbers, and to test numbers for primality; code that generates key pairs; to read and write keys to and from files; or to convert between lists of bytes and big numbers. We briefly discuss a few of these functions.
expmod(b, e, m)
computes be mod m. Function
euclid(m,n) yields numbers (s,t,g)
such that sm + tn = g, where
g is the greatest common divisor of m
and n. isPrime(n) tests is number is prime. This is a probabilistic test:
it yields true if there is a high probability that n
is prime; and yields false if n is definitely
not prime. This function is based on Fermat's theorem, which states that if
n is prime, then all numbers a in the range 0 <
a < n satisfy [an-1
= 1] mod n; and if n
is not prime, then at most half do. Therefore, successfully checking if [an-1
= 1] mod n for k numbers
a indicates that n is prime with a probability of error of 1 / 2k.generateKeyPair(r) generates an RSA key pair. It
picks random primes p and
q that are in the range from 2r to 2*2r so
that n = pq
will be in the range 22r to 22r+2. To find the keys
e and d given
m = (p-1)(q-1),
function selectKeys picks a random e,
and computes d using Euclid's algorithm.You must add the encryption and decryption functions specified below. Your functions encode an input list of bytes into an output list of bytes. Therefore, these functions operate on blocks of bytes. Each block consists of M bytes or (M-1) bytes, where M is the number of bytes required to store the modulus n of the key (n,e) or (n,d).
To make the encrypted message more difficult
to break, we'll add a small tweak to the encryption scheme: before encoding
a block, the value of the previously encoded block is added to the value of
the current block. For instance, if
we denote by {X} the RSA encoding of one single input block X, then a
sequence of 3 blocks:
A, B, C
is encoded as:
A' = {A}, B'={ (A'+B) mod n }, C' = { (B'+C) mod n }
You must implement the function that
encrypts an input message represented as a list of
bytes:
(* encrypt k l encrypts byte list l with key k. * If M is the number of bytes in k's modulus, then break l into * (M-1)-byte blocks, and encrypt each block into an M-byte block. * Requires: length(l) must be a multiple of M-1. * Returns: the encrypted list of bytes *) val encrypt : key -> word list -> word list
Decryption performs the opposite operation: it decodes each M-byte block into an M-1 byte block. Remember that each block in the cyphertext is the RSA-transform of the current block and the encrypted previous block. Decrypting a block must yield exactly M-1 bytes.
(* decrypt k l decrypts byte list l with key k. * If M is the number of bytes of k's modulus, then break l into * M-byte blocks, and decrypt each of them into an (M-1)-byte block. * Returns: the list of bytes from the decrypted blocks *) val decrypt : key -> word list -> word list
Note: You may find it useful to use a single function that performs the RSA-transform, and use this function for both encryption and decryption:
(* transform k n l encodes a list of bytes l into an * n-byte output list l', using key k. *) val transform : key -> int -> word list -> word list
To do: Complete encrypt and decrypt in
rsa.sml.
Public key encryption can be used to solve another problem of secure communication. Suppose you wish to send a message by electronic mail and sign it so that the recipient can be sure that it really came from you. What is required is some scheme for signing a message in a way that cannot be forged. This is called a digital signature.
Suppose you want to sign a digital message M in a way that convinces anyone else that the message came from you. This can be achieved by providing a signature that could only have been sent by someone with your private key. This signature is simply D(M). Anyone who receives M and D(M) can verify that the signature corresponding to the message by applying E to it, obtaining E(D(M)) = M. Only someone in possession of the private key d could have constructed a signature with this property. If someone wanted to forge a signature to a bogus message M, they would have to compute D(M), but assuming that RSA is secure, this cannot be done without the private key d.
We can combine the two techniques as follows. Suppose Amy wants to send Bob a message that only Bob can read, and sign the message so that Bob knows it could only be from her. She encrypts the message using Bob's public key. Then she signs the encrypted result using her own private key using the signature method just described. When Bob receives the message, he first uses Amy's public key to authenticate the signature, then decrypts the message using his own private key. Bob can be sure that only someone with Amy's private key could have sent the message. Amy can be sure that only someone with Bob's private key can read the message. This is accomplished without exchanging any secret information between Bob and Amy. It's this capacity for achieving secure communication without having to worry about exchanging secret keys that makes public key cryptography such an appealing technique.
Note: For efficiency and other reasons, real implementations of digital signatures do not sign the message itself, but just sign a small hashcode of the message. That is, for a message M, the digital signature is not D(M), but D(H(M)), where H is a secure one-way hash function (an example hash function is MD5). The pair (M, D(H(M))) is the signed document.
To Do: Implement the functions encrypt_and_sign and
unsign_and_decrypt in rsa.sml.
To do: turn in the solution in the file complexity.txt.
cvslog showing your cvs
activity during the assignment. For this, look into the "cvs log"
command.
This will count for 5% of the grade for this assignment.