Preliminary Examination 1

March 7, 2002

**a)** Briefly explain the difference between an SML (also called **algebraic**)
datatype and an *abstract* datatype.

*Algebraic datatypes are one of several kinds of types in SML. (Others
include record, tuple, and function types). Although there are several standard
datatypes already defined by SML (for example, list), the user can write new
datatype declarations; here is one of several examples we have seen so far in
the course:*

datatype color = Red | Black datatype rbtree = Leaf | Branch of value*color*rbtree*rbtree

*A datatype declaration concretely specifies the representation of a type
as a one of a set of constructors; these constructors and the implicit *destructors*
that are invoked when pattern matching on a datatype are the operations of the
algebraic datatype.*

*Abstract data types (ADTs) are a programming methodology rather than a
language feature, though this methodology is supported by SML's ability to
declare abstract types in module signatures. From the client view, an ADT is an
abstract type whose representation is hidden, plus some operations whose
implementation is also hidden. An implementer has the flexibility to change the
representation and implementation of these types because the user does not
depend on them.*

[*This answer is longer and more complete than we expected yours to be!*__]__

**b)** Why is unit testing needed even if integration testing succeeds?
Explain briefly.

*Integration testing tests an entire program, but not the individual
modules of the program. The modules may contain bugs that are difficult or
impossible to reproduce by integration testing. Unit testing gives more
assurance that the program works correctly. Even if integration testing
completely tests program functionality, bugs that would be caught by unit
testing may otherwise show up as program code evolves to use the module
differently.*

**c)** Explain briefly how hash tables implement the set membership
operation in O(1) time.

*A hash table containing n elements hashes an element to be tested for
membership into one of O(n) buckets using O(1) time. Assuming that the hash
function has a uniform distribution over the buckets, each bucket contains O(1)
elements on average, so the lookup within that bucket is constant time.
Therefore the expected time to perform the whole operation is constant, i.e.
O(1).*

A **value** is an expression that does not need further evaluation; if
typed to the SML prompt, exactly the same thing is reported as a result. For
example, it may be a integer constant, a `fn`

expression, a tuple of
values, or a datatype constructor applied to a value. Complete each of the
following **let**-expressions by giving a *value *that can replace
"`???`

" and cause the whole thing to evaluate to the
integer 42.

**a)**

let fun zardoz(x: string*string): int = case Int.fromString((#2 x)^(#1 x)) of NONE => 41 | SOME n => n*2 in zardoz("1",???) end

`"2"` |

**b)**

let fun zardoz (x:string, (y:(int*int) option, z: int)):int = case y of SOME (7,x) => x*z-1 | SOME (x,z) => x*z | NONE => size(x)*z+1 in zardoz(???) end

`("CS", (SOME (6,7), 312)) ` |

**c)**

let datatype sa = Done of int | Function of sa->int fun zardoz(f:sa):int = case f of Done(n) => n | Function(g) => g(f) in zardoz(Function(???)) end

`fn (h: sa) => 42` |

DataMiners Inc is designing an application to track stock portfolios and
needs code to analyze a history of stock prices and find the maximum profit that
could have been obtained by buying and selling at just the right moments. The
stock price history is stored as a list of ints ordered in time; the maximum
profit is the largest value *y*`-`*x
*where *x*, *y*
are integers in the list and *y* occurs in
the list *after* *x*. For example, for
the list `[4,6,7,2]`

the maximum profit would be 7`-`4=3.
Note that it would not be 7`-`2=5 because 7 does not occur after 2 in
the list. The maximum profit can even be negative if the stock price always
declines in value throughout the history.

The programmers at DataMiners have written the following analysis code; they
have hired you to figure out whether it works and analyze its performance.

```
fun maxDiff_x(data: int list, x: int, currentMax: int) =
foldl (fn(y,max) => Int.max(y-x,max)) currentMax data
```

fun mpLoop (data: int list, currentMax:int) : int = case data of [] => currentMax | x::xs => mpLoop(xs, maxDiff_x (xs, x, currentMax)) fun maxProfit (data: int list) : int option = case data of x::y::xs => SOME (mpLoop (data, y-x)) | _ => NONE

**a)** Give a specification of the function `maxDiff_x`

.

*maxDiff_x(data, x, currentMax) is the larger of either currentMax or the
largest difference between some element of data and x. Note that data is allowed
to be empty.*

**b)** Give a specification of the function `mpLoop`

.

*mpLoop(data, currentMax) is the larger of either currentMax or the largest
difference y*`-`*x between two elements x and y in data, where y
strictly follows x in the list. Note that data may be empty.*

**c)** Give a tight asymptotic bound on the running time of `maxDiff_x`

(no justification needed).

Q(*n*)* where n is the length of data. This
is because maxDiff_x simply calls foldl with an O(1) anonymous function.*

**d)** Derive a recurrence relation for the worst-case running time of `mpLoop`

as a function of the length of the list `data`

.

*The time to call mpLoop on a list of size n
includes:*

*Some constant amount of computation (call it k*_{1}*)**1 call to maxDiff_x, taking time k*_{2}*n**for some k*_{2}- A recursive call taking time
*T*(*n*)`-`1

*Therefore the recurrence is:*

T(0) =k_{1}

T(n) =k_{1}+k_{2}n+T(n1)-

**e)** Use your recurrence relation to determine the worst-case time bound
for this code. Justify this bound by using the recurrence relation.

*Two possible solutions. The first is to expand the terms of the recurrence.
(We could replace k*_{1}*
and k*_{2}*
with 1 here and get the same answer, but just to show we don't have to, here's
how it works out:*

T(n) =k_{1}+k_{2}n+T(n1)-

T(n) =k_{1}+k_{2}n+k_{1}+k_{2}(n1) +-T(n2)-

T(n) =k_{1}+k_{2}n+k_{1}+k_{2}(n1) +-k_{1}+k_{2}n+k_{1}+k_{2}(n2) +-T(n3)-

...

T(n) = (n+1)k_{1}+k_{2}· S_{i = 1 to n}i

T(n) = (n+1)k_{1}+k_{2}(½n(n+1)) = ½k_{2}n^{2}+n(½k_{2}+k_{1}) +k_{1}which is clearly Q(

n^{2}) because for sufficiently largenit is bounded above by, say,k_{2}n^{2}, and below by, say, ¼k_{2}n^{2}.

*The other solution is a proof by induction. Here we replace k*_{2}*
with *1* and discard the O(1) k*_{1}*
term.*

** P(n)** :

£

£

=

The following is a simplified version of the SML signature for strings, with only some operations specified fully:

signature STRING = sig (* A string is a finite sequence of characters. For example, the * string "abd" is the characters #"a", #"b", #"d" in sequence. *) type string (* size(s) is the number of characters in the string. *) val size : string -> int (* sub(s, i) is the character in the string at index i, * where indices start at zero in the usual manner. *) val sub : (string * int) -> char (* ^ is ordinary string concatenation *) val ^ : (string * string) -> string (* implode(lst) is the string containing the characters in lst * in the same order as in lst, e.g. implode[#"a",#"b"] is "ab" * and implode nil is the empty string "". *) val implode : char list -> string (* explode is the inverse of implode *) val explode : string -> char list ... end

**a) (2 pts) **Explain briefly what is wrong with the specification for
the function `sub`

and give a corrected specification.

*It doesn't specify what happens when the index i is out of bounds. We need to
add a requires (or checks) clause:
Requires: 0 <= i < size(s)*

While the default SML implementation of this signature is adequate for a broad range of applications, there are cases where a different implementation is better. For example, if the program is doing a lot of string concatenations (^), the standard

`String`

implementation may not be fast enough because
it does string concatenation in time O(structure Rope :> STRING = struct datatype string = Empty | Single of char | Concat of string * string * int (* Abstraction function: * Empty represents the empty string * Single(c) represents the single-character string containing c * Concat(s1,s2,n) represents the concatenation of * s1 with s2, i.e., s1^s2. * Representation invariant: ??? *) fun size(s) = case s of Empty => 0 | Single(c) => 1 | Concat(s1,s2,n) => n fun op^(s1, s2) = Concat(s1, s2, size(s1) + size(s2)) fun sub(s, i:int) = case s of Empty => raise Subscript | Single(c) => if i = 0 then c else raise Subscript | Concat(s1,s2,n) => let val n1 = size(s1) in if i < n1 then sub(s1,i) else sub(s2, ???) end fun implode(cl) = let fun implode_n(cl: char list, n: int): string = case cl of nil => Empty | c::nil => Single(c) | c::cl' => Concat(Single(c), implode_n(cl', n-1), n) in implode_n(cl, List.length(cl)) end fun explode(s) = ??? end

In the rest of this problem, the type name `string`

refers to the
type `Rope.string`

.

**b)** **(4 pts)** The representation invariant is conspicuously
missing from the `Rope`

implementation. Give a representation
invariant. Hint: consider what rep invariant is needed in order to make the size
function correct.

*For the representation *`Concat(s1,s2,n)`

*, n is equal to
the sum of the sizes of s1 and s2. It is also the length of the string. We also
might include ``no Empty nodes in non-empty strings'' in the
representation invariant — we do not need it for correctness, but without it
we won't be able to prove any running time bounds. But then we'd have to make
the code for ^ a slightly more complicated.*

`string`

values `s1`

and `s2`

of length at most `s1^s2`

as a
function of `O(1)`

time, because `size`

itself is `O(1)`

.

`int`

. What expression should replace the `???`

to make `sub`

work correctly?
*We know that the size of s1 is n1. Whichever char
of s2 has index i in s1^s2 must have
index i-n1 is s2. Therefore we need ??? = i-n1*

`sub`

. Describe two
different representations of the same string where one representation is much
better than the other.
`sub`

has worst-case `O(height)`

run time. The tree
representing a string of `n`

characters can have height up to `n-1`

if the left-hand children are always `Single`

s. It can also have
height `lg(n)`

if it is balanced.

`string`

of length `sub`

as a function
of `O(n)`

by the argument above.

`explode`

(your
implementation need not be asymptotically efficient).
*Here are two implementations. The second one is asymptotically efficient
but more complicated.*

fun explode(s: string) = case s of Empty => [] | Single(c) => [c] | Concat(s1,s2,n) => explode(s1) @ explode(s2)

fun explode(s) = let fun explodeApp(s, lst) = case s of Empty => lst | Single(c) => c::lst | Concat(s1,s2,n) => explodeApp(s1,explodeApp(s2, lst)) in explodeApp(s, []) end

**h)** **(EXTRA CREDIT: 2 pts -- save till you are done with everything
else!) **Give a better implementation of `implode`

that improves
the worst-case asymptotic efficiency of `sub`

on strings that it
creates.

fun implode(cl): string = let fun implode_n(cl: char list, n: int): string = case n of 0 => Empty | 1 => Single(hd cl) | n => let val m = n div 2 in Concat(implode_n(cl, m), implode_n(List.drop(cl, m), n-m), n) end in implode_n(cl, List.length(cl)) end

Recall the definition of `foldl`

:

fun foldl (f: ('a*'b) -> 'b) (base:'b) (x:'a list):'b = case x of nil => base | hd::tl => foldl f (f (hd,base)) tl

For each of the following tasks, show how `foldl`

can be used to
reimplement the function in 1–2 lines of *functional* code, without using
`case`

, `hd`

, `tl`

, or `nth`

. Your
version should always produce the same result as the reference code. In your
code, include the full type signature for the function provided as the argument `f`

and remember to check your code to make sure that that function returns the type
it needs to.

**a)** **(5 pts)** Given a list of integers, find the sum of all
elements prior to the first 0. Reference code to be rewritten:

fun SumPriorZero(a: int list) = case a of 0::t => 0 | h::t => h + SumPriorZero(t) | nil => 0Answer:fun SumPriorZero(a: int list) = #1 (foldl (fn (e, (x, b)) => if b orelse e=0 then (x,true) else (e+x,false)) (0, false) a)

**b) (6 pts)** Return a list containing every *n*^{th}
element in a list. Reference code:

fun everyNth(l:'a list, n:int):'a list = let fun iter(l:'a list, counter:int):'a list = case (l,counter) of ([],_) => [] |(x::xs,1) => x::(iter(xs,n)) |(x::xs,c) => iter(xs,c-1) in iter(l,n) endAnswer:fun everyNth(l: 'a list, n:int):'a list = List.rev (#2(foldl (fn (elt,(c,lst))=>if (c=1) then (n,elt::lst) else (c-1,lst)) (n,[]) l))