Problem Set 3: Substitution Model and Datastructures

Due on October 13, 2004 at 12:01 am.

Instructions

This assignment is longer, and more challenging than the first two. Start early and make sure you thoroughly understand the problem before you start to code your solution. Try to write concise, easy-to-read solutions - if you find yourself writing a lot of code you might not be pursuing the best solution. Think before you start coding! Save work where you can by using library functions.

You have the option to - and we recommend that you do - select a partner for this assignment. Only groups of at most two persons are permitted. You can register your group through CMS. Choose your partner carefully; it is not an acceptable excuse not to submit a finished homework because "the other partner did not work enough."

You can only modify the provided .sml files. Complete all function definitions with type specifications for both function arguments and function return types. Do not modify the provided signature files or file sources.cm in any way. Submit your written problem and each of your two .sml files separately through CMS.

Use the compilation manager to test your programs. Not only will compilation be simpler, but the error messages that you get will be more meaningful. To compile your code with the compilation manager switch to the directory that contains your files (including sources.cm), and type CM.make().

Non-compiling code will receive an automatic zero.

Problem 1: Higher-Order Functions

Consider the following definition in SML:

fun A f x = f x x

Also, recall the function composition operator (f o g) x = f(g(x)).

Infer the type of A - clearly show and explain every step of the type-inference process.
Consider the expression (A o A) a0 a1 ... an, where a0, a1, a2, ..., an are suitably chosen arguments. Determine a natural number n >= 0 and the corresponding arguments a0, a1, a2, ..., an so that expression (A o A) a0 a1 ... an reduces to a non-function value. Describe all arguments ai in full, including their types.
Show how the following expression is evaluated:
```
(A o A o A) (fn a => fn b => fn c => fn d => a + b + c + d) 78
```
Do not use the substitution model directly, as it would be too long and tedious to write out all the steps. Rather, rewrite the expression to reflect the action of the various functions. Obtaining the correct final result without showing the steps leading to it will not earn you points on this problem.
Assuming that we denote A o A by A2, and A o A o A by A3, provide definitions analogous to the definition of A that show the effect of A2 and A3.
Based on your previous answers, provide a definition for function Ak, where Ak is obtained by composing A with itself k > 0 times. Prove your statement using induction. Again, you should not use the substitution model directly in your proof, just rewrite the expression to show the effect of the various functions.

What to Submit

Write up your answers are carefully and as clearly as you can. Longer answers are not necessarily better than clear, short ones. Submit your writeup as file written.txt through CMS.

Problem 2: The Substitution Model at Work

We have introduced the substitution model in class in order to provide an execution model for a carefully chosen subset of SML. We have also examined how certain features of the modeled language (e.g. eager evaluation) depend on the rules of the substitution model, and how these could be changed by suitably modifying the model's rules.

In this problem we will build an SML interpreter that implements the substitution model almost exactly as we specified it in class.

The Target Language

We reproduce below the language definition given in class:

e ::= c                        (* constants *)
    | id                       (* variables *)
    | (fn id => e)             (* anonymous functions *) 
    | e1(e2)                   (* function applications *)
    | u e                      (* unary operations, ~, not, etc. *)
    | e1 b e2                  (* binary operations, +,*,etc. *)
    | (if e then e1 else e2)   (* if expressions *)
    | let d in e end           (* let expressions *)
    | (fun f x = e)            (* fun expressions *)

Declarations:

d ::= val id = e               (* value declarations *)
    | fun id(id1) = e          (* function declarations *)

Note that this is not a proper subset of SML, since SML does not have fun expressions. Also, note that our language has neither case statements, nor pattern matching. All our functions have exactly one argument, and all let statements contain exactly one declaration.

To simplify the implementation, we will assume that the language has only two basic types, integers and booleans. No other types can be defined, also, no type specifications are accepted anywhere in the program.

To avoid confusion with full-fledged SML, we name the language that we implement subst-SML.

The Abstract Syntax Tree

One of the basic problems we need to address is to find a suitable program representation that can serve as input for our interpreter. One possibility is to use plain-text representation. While most convenient for the user, such a representation is not very suitable for direct evaluation. The conversion to a more convenient representation could be implemented in SML, and it raises many interesting questions (which are addressed in other CS courses). These representation conversion issues are ancillary to our goal in this problem, thus, we will represent our programs as if the conversion step has already taken place. The representation we choose is named the abstract syntax tree and it is based on the following type definitions:

datatype decl = VAL_D of string * expr
              | FUN_D of string * string * expr

and expr = INT of int
         | BOOL of bool
         | FN of string * expr
         | ID_E of string
         | FAPP_E of expr * expr
         | FUN_E of string * string * expr (* name * arg * body *)
         | LET_E of decl * expr
         | IF_E of expr * expr * expr
         | BINOP_E of expr * string * expr
         | UNOP_E of string * expr

First, note the and keyword that links the two datatype definitions. This keyword makes it explicit that these two type definitions are mutually recursive (i.e. type decl refers to type expr in its definition, and vice-versa). Such simultaneous declarations are necessary for mutually recursive functions as well.

We use "_D" to mark a datatype constructor for declarations and "_E" to mark a datatype constructor for expressions. Note that INT, BOOL, and FN do not have any special marking. This is to emphasize their special status as both expressions and values. We could have defined "expression" constants and "value" constants, for example, and you should know that this is often done. Because of the peculiar features of subst-SML, however, it turns out that our representation is more convenient. Make sure you do not get confused by the fact that our "values" really are, according to their type, expressions. Only integers, booleans, and anonymous functions can be values (i.e. results of an evaluation); other expressions always result in recursive calls.

The abstract syntax tree represents the essential structure of the program, which is stripped by all unnecessary details. Here are a number of val definitions in SML that encode subst-SML programs:

val e_3 = INT 3

val e_let = LET_E(VAL_D("x", INT 3),
                  BINOP_E(ID_E "x", "+", INT 7))

val e_fn = FAPP_E(FN("x", BINOP_E(ID_E "x", "+", INT 25)), INT 7)

val e_fun = FUN_E("function", "x", ID_E "x")

val e_fact = LET_E(FUN_D("fact",
			  "n",
			  IF_E(BINOP_E(ID_E "n", "=", INT 0),
			       INT 1,
			       BINOP_E(ID_E "n",
				       "*",
				       FAPP_E(ID_E "fact",
					      BINOP_E(ID_E "n",
						      "-",
						      INT 1))))),
		     FAPP_E(ID_E "fact",
			    INT 3))

In the file subst-SML.sml we have provided functions printDecl and printExpr that print a declaration and an expression, respectively. Feel free to use them to print out abstract syntax trees while you are debugging. The functions are not optimized to produce pretty-printed output, rather, they focus on unambiguosly representing the program's structure.

- printExpr e_fact;
val it = "let fun fact(n) = if ((n)=(0))
          then 1
          else ((n)*((fact)(((n)-(1))))) in (fact)(3) end": string

Note: we formatted the output above to better fit it onto the page.

Implementing Binary and Unary Operators

Our substitution model left unspecified the behavior of the unary and binary operators. Any practical implementation must provide implementations for at least some of these operators, however. We have provided the following partial implementations for you:

...
and eval_binop(e1, s, e2) = 
  case s of
    "+" => INT (getInt(e1) + getInt(e2))
  | _ => raise Fail (s ^ "not implemented")

and eval_unop(s, e) =
  case s of
    "~" => INT (~(getInt(e)))
  | "kill" => raise Fail ("code " ^ (Int.toString (getInt e)))
  | _ => raise Fail (s ^ "not implemented")
...

In this part, you will extend the definition of eval_binop and eval_unop by implementing the - and * binary operators, as well as the not, orelse, and andalso operators. Implement the usual SML semantics for these operators. Also, implement a unary operator named print which takes one integer or a boolean and prints it, followed by a newline character "\n". The print operator returns the integer 0, no matter what its argument is. Note that the functions that implement unary and binary operators receive unevaluated expressions as arguments, which means that each individual operator can control when - and whether - it evaluates its arguments.

Implement the equality operator = for all types for which is it is meaningful. Also, implement the < operator for integers.

Substitutions

A fundamental operation in the substitution model is that of the substitution. We isolated this operation into a function whose partial definition we have provided below:

fun substitute(id, what, target) = 
let
  fun helper(target): expr =
  case target of
    _ => raise Fail "not implemented"
in
  helper target
end

We defined function helper so that we don't have to pass id and what as arguments for each recursive call to helper. If this function were complete, then it would replace all the free occurences of identifier id in expression target with expression what. Here are a few examples:

- substitute("x", BINOP_E(ID_E "a", "+", ID_E "b"), ID_E "x");
val it = BINOP_E (ID_E "a","+",ID_E "b") : expr

- substitute("x", BINOP_E(ID_E "a", "+", ID_E "b"),
=                 LET_E(VAL_D("x", BINOP_E(ID_E "x", "+", INT 3)),
=                       ID_E "x"));
val it =
  LET_E
    (VAL_D ("x",BINOP_E (BINOP_E (ID_E "a","+",ID_E "b"),"+",INT 3)),ID_E "x")
  : expr

You might find that you can decode the result of a substitution easier if you use function printExpr on the output of functin substitution.

The second example above corresponds to the following subst-SML code:

let
  val x = x + 3
in
  x
end

The example shows that only free occurences of x in let are substituted. Make sure that your solution exhibits the same behavior.

For this part, complete the definition of function substitute.

Evaluating Expressions and Declarations

We now turn our attention to implementing the evaluator. First, we must distinguish between evaluating expressions and declarations. There are only two declaration types, and only two rules that apply to them. These rules are implemented by function evalDecl, whose full definition we provide:

...
and eval_decl d =
  case d of
    VAL_D(s, e) => (s, eval e)
  | FUN_D(s1, s2, e) => (s1, FUN_E(s1, s2, e)) 
...

The result of evaluating a declaration - according to rules D1 and D2 from lecture - is a substitution. We represent substitutions as the pair of an identifier (given as a string) and an expression that will be substituted for the respective identifier.

We have provided a partial implementation of function eval which evaluates an expression.

...
and eval expr =
  case expr of
    (INT _ | BOOL _ | FN _ ) => expr
  | ID_E id => raise Fail ("free occurence of " ^ id ^ "; can't evaluate")
  | _ => raise Fail "not implemented"

The existing code shows that integer and boolean constants, as well as anonymous function expressions evaluate to themselves. Assuming that we have started with a legal subst-SML expression (one that has no free variables), we will never attempt to evaluate an identifier. The reason for this is somewhat subtle - do you understand why this is true?

Here are a few examples of evaluations using the expressions defined above:

- eval(e_3);
val it = INT 3 : expr
- eval e_3;
val it = INT 3 : expr
- eval e_let;
val it = INT 10 : expr
- eval e_fn;
val it = INT 32 : expr
- eval e_fun;
val it = FN ("x",ID_E "x") : expr
- eval e_fact;
val it = INT 6 : expr

The most interesting example is the last one, which shows the result of evaluating 3! - and proves that our interpreter is capable of "running" recursive functions.

For this part, you should complete the definition of eval. Follow the rules given in class for the substitution model. A solution that produces the correct output but does not follow the substitution model exactly is not acceptable (i.e. alternative implementations of the interpreter that do not follow the specification provided by the substitution model are not acceptable).

What to Submnit

Make all your changes in file subst-SML.sml. Do not change the signature (type) of any functions.

Hint: We have provided functions getInt, getBool, and getFN for your convenience - use them to simplify the implementation of more complex functions. Make sure that your substitution function works well - it is crucial for the correct functioning of expression evaluations.

Problem 3: Binomial Heaps

A binomial heap is a collection of binomial trees. Each node of a binomial tree contains a special value named key. A binomial tree is an ordered tree (based on key values), and is defined recursively. Binomial trees are not binary trees - a node can have more than two children! Given a size parameter k, the structure of a binomial tree is specified unambiguosly; we refer to any binomial tree with this structure as being a B_k tree. Thus two instances of B_k trees are identical in structure, but they do not necessarily contain the same keys.

There are no empty binomial trees. Tree B₀ consists of a single node. Binomial tree B_k consists of two instances of binomial tree B_k-1 linked so that the root of one instance is the leftmost child of the root of the other instance.

We show below the structure of B_k for k = 0, 1, 2, 3:

  x     x      x            x	     
       /      / \        /  |  \   
      x      x  x       x   x   x  
            /          / \  |	     
           x          x  x  x     
     	             /    	    
     	            x

The directions of the edges linking nodes to their children do not matter; what matters is the number and the order of the children. By convention, we will refer to the leftmost child of a node as being the first child of the respective node.

Binomial tree B_k has height equal to k, it contains 2^k nodes, and its root has degree k (it has k children, the most of all nodes in the respective tree). The rank of B_k is k. Each node of a binomial tree has a key that is strictly smaller than the key of any of its descendants. No binomial tree contains duplicate keys, nor will there be an attempt to insert a duplicate key into the tree.

A binomial heap consists of an ordered list of binomial trees. If tree T1 is to the left of tree T2 in the heap, then the rank of T1 is strictly less than the rank of T2. It is possible for a binomial heap to contain no trees, i.e. to be empty. A binomial heap does not contain duplicate keys, nor will an attempt to insert a duplicate key be made.

A binomial heap containing a total of n nodes will contain exactly m binomial trees, where m is the number of digits 1 in the base-2 representation of n. In addition, the heap will contain a tree of rank r for all values r that are the exponents of 2 that correspond to digits 1 in the binary expansion of m. For example, if a heap contains 11 nodes (1011 in binary), then the heap will contain an instance each of B₀, B₁, and B₃.

We now provide SML definitions for binomial trees and heaps:

(* binomial tree = pair of key and children's list *)
datatype 'a bintree = Node of 'a * ('a bintree) list

(* binomial heap = list of (binomial tree, tree rank) *)
type 'a binheap = ('a bintree * int) list

We note that the datatype constructor Node is not necessary, we could use only a pair of type 'a * ('a bintree) list. The reason for this is that we don't have empty binomial trees. Since binomial heaps are also lists of pairs, we chose to use a datatype constructor to make the various constructs more easily recognizable. The leftmost child of a node is the first in the list of its children.

We have provided function printHeap, which provides an intuitive representation of the binomial heap. Because our heaps and trees are parameterized types, we need to provide a custom printing function that can print the key stored in each node. For binomial trees and heaps based on integers, Int.toString will be the simplest printing function choice.

fun printHeap(h: 'a binheap, prnode: 'a -> string): unit = 
  let
    fun makeString(n: int) = if n = 0 then "" else " " ^ (makeString (n - 1))
    fun printTree(Node(node, children): 'a bintree, n: int) = (
      print ((makeString n) ^ (prnode node) ^ "\n");
      map (fn t => printTree(t, n + 2)) (rev children);
      ())
    fun printSep(n) = print ("------" ^ (Int.toString n) ^ "------\n");
  in
    (map (fn (t, d) => (printSep d; printTree(t, 0))) h; ())
  end

Note: You might find it instructive to understand how function map is used in printHeap.

Here are a few examples of printed binomial heaps:

- val h0 = [(Node(3, []), 0)];
val h0 = [(Node (3,[]),0)] : (int bintree * int) list
- printHeap(h0, Int.toString);
------0------
3
val it = () : unit


- val h1 = [(Node(3, []), 0), (Node(7, [Node(25, [])]), 1)];
val h1 = [(Node (3,[]),0),(Node (7,[Node (25,[])]),1)]
  : (int bintree * int) list
- printHeap(h1, Int.toString);
------0------
3
------1------
7
  25
val it = () : unit


- val h2 = [(Node(3, []), 0), (Node(7, [Node(25, [])]), 1), (Node(15, [Node(28,
[Node(41, [])]), Node(33, [])]), 2)];
val h2 =
  [(Node (3,[]),0),(Node (7,[Node (25,[])]),1),
   (Node (15,[Node (28,[Node (41,[])]),Node (33,[])]),2)]
  : (int bintree * int) list
- printHeap(h2, Int.toString);
------0------
3
------1------
7
  25
------2------
15
  33
  28
    41
val it = () : unit

Each binomial heap is printed by printing its component trees in order, starting with the one with lowest rank. The rank of the tree is printed between several minus signs - the only role of these signs is to make the rank easy to see.

Each binomial tree is indented so that the node higher in the tree appears in the output to the left of nodes lower in the tree. It is probably easiest to read these printouts if you tilt your head to the left. With your head tilted, you will see the root of the tree at the top (leftmost position if your head is not tilted), then its children one line below the root (one line to the right, if your head is not tilted), with the first child in the leftmost position (in other words, the first child, which is the leftmost child in the list of children, is printed last). These observations then extend recursively. For example, the rank-2 binomial tree printed above has the following structure:

If you read the definitions carefully, you realized that the minimum key of each binomial tree is in its root. If we want to find the minimum key in the entire binomial heap, we must find the minimum key among all the keys that are in the roots of the heap's trees. For this part, complete function findMinimum defined below:
```
fun findMinimum(h: int binheap)
```
To simplify the function a bit, we have defined it only on int binheaps; this is not an oversight. This definition avoids the need for a custom comparison function that works on the otherwise unspecified base type of the binomial heap.
Given two heaps, h1 and h2, we define the merge operation on these heaps as generating an ordered list merge(h1, h2) of binomial trees, which contains all the binomial trees that were originally in h1 or h2, and which has the following properties:
- If T1 is before T2 in merge(h1, h2), then rank(T1) <= rank(T2);
- If T1 is before T2 in merge(h1, h2) and rank(T1) = rank(T2), then T1 comes from h1 and T2 comes from h2.

For this part, implement function mergeHeaps defined below:

fun mergeHeaps(h1: 'a binheap, h2: 'a binheap)

The reason why the merger of two heaps does not result in another heap is that in the merged list of binomial trees there might exist adjacent trees with the same rank. If we could get rid of these trees by suitably combining them into bigger trees, then we could create a heap that - from the point of view of its contained keys - is the union of the two original heaps. We will start processing the list resulted from the merger at its head. At each step, we will examine the first two or three elements of the list. If the list has fewer than two elements, the result is a binomial heap. For simplicity, let us denote the first, second and third (if it exists) element of the list L resulted from the merger with T1, T2, and T3, respectively. The following three cases are relevant for our problem:
(U1) If rank(T1) < rank(T2), then T1 is part of the resulting binomial heap. To find the other trees in the result, we now remove T1 from L, and continue processing the remainder of L.
(U2) If rank(T1) = rank(T2) = rank(T3), then T1 is again part of the result, and we can continue with processing L with T1 removed. We know that the next case will be (U3).
(U3) If rank(T1) = rank(T2) and there is no T3, or if it has a rank greater than that of T1 and T2, then we will merge T1 and T2 so that the minimum key in both T1 and T2 remains in the root of the resulting tree. If the key in the root of T1 is less than the key in the root of T2, then the root of T2 will become the leftmost child of T1. Otherwise, T1 will become the leftmost root of T2. Let the resulting tree be TR. We now continue processing the list consisting of TR followed by the remainder of L after we removed both T1 and T2.
For this part, implement function heapUnion defined below:
```
fun heapUnion(h1: 'a binheap, h2: 'a binheap, compare: 'a * 'a -> order)
```
Note that besides the two heaps, a third parameter is needed to provide the generalized comparison function which we need for comparing key values. Hint: you will probably find function heapUnion useful for this part.
For this part, implement function add2Heap, which adds a new key to the given binomial heap:
```
fun add2Heap(k: 'a, h: 'a binheap, compare: 'a * 'a -> order)
```

We provide below a number of test cases that illustrate the behavior of the functions described above:

(* Printing function for integers *)
- fun p i = Int.toString i;
val p = fn : int -> string



(* Comparison function for integers *)
- fun compare(a: int, b: int) =
    if a < b
    then LESS
    else if a > b
         then GREATER
         else EQUAL
val compare = fn : int * int -> order



- findMinimum h2;
val it = 3 : int



- printHeap(mergeHeaps([(Node (2,[]),0),(Node (1,[Node (7,[])]),1)], []), p);
------0------
2
------1------
1
  7
val it = () : unit



- printHeap( mergeHeaps([(Node (2,[]),0),(Node (1,[Node (7,[])]),1)],
                        [(Node (20,[]),0),(Node (12,[Node (15,[])]),1)]), p);
------0------
2
------0------
20
------1------
1
  7
------1------
12
  15
val it = () : unit



- printHeap(add2Heap(41, [(Node (2,[]),0),(Node (1,[Node (7,[])]),1)], compare), p);
------2------
1
  7
  2
    41



- fun buildHeap(l: int list) =
    foldl (fn (k, h) => add2Heap(k, h, compare)) (makeHeap()) l;
val buildHeap = fn : int list -> int binheap



- printHeap(buildHeap [7, 1, 2, 3, 8, 9], Int.toString);
------1------
8
  9
------2------
1
  7
  2
    3
val it = () : unit

What to Submit

Modify file binomial.sml making sure that you do not change the signature of any function. For this problem, submit this file only through CMS.