CS 312 Problem Set 3
Specification and Testing

Issued: February 17, 2004
Due: March 2, 11:59pm


Instructions

You will do this problem set by modifying the source files in ps3.zip and submitting the program that results. The program that you submit must compile without any warnings. Programs that do not compile or compile with warnings will receive an automatic zero.

The goal of this problem set is not to make you implement difficult algorithms, but rather to produce a high-quality implementation of a data abstraction. We are particularly interested in seeing you write concise, clear specifications and other code documentation. It is not enough that your code "works". We will be looking for code that is easy to understand and maintain. The run-time efficiency of your implementations will carry some weight but will be a secondary consideration. Further, we want you to carefully test your code using a well-defined testing strategy.

The data abstractions you implement in this problem set will be used in a later problem set. Thus, there is even greater value in doing a good job on this problem set!

Some parts of the assignment ask you to justify design decisions in a way that would not be appropriate for production code; place such justifications in a comment that reads:

(* Design justification: <your justification here> *)

If it is necessary to change the problem set specification, add a comment that reads:

(* Specification change: <description and justification here> *)

It should not be necessary to do this. We recommend talking to the course staff if you think you need to.

This problem set is divided into two parts, each of them involving a different data abstraction and each of them containing a number of subparts. 

Part 1: Big Numbers

The native integer type (in SML/NJ) has a size of 31 bits, which means it can represent integers between -230 = ~1073741824 and 230-1 = 1073741823 (respectively given by the values Int.minInt and Int.maxInt). In this first part of the problem set, we will write code that allows SML to handle arbitrarily large integers (well, as large as the memory allows). These are known as bignums.

Our approach will be to define an abstract type for bignums along with the standard operations plus, minus, times, comparison operators, among others.

In this part, you will finish defining a data abstraction for big numbers and implement a module that satisfies this abstraction.

1) Specifications

Before you begin implementing the data abstraction for big numbers, you need to finish its specifications. The existing specifications may be incomplete, incorrect, or nonexistent. If you find it desirable to change the specification of a function in a way that is not a refinement of the given specification, include an additional comment justifying your design change. Think carefully about what would be a good design specification for each function from the standpoint of both user and implementer.

To do: Correct and complete the specifications in the file bignums.sig.

2) Implementation

For this part of the problem set you will modify the file bignums.sml.

We will implement a structure BigNums that matches the signature BIGNUMS. In this implementation, an integer n will be represented by the coefficients of the expansion of n in some base. For example, suppose the base is 1000. Then the number 123456789 can be written as:

(123 * 10002) + (456 * 10001) + (789 * 10000),

which we will represent by the list [789,456,123]. Notice that the least-significant coefficient appears first in the list. This will make some operations on bignums easier.

For another example, consider the number 12000000000000, which is represented by the list [0,0,0,0,12]. Clearly, the initial 0's are important, as [12] is the representation for 12, [0,12] is the representation for 12000, [0,0,12] is the representation for 12000000, and so on.

Here is the concrete type definition for bignums:

type bignum = {neg : bool, coeffs : int list}

The neg field specifies whether the number is positive (if neg is false) or negative (if neg is true). The coeffs field is the list of coefficient in some base, where each coefficient is between 0 and base-1, inclusive. Assuming a base of 1000, to represent the integer 123456789, we would use:

{neg=false, coeffs=[789,456,123]}

and to represent −9999 we would use:

{neg=true, coeffs=[999,9]}

The base used by your bignum implementation is defined in the file bignums.sml itself.

val base: int = 1000

You may assume that the base is one of 10, 100, 1000, or 10000. Regardless of how base is defined, your implementation should give exactly the same answers as far as the user can tell; that is, the base should be hidden by the abstraction barrier.

  1. Along with the representation type of bignums, include a comment giving the abstraction function and representation invariant for big numbers.

  2. Moreover, you should implement the function repOK. This function is just the identity function for any big number that satisfies the representation invariant; for representations that do not satisfy the invariant, repOK raises Fail.
  3. val repOK : bigNum -> bigNum
  4. Converters. We have provided a function to convert a string to a bigNum (the function fromString). Implement the opposite conversion, namely taking a bigNum and returning a string. A negative number should be prefixed by "~". Be sure to eliminate leading zeros; thus, 1234 should be converted to "1234", not "001234". This function will allow you to view the results of computations. 

    Hint: The library function StringCvt.padLeft can be handy for this problem.

  5. Comparison operators. Implement the functions equal and less that compare two numbers b1 and b2 and return a boolean indicating whether b1 is equal to or less than b2, respectively. These operations must use repOK to check that their arguments satisfy the rep invariant.

  6. Arithmetic operators. Implement the functions negate, plus, and times (negation, addition, and multiplication). Use the traditional algorithms you learned in grade school, adapted to an arbitrary base. The aim is correctness rather than efficiency. Keep your code simple. Make sure your code works with both positive numbers, negative numbers, and zero. Each operation must check its argument or arguments using repOK.

  7. We have implemented a calculator interface so that you can test your implementation of big numbers. After compiling the source code using the CM.make() command, you can start the calculator by typing Calc.start() in the command line (do not forget to use semicolon in the end!). The calculator accepts expressions using +, , *, (, ), ! (factorial), where the binary operators are infix. For example, you can type:

    Calculator> (1000000034 - 1100000) - 10!
    
    To terminate the calculator, type quit at the command line. Notice that the calculator uses your implemented function toString, so make sure it works. If you are having problems with it, come to consulting hours.

    To do: Extend the calculator to support the exponentiation operator (e.g., 2^3 = 8), making the necessary changes to the file calc.sml.

Part 2: Tries

The trie (pronounced "try") is a useful tree data structure for implementing the dictionary abstract data type (ADT), for encoding and compression and in regular expression search and approximate string matching. The trie is one of many tree abstractions that we have not yet covered in this class; however, you should be comfortable with picking up new data structures and applying them to problems. A binary search tree stores keys and data in the nodes, whereas a trie stores keys along the edges. The key is always a string, and the key for any given node in a trie can be found by tracing the path from the root of the tree and concatenating all of the characters encountered along the edges on that path. Here is an example of a trie:

There are five elements in this tree; the root does not store any information.  The trie represents these mappings: (0, A), (1, B), (00, C), (01, D), and (010, E).

In a simple trie each edge stores one character from an alphabet. An alphabet is any set of characters that can be sequenced to form a string. By convention we call the alphabet of the tree S (sigma). The Roman alphabet (S={A,B,C,...}) is one example of an alphabet, but the example above uses the alphabet S={0,1}; this is an alphabet that can be used to write all binary numbers.

For this exercise, we will consider a trie where S is the entire Extended ASCII character set.  In other words, any node can have up to 256 edges going out of it.  Each element of the alphabet is a Word8.word, which is really just an unsigned integer. You do not need to use any functions from the Word8 structure, but it does not hurt to familiarize yourself with it. One important feature is that equality is defined on words.

In this part, you will finish defining a data abstraction for tries, implement a module that satisfies this abstraction, and test your module.

1) Specifications

Before you begin implementing the data abstraction for tries, you need to finish its specifications. The existing specifications may be incomplete, incorrect, or nonexistent. If you find it desirable to change the specification of a function in a way that is not a refinement of the given specification, include an additional comment justifying your design change. Think carefully about what would be a good design specification for each function from the standpoint of both user and implementer.

To do: Correct and complete the specifications in the file tries.sig.

2) Implementation

For this part of the problem set you will modify the file tries.sml We will implement a structure Tries that matches the signature TRIES.

  1. Representation. Your first task is to decide how you will represent a trie. You should consider what functions are likely the most important and select a representation that will result in simple and reasonably efficient code. Add a design justification explaining your reasoning.
  2. type trie = your implementation here
  3. Next to the representation type write a comment giving the abstraction function and representation invariant for tries.

  4. Moreover, you should implement the function repOK. This function is just the identity function for any trie that satisfies the representation invariant; for representations that do not satisfy the invariant, repOK raises Fail.
  5. val repOK : trie -> trie
  6. Basic operations. Implement the empty value, which is a new trie containing all single-word strings mapped to their corresponding codes (0..255). Also implement the insert function, which allows a new mapping to be added to a trie. If the value to be inserted is associated with a list of words that was already inserted in the trie, the trie doesn't get changed and the returned value should be a tuple consisting of the unmodified trie and NONE. This means that, given a list of words of length n, where the prefix (first n-1 words) is already in the trie, the last element of the list will be inserted in the trie if it is not already there. Your code should raise the exception InvalidInsertion if the prefix of the given list of words is not in the trie or if thegiven list is empty.
    val empty: trie
    val insert: trie * Word8.word list * value -> trie * cursor option

    These two functions are very important so that the course staff can test your other trie functions. Therefore, be very careful to implement them correctly.

  7. Search operations. Implement the function lookup, which returns the child of the trie's root whose edge is labeled with a given word, and function search, which returns the child of a specific node whose edge is labeled with a given word. Function search should return NONE if a child connected by a label with the given word is not found. Function lookup should raise Fail if a child with the given word is not found.
    val lookup: trie * Word8.word -> cursor
    val search: cursor * Word8.word -> cursor option
  8. Implement the function codeof that returns the value of a specific node.

    val codeof: cursor -> value
  9. Implement the function size that returns the number of elements in a trie.

    val size: trie -> int
    

3) Add your own operation

In the previous part, you implemented a number of functions for manipulating tries. These are not the only operations that one might like to have available for use on tries. Think of another useful operation that is worth adding to the trie interface, add its specification to tries.sig along with a justification for your choice, and implement it in tries.sml.

4) Black-box testing

For this problem, you will be creating test cases that test the functionality of tries. The signature testtries.sig looks as follows:

signature TESTTRIES = sig

  datatype fn_tests = RepOKTR | EmptyTR | InsertTR | LookupTR |
		      SearchTR | CodeofTR | SizeTR

  datatype test_result = PASS of fn_tests | FAIL of (fn_tests * string)

  type test = unit -> test_result
  type test_generator = unit -> test list

  (* Returns: all 20-25 tests *)
  val allTests : test_generator

  (* Returns: five good tests, at most one per testable function *)
  val hardTests : test_generator

  (* Effects: Prints out the results from all tests in the list. *)
  val printTests : test_result list -> unit

  (* Performs unit testing on the multiset ADT and returns a list of
   * results from the tests returned by the test_generator. *)
  val runTests : test_generator -> test_result list
end

In testtries.sml, add functions of the type unit->test_result. Each function should test a different aspect of a operation on tries.

Each testing function returns a test_result that indicates whether the test passed or failed and which function it was testing. If the test failed the test result contains a string with details about how the test failed. This string should be one of the following strings:

  1. RI violating tree was validated
  2. RI non-violating tree was not validated
  3. Incorrect tree returned
  4. Value not inserted
  5. Value inserted in wrong position
  6. Non-existing value found
  7. Existing value not found
  8. Wrong node returned
  9. Wrong value returned
  10. Wrong size returned
  11. Error

Use the Error string only if you think all the others don't apply. Be careful to type in the exact strings as provided above.

We have already implemented printTests, so you can view the results of your test cases by executing printTests(runTests allTests).

While you should be testing all of your functions (and note that we will test all of your functions ourselves), you should only submit test cases for insert, lookup, search, codeof, and size. Create 20-25 test cases that test these 5 functions and implement allTests() in testms.sml that returns them. You should test both basic functionality and corner cases. Make sure you do not include any test cases for the operation you created in question 3.

We will run some of your test cases against some other buggy implementations of tries to see how good your test cases are. Implement hardTests() so it returns a list of these test cases. The list should contain exactly five tests, one for each of the five functions empty, insert, search, codeof, and size. Remember that you can only interact with tries through the signature. You cannot assume anything about the underlying implementation; therefore, you should write black-box tests.

Part 3: Written problem

Include your answer to this problem in the file written.txt.

  1. Consider the following code:
    fun f a b c d =
      case (b, d) of
        ( _, (a,0))            => (#2 b) (a (c-1)) c
      | ((a,b,_), (c,1))       => b (op+ (a,1)) 1
      | ((x, y, ~312), (_, a)) => a * (y a x)
      | _ => 2 * c * 2
    What is the type of the function f? Justify your answer. (Hint: Watch out for shadowing!)
  2. Consider the following six possible specifications for a function
    val f: int->int
    1. f(x) >= 0
    2. f(x) >= 0. Requires: x >= 0
    3. f(x) is the square root of x. Requires: x is a perfect square
    4. f(x) >= 0 and f(x) <= x. Requires: x >= 0
    5. f(x) >= 0. Requires: x is odd
    6. f(x) >= 5. Requires: x is odd

    Draw a diagram that describes the refinement relation among these specifications. The diagram should contain the names A through F. If specification 1 is a refinement of specification 2, draw a line between the two corresponding letters, and make sure that the name of specification 2 is higher in the diagram than that of Spec 1. For example, the diagram should contain at least the following arc:

         B 
         |
         A 
    

    A line between 1 and 2 may be omitted if there is some other specification in the diagram that is a refinement of spec 2 and that spec 1 refines.

To do: Hand in the solutions to these problems in a file written.txt. These must be in ASCII text format, and all lines should be shorter than 78 characters.