Problem Set 3: Tries and Boggle

Due Monday 10/5/09 11:59pm


Part 1 : Substitution Model

Find an expression that, when substituted for the ??, will cause the following function for the given input to evaluate to 42. Show the evaluation using the substitution model. You may omit minor substitutions, but be sure to show all the main steps.  At an absolute minimum, show each step before and after a recursive call. Steps that create new global bindings should also be shown.

let initial x = 0 in
let rec zardoz (f1, lst) =
  match lst with
    [] -> 3110
  | h :: t ->
      let f x = h * List.length lst in 
      if List.length t = 0 then f1 0
      else zardoz (f, t) + f 0
in
  zardoz (initial, ??)  

To do: Turn in the solution under SubstitionModel in CMS. Submission is allowed in .pdf, .doc, or .txt. Please include your name and netid in your solution.


Part 2: Tries

Several algorithms, including compression algorithms and string matching algorithms, make use of maps where the keys are sequences of values of a certain type. A trie or prefix tree is an efficient data structure for implementing the map ADT for this kind of map. The set of elements from which key sequences are built is called the alphabet of the trie.

A trie is a tree structure whose edges are labeled with the elements of the alphabet. Each node can have up to N children, where N is the size of the alphabet. Each node corresponds to the sequence of elements traversed on the path from the root to that node. The map binds a sequence s to a value v by storing the value v in the node corresponding to s. The advantage of this scheme is that it efficiently represents common prefixes, and lookups or inserts are achieved using quick searches starting from the root.

Suppose we want to build a map from strings of ASCII characters to integer codes. In this case, edges are labeled with ASCII characters, and integer codes are stored in (some of) the nodes. Here is an example:

The above trie describes the map {"to"->7, "tea"->3, "ten"->12, "in"->5, "inn"->9}. The values (shown in blue) are stored in the nodes. Note that not all nodes have values. For instance, the node that corresponds to string "te" has no value. Hence, "te" is not in the map.

A cursor is a data structure that points to a specific node in a trie. It is useful to define a cursor pointing into a trie when performing lookups on strings in order to advance through the trie character by character.

In this part you will implement tries that map strings of characters (char list) to values of arbitrary types ('a). Your task is to do the following:

  1. Interface specifications. Before you begin implementing the trie data structure, you need to finish its specifications in the signature TRIE. The existing specifications may be incomplete, incorrect, or nonexistent.

  2. Representation. You must decide how you will represent a trie. Briefly inspect the functions below to decide if certain pieces of information would make their implementation simpler. Then define an appropriate trie type.

    type 'a trie = (* your implementation here *)
    
  3. Next to the representation type, write comments giving the abstraction function and representation invariant for tries.

  4. Implement the function repOK. If the input trie satisfies the representation invariant, then repOK simply returns the input trie.  Otherwise, repOK raises Failure.

  5. Trie operations. Implement the empty value, representing an empty trie, and the functions put, get, size. The put function adds a new mapping to a trie. If the value to be inserted is associated with a key string that is already bound to another value, then the code should raise an exception ElementExists. The function get returns the value bound to the given key string in the trie. If the string does not exist in the trie, or if it exists in the trie but is not bound to any value, then an exception NotFound must be raised. Finally, the function size returns the size of a trie.

    val empty: 'a trie
    val put: 'a trie -> char list -> 'a -> 'a trie
    val get: 'a trie -> char list -> 'a
    val size: 'a trie -> int
    

    Note: the functions that build new tries are important since the course staff will use them to generate the inputs for testing other functions. Therefore, be careful to implement them correctly.

  6. Cursor support. Next, define an appropriate type to describe a cursor for a trie structure. Make sure the type contains enough information to support the cursor operations defined below.

    type 'a cursor = (* your implementation here *)
    

    Implement the following functions:

    val cursor: 'a trie -> char list -> 'a cursor
    val advance: 'a cursor -> char -> 'a cursor option
    val getc: 'a cursor -> 'a option
    

    The function cursor returns a cursor positioned at the node that corresponds to the string given as an argument. If the string doesn't exist, the exception NotFound is raised. The function advance moves the cursor one position down the tree, following the edge that corresponds to the value passed as an argument. This function returns None if no such edge exists. The function getc yields the value stored at the node that the cursor points to.

To do: Turn in the solution in the files trie.mli and trie.ml.


Part 3: Solving Boggle

Boggle is a word game in which players try to make words by connecting letters on adjacent tiles. A traditional Boggle board is a 4 by 4 grid, with 16 cubic dice that have letters on each side. To start the game, a player first shakes up the dice and allows them to settle on the grid, then a three-minute timer for finding words begins. The rules for word-finding are as follows:

  1. Words must be at least three letters long.
  2. No proper nouns are allowed.
  3. The same tile cannot be used more than once in the same word.

Other than that, any word composed of adjacent tiles in sequence is fair game (diagonal DOES count as adjacent).

At the end of three minutes, each player has a list of words, and everybody reads off their words. If a word is found by more than one person, everyone must cross it off of their list. At the end, the remaining words are scored based on length, and the player with the most points wins.

Your Task

Your task in this problem set is to implement an efficient solver for the game of Boggle. Given a board, your program should find all legal words on the board. It will be particularly important in this exercise to choose data structures well-suited to the task.

We will provide you a file dictionary.txt that contains all of the words we want you to search for. You will need to eliminate the words that are illegal in Boggle, namely the proper nouns (capitalized in the dictionary) and words of less than three letters. You will need to complete the module dictionary.ml in which you read the dictionary into a data structure of your choosing. We have given you a few function headers in dictionary.ml to guide you in building the dictionary.

We will also provide a Board module that reads in a board from a file and gives constant time access to the letter stored at each tile, along with a few other utility functions. In the Board module, you will need to implement unvisited_neighbors. Though the actual Boggle game has a single tile with Qu on it, for simplicity, we will leave such a tile out of this solver.

In the file boggle.ml, you will implement two functions: find_words and solve. The function find_words returns a list of all words that begin with a certain tile, specified by its x and y coordinates. The function solve takes in a string specifying the board file, a string specifying a dictionary file, and a string specifying the output file, and writes out all of the legal boggle words in lexicographic order and without duplicates to the specified file. To read in the dictionary and to print your list of words, you will need to use file I/O. A nice tutorial on reading and writing files can be found here. You can also check out the OCaml documentation here. All of the provided files can be found on CMS.

Good luck and get started early!

To do: Turn in the solution in the files boggle.ml, dictionary.ml, and board.ml.


Karma: Sorting with Tries

Use the trie you wrote to implement a function sort: string list -> string list that sorts a list of strings in lexicographical order. It must have O(n) complexity, where n is the sum of the lengths of the strings (assuming the alphabet is of constant size).