Recitation 9
Set and map interfaces

Collections

We have now seen several types of collections - lists, queues, stacks, maps, and various sorts of trees. You will find most data structures in OCaml share something like the following interface:
(* THERE'S NO IMPLEMENTATION OF THIS *)
module type COLLECTION =
sig
  type 'a collection

  (* These functions are nearly always present*)
  val empty: 'a collection (* what parameters might this have? *)
  val map: ('a -> 'b) -> 'a collection -> 'b collection
  val app: ('a -> unit) -> 'a collection -> unit
  (* If the collection is ordered, there may be foldl and foldr *)
  val fold: ('a -> 'b -> 'a) -> 'a -> 'b collection -> 'a

  (* These functions are sometimes present *)
  val size: 'a collection -> int
  val exists: ('a -> bool) -> 'a collection -> bool
  val find: ('a -> bool) -> 'a collection -> 'a option
  val filter: ('a -> bool) -> 'a collection -> 'a collection
end

When writing your own collection types, you ought to provide functions like this for the user. When looking to use a collection, you ought to try to use these functions before writing your own equivalent code.

Sets and maps are two common abstract data types; let's look at signatures for each of these, and how the two types relate.

module type SET = sig
  type 'a set
  val add: ('a set * 'a) -> 'a set
  val isMemberOf: ('a set * 'a) -> bool
end

A set might also include other operations, such as union, intersection, difference, isSubset, singleton and the collection operations above, but all can be derived from add and isMemberOf (and empty).

module type MAP = sig
  type ('key, 'value) map
  val insert: (('key, 'value) map * 'key * 'value) -> ('key, 'value) map
  val find: (('key, 'value) map * 'key) -> 'value option
end

A map might also have an operation to remove a binding. Maps will probably also include general collection functions - map, fold, etc, but one wrinkle with these is that when iterating over the map, one might want to look at all keys, all values, or all key, value pairs. Different map interfaces will provide one or more of these options.

It is actually possible to implement sets in terms of maps, and maps in terms of sets. Consider the following:

(* sets in terms of maps *)
let add (set, item) = insert (map, item, 1) (* or true or item or whatever *)
let isMemberOf (set, item) =
  match find (map, item) with
    None -> false
  | Some _ -> true

(* maps in terms of sets - very mathematical *)
let insert (map, key, value) = add set (key,value)
let find (map, key) = member set (key, _)

Note that to make the latter implementation work, we must implement the set so that when it compares entries, it only compares the key part of each key, value pair.

Comparison functions

Any implementation of a set or map depends on a way of comparing elements (or in the case of a map, comparing keys). What sort of comparison is required depends on the implementation. Let's consider implementations of a map. For an association list (see Recitation 8), only a notion of equality is needed. For a hashtable, we need both a notion of equality as well as a hash function from our key type to integers. Finally, for a tree-based implementation, we need a comparison function returning a total ordering.

Mutability

Programming in a purely functional or applicative manner requires the use of immutable data structures. The signatures we have provided so far assume this paradigm. It is also possible to have mutable collections, such as hashtables. In that case, a signature will look more like this:

module MUTABLE_MAP = sig
  type ('key, 'value) map
  val insert: (('key, 'value) map * 'key * 'value) -> unit
  val find: (('key, 'value) map * 'key) -> 'value option
end

That is, the insert function, rather than returning a new, immutable map with the new key, instead modifies the existing map. It is often possible to gain an efficiency improvement by using mutable structures - hash tables offer O(1) access and update, assuming the hash function is good. On the other hand, mutable structures are more complex to reason about, both theoretically and practically.

Sets and maps in OCaml

The preceding discussion has all been abstract, but now we will talk about the actual set and map structures available in OCaml. Most of the functionality are available through the use of functors. A functor is a module that takes another module as an argument and produces a module as output. A functor is to a module as a function is to a value. For now, all you need to know is the syntax for this.

module SetOfInts = Set.Make (struct
  type t = int
  let compare = compare
end)

What's going on here is that we are defining an (anonymous) structure containing one type (t) and one value (compare), and passing that structure as an argument to the Make functor, which is itself contained in the Set module. The return value is a module having the signature Set.S, which we give the name SetOfInts. The argument of Set.Make must conform to the signature Set.OrderedType.

The most useful OCaml functors returning sets and maps are Map.Make, Set.Make, and Hashtbl.Make. These offer an efficient functional implementation based on balanced binary trees; we will discuss trees like those they use later in the course.

Docmentation for most of these functors (and all the other standard library modules) is available at http://caml.inria.fr/pub/docs/manual-ocaml/manual034.html. You can easily look up the actual code and interfaces for these modules in your OCaml distribution. The files should be in the stdlib subdirectory of your OCaml distribution, wherever you put the OCaml installation.

One thing to note about these structures is that unlike the examples given so far, they are *not* polymorphic in the type of their key (or item, for sets). Rather, the functor's argument specifies the type. A functorized analogue of the collection signature we began with might look like this:

module type Key = sig
  type key
  val equals: key * key -> bool
end

module type COLLECTION = sig
  type key

  type collection

  val empty: collection

  val map: (key -> key) -> collection -> collection
  val app: (key -> unit) -> collection -> unit

  (* If the collection is ordered, there may be foldl and foldr.
   * Key would need to have an operation "compare: key*key->order" in that
   * case.
   *)
  val fold: ((key * 'b) -> 'b) -> 'b -> collection -> 'b

  (* Operations like these are often useful. *)
  val size: collection -> int
  val exists: (key -> bool) -> collection -> bool
  val find: (key -> bool) -> collection -> key option
  val filter: (key -> bool) -> collection -> collection
end
module COLLECTION_FUNCTOR (K : Key) : COLLECTION = struct
  type key = K.key
  ...
end

This has the advantage that collections don't have to keep track of their own key operations. Also, if you have two collections of the same type, you know that they have the same underlying key comparison function, which helps avoid mistakes. For map abstractions, it is also possible to have a map that is polymorphic with respect to the value type but for which the key type is passed as a functor parameter. Unfortunately, OCaml makes you decide which style you want. But you're still better off with any of these styles than in many languages.

Hash tables

OCaml also contains hash tables, implemented by the Hashtbl module. OCaml provides both a generic version of hash tables and a functorial version. The generic version uses the "magical" built-in Hashtbl.hash function, which may or may not be suitable for the data you are using. The functorial version requires you to provide your own equal and hash functions, which allows you to specificy a hash function that you know will do well for your data.

Note that all the OCaml hash table structures are imperative. Updates overwrite the old data structure.

(* Create an empty hash table with initial size 101
 * The initial size should be around the expected number
 * of elements you will add for best performance, but it
 * will be automatically increased if it is too low
 *)
let h = Hashtbl.create 101;;

Hashtbl.add h "John Doe" "104 East Street";;
Hashtbl.add h "Jane Smith" "578 Parker Road";;
Hashtbl.find "John Doe";;
- : string = "104 East Street"
Hashtbl.find "Mary Sue";;
Exception: Not_found.
(* Pass in a module for the type we're hashing
 * In this case, it's integers
 * And our hash function is just the integer itself
 *)
module H = Hashtbl.Make(struct
  type t = int
  let equal n1 n2 = (n1=n2)
  let hash n = n
end)

let h = H.create(101);;
H.add h 42 "Answer to Life, the Universe, and Everything";;
H.add h 1729 "Hardy-Ramanujan number";;
H.find h 42;;
- : string = "Answer to Life, the Universe, and Everything"
H.find h 54;;
Exception: Not_found.