Set and map interfaces

(* THERE'S NO IMPLEMENTATION OF THIS *) signature COLLECTION = sig type 'a collection (* These functions are nearly always present*) val empty: 'a collection (* what parameters might this have? *) val map: ('a -> 'b) -> 'a collection -> 'b collection val app: ('a -> unit) -> 'a collection -> unit (* If the collection is ordered, there may be foldl and foldr *) val fold: (('a * 'b) -> 'b) -> 'b -> 'a collection -> 'b (* These functions are sometimes present *) val size: 'a collection -> int val exists: ('a -> bool) -> 'a collection -> bool val find: ('a -> bool) -> 'a collection -> 'a option val filter: ('a -> bool) -> 'a collection -> 'a collection end

When writing your own collection types, you ought to provide functions like this for the user. When looking to use a collection, you ought to try to use these functions before writing your own equivalent code.

Sets and maps are two common abstract data types; let's look at signatures for each of these, and how the two types relate.

signature SET = sig type 'a set val add: ('a set * 'a) -> 'a set val isMemberOf: ('a set * 'a) -> bool end

A set might also include other operations, such as
union, intersection, difference, isSubset, singleton
and the collection operations above, but all can be derived from
`add`

and `isMemberOf`

(and `empty`

).

signature MAP = sig type ('key, 'value) map val insert: (('key, 'value) map * 'key * 'value) -> ('key, 'value) map val find: (('key, 'value) map * 'key) -> 'value option end

A map might also have an operation to remove a binding. Maps
will probably also include general collection functions - `map`

,
`fold`

, etc, but one wrinkle with these is that when iterating
over the map, one might want to look at all keys, all values, or all
key, value pairs. Different map interfaces will provide one or more of
these options.

It is actually possible to implement sets in terms of maps, and maps in terms of sets. Consider the following:

(* sets in terms of maps *) val add (set, item) = insert (map, item, 1) (* or true or item or whatever *) val isMemberOf (set, item) = case find (map, item) of NONE -> false | SOME _ -> true (* maps in terms of sets - very mathematical *) val insert (map, key, value) = add set (key,value) val find (map, key) = member set (key, _)

Note that to make the latter implementation work, we must implement the set so that when it compares entries, it only compares the key part of each key, value pair.

Any implementation of a set or map depends on a way of comparing elements (or in the case of a map, comparing keys). What sort of comparison is required depends on the implementation. Let's consider implementations of a map. For an association list (see Recitation 8), only a notion of equality is needed. For a hashtable, we need both a notion of equality as well as a hash function from our key type to integers. Finally, for a tree-based implementation, we need a comparison function returning a total ordering.

Programming in a purely functional or applicative manner requires the use of immutable data structures. The signatures we have provided so far assume this paradigm. It is also possible to have mutable collections, such as hashtables. In that case, a signature will look more like this:

signature MUTABLE_MAP = sig type ('key, 'value) map val insert: (('key, 'value) map * 'key * 'value) -> unit val find: (('key, 'value) map * 'key) -> 'value option end

That is, the insert function, rather than returning a new, immutable map with the new key, instead modifies the existing map. It is often possible to gain an efficiency improvement by using mutable structures - hash tables offer O(1) access and update, assuming the hash function is good. On the other hand, mutable structures are more complex to reason about, both theoretically and practically.

The preceding discussion has all been abstract, but now we will talk about
the actual set and map structures available in SML. SML proper does not
have any such structures, but SML of New Jersey does. Most of the functions
are available through the use of *functors*. A functor is a
module that takes a module as an argument and produces a module as output.
A functor is to a module as a function is to a value. For now, all you
need to know is the syntax for this.

structure ListSetOfInts = ListSetFn (struct type ord_key = int val compare = Int.compare end)

What's going on here is that we are defining an (anonymous) structure
containing one type (ord_key) and one value (compare), and passing that
structure as an argument to the `ListSetFn`

functor. The return value
is a module having the signature `ORD_MAP`

, which we
give the name ListSetOfInts. The argument of
`ListSetFn`

must conform to the
signature `ORD_KEY`

.

SML/NJ offers a number of functors returning sets and maps: RedBlackSetFn, BinarySetFn, ListSetFn, SplaySetFn, RedBlackMapFn, BinaryMapFn, ListMapFn, SplayMapFn. Red-Black trees offer an efficient functional implementation based on balanced binary trees; we will discuss them later in the course.

Docmentation for most of these functors is available at http://www.smlnj.org/doc/smlnj-lib/Manual/toc.html, but this documentation is not quite up to date (it doesn't include the red-black trees, for example). You can easily look up the actual code and interfaces for these modules in your sml distribution. On Windows it's probably at C:\sml\src\smlnj-lib\Util\*.sml, and on Mac it's at SMLNJ/smlnj-lib/Util/*.sml, whereever you put the SML/NJ installation.

One thing to note about these structures is that unlike the examples given so far, they are *not* polymorphic in the type of their key (or item, for sets). Rather, the functor's argument specifies the type. A functorized analogue of the collection signature we began with might look like this:

funsig COLLECTION_FUNCTOR (structure Key: sig type key val equals: key*key -> bool end) = sig type collection val empty: collection val map: (Key.key -> Key.key) -> collection -> collection val app: (Key.key -> unit) -> collection -> unit (* If the collection is ordered, there may be foldl and foldr. * Key would need to have an operation "compare: key*key->order" in that * case. *) val fold: ((Key.key * 'b) -> 'b) -> 'b -> collection -> 'b (* Operations like these are often useful. *) val size: collection -> int val exists: (Key.key -> bool) -> collection -> bool val find: (Key.key -> bool) -> collection -> Key.key option val filter: (Key.key -> bool) -> collection -> collection end

This has the advantage that collections don't have to keep track of their own key operations. Also, if you have two collections of the same type, you know that they have the same underlying key comparison function, which helps avoid mistakes. For map abstractions, it is also possible to have a map that is polymorphic with respect to the value type but for which the key type is passed as a functor parameter. Unfortunately, SML makes you decide which style you want. But you're still better off with any of these styles than in many languages.

SML/NJ also contains hash tables. These are *not* functorized,
but instead the equivalent of the `empty`

function takes
arguments which specify the comparison functions

(* raised when I do a lookup and can't find something *) exception Can't_Find_It (* used to convert a string to a hash code -- i.e., word *) val hash_fn : string->word = HashString.hashString (* used to compare keys in the hash table *) fun cmp_fn : string*string->bool = (op =) (* initial table size -- need something relatively prime and * approximately the size of the number of elements you expect * to be in the hash table. Note that the implementation resizes * as needed to avoid major collisions. *) val initial_size : int = 101 val t : (string,int) table = mkTable (hash_fn, cmp_fn) (initial_size, Can't_Find_it) insert t ("Eric",30); insert t ("Jason",24); lookup t "Jason" (* returns 30 *) lookup t "Daniel" (* raises Can't_Find_It exception *)