# Functors
* * *
*
Topics:
* sets represented with lists
* code reuse
* includes
* functors
* the standard library `Map` module
*
* * *
In the previous lecture we began studying the OCaml module system,
and we saw how it supports namespaces and abstraction. Today we'll
look at how it supports code reuse, so that code need not be copied
or rewritten.
## Sets
As a running example, let's consider a data structure for representing
sets:
```
module type Set = sig
type 'a t
(* [empty] is the empty set *)
val empty : 'a t
(* [mem x s] holds iff [x] is an element of [s] *)
val mem : 'a -> 'a t -> bool
(* [add x s] is the set [s] unioned with the set containing exactly [x] *)
val add : 'a -> 'a t -> 'a t
(* [elts s] is a list containing the elements of [s]. No guarantee
* is made about the ordering of that list. *)
val elts : 'a t -> 'a list
end
```
There are many other operations a set data structure might be expected to
support, but these will suffice for now.
Here's an implementation of that interface:
```
module ListSetNoDups : Set = struct
type 'a t = 'a list
let empty = []
let mem = List.mem
let add x s = if mem x s then s else x::s
let elts s = s
end
```
Note how `add` ensures that the representation never contains any duplicates,
so the implementation of `elts` is quite easy. Of course, that makes the
implementation of `add` linear time, which is not ideal. But if we want
high-performance sets, lists are not the right representation type anyway;
there are much better data structures for sets, and you might see some
later in this course or in an algorithms course.
Here's a second implementation:
```
module ListSetDups : Set = struct
type 'a t = 'a list
let empty = []
let mem = List.mem
let add x s = x::s
let elts s = List.sort_uniq Pervasives.compare s
end
```
In that implementation, the `add` operation is now constant time, and
the `elts` operation is logarithmic time. For some (most?) workloads, this
is a better choice than `ListSetNoDups`, especially if `add` is
likely to be called more often than `elts`, and if sets aren't expected
to contain many duplicates.
## Includes
Suppose we wanted to add a function `of_list : 'a list -> 'a t` to the
`ListSetDups` module that could construct a set out of a list. If we
had access to the source code of both `ListSetDups` and `Set`, and if we
were permitted to modify it, this wouldn't be hard. But what if they
were third-party libraries for which we didn't have source code?
In CS 2110, you will have learned about extending classes and inheriting
methods of a superclass. Those object-oriented language features
provide (among many other things) the ability to reuse code. For
example, a subclass includes all the methods of its superclasses, though
some might by overridden.
OCaml provides a language features called *includes* that also enables
code reuse. This feature is similar to the object-oriented example we just gave: it
enables a structure to include all the values defined by another
structure, or a signature to include all the names declared by another
signature.
We can use includes to solve the problem of adding `of_list` to `ListSetDups`:
```
module ListSetDupsExtended = struct
include ListSetDups
let of_list lst = List.fold_right add lst empty
end
```
This code says that `ListSetDupsExtended` is a structure containing all
the definitions of the `ListSetDups` structure, as well as a definition
of `of_list`. We don't have to know the source code implementing `ListSetDups`
to make this happen. (You might wonder we why can't simply write
`let of_list lst = lst`. See the section on the semantics of includes,
below, for the answer.)
If we want to provide a new implementation of one of the included
functions we could do that too:
```
module ListSetDupsExtended = struct
include ListSetDups
let of_list lst = List.fold_right add lst empty
let rec elts = function
| [] -> []
| h::t -> if mem h t then elts' t else h::(elts' t)
end
```
But that's actually a less efficient implementation of
`elts`, so we probably shouldn't do that for real.
One misconception to watch out for is that the above example
does not *replace* the original implementation of `elts`. If
any code inside `ListSetDups` called that original implementation,
it still would in `ListSetDupsExtended`. Why? Remember
the semantics of modules: all definitions are evaluated from
top to bottom, in order. So the new definition of `elts` above
won't come into use until the very end of evaluation. This differs
from what you might expect from object-oriented languages like Java,
which use a language feature called [dynamic dispatch][dd] to figure
out which implementation to invoke.
[dd]: https://en.wikipedia.org/wiki/Dynamic_dispatch
## Semantics of include
Includes can be used inside of structures and inside of signatures.
Of course, when we include inside a signature, we must be including another
signature. And when we include inside a structure, we must be including
another structure.
**Including a structure.**
Including a structure is like writing a local definition for each name defined
in the module. Writing `include ListSetDups` as did above,
for example, has an effect similar to writing exactly the following:
```
module ListSetDupsExtended = struct
(* BEGIN all the includes *)
type 'a t = 'a ListSetDups.t
let empty = ListSetDups.empty
let mem = ListSetDups.mem
let add = ListSetDups.add
let elts = ListSetDups.elts
(* END all the includes *)
let of_list lst = List.fold_right add lst empty
end
```
But if the set of names defined inside `ListSetDups` ever changed,
the `include` would reflect that change, whereas the static code
we wrote above would not.
**Encapsulation and includes.**
We mentioned above that you might wonder why we didn't write this simpler definition
of `of_list`:
```
let of_list lst = lst
```
The reason is that includes must obey encapsulation, just like the rest
of the module system. `ListSetDups` was sealed with the module type
`Set`, thus making `'a t` abstract. So even `ListSetDupsExtended` is
forbidden from knowing that `'a t` and `'a list` are synonyms.
A standard way to solve this problem is to rewrite the definitions as
folllows:
```
module ListSetDupsImpl = struct
type 'a t = 'a list
let empty = []
let mem = List.mem
let add x s = x::s
let elts s = List.sort_uniq Pervasives.compare s
end
module ListSetDups : Set = ListSetDupsImpl
module ListSetDupsExtended = struct
include ListSetDupsImpl
let of_list lst = lst
end
```
The important change is that `ListSetDupsImpl` is not sealed, so its type `'a t`
is not abstract. When we include it in `ListSetDupsExtended`, we can therefore
exploit the fact that it's a synonym for `'a list`.
What we just did is effectively the same as what Java does to handle the
visibility modifiers `public`, `private`, etc. The "private version" of
a class is like the `Impl` version above: anyone who can see that
version gets to see all the exposed "things" (fields in Java, types in
OCaml), without any encapsulation. The "public version" of a class is
like the sealed version above: anyone who can see that version is
forced to treat the "things" (fields in Java, types in OCaml) as abstract,
hence encapsulated.
**Including a signature.**
Signatures also support includes. For example, we could write:
```
module type SetExtended = sig
include Set
val of_list : 'a list -> 'a t
end
```
Which would have an effect similar to writing the following:
```
module type SetExtended = sig
(* BEGIN all the includes *)
type 'a t
val empty : 'a t
val mem : 'a -> 'a t -> bool
val add : 'a -> 'a t -> 'a t
val elts : 'a t -> 'a list
(* END all the includes *)
val of_list : 'a list -> 'a t
end
```
And that module type would be suitable for `ListSetDupsExtended`:
```
module ListSetDupsExtended : SetExtended = struct
include ListSetDupsImpl
let of_list lst = lst
end
```
By sealing the module, we've again made `'a t` abstract, so no one outside
that module gets to know that its representation type is actually `'a list`.
## Include vs. open
The `include` and `open` statements are quite similar, but they have
a subtly different effect on a structure. Consider this code:
```
module M = struct
let x = 0
end
module N = struct
include M
let y = x + 1
end
module O = struct
open M
let y = x + 1
end
```
If we enter that in the toplevel, we get the following response:
```
module M : sig val x : int end
module N : sig val x : int val y : int end
module O : sig val y : int end
```
Look closely at the values contained in each structure. `N` has both
an `x` and `y`, whereas `O` has only a `y`. The reason is that
`include M` causes all the definitions of `M` to also be included in
`N`, so the definition of `x` from `M` is present in `N`. But `open M`
only made those definitions available in the *scope* of `O`; it doesn't
actually make them part of the *structure*. So `O` does not contain
a definition of `x`, even though `x` is in scope during the evaluation
of `O`'s definition of `y`.
A metaphor for understanding this difference might be: `open M`
imports definitions from `M` and makes them available for local
consumption, but they aren't exported to the outside world.
Whereas `include M` imports definitions from `M`, makes them
available for local consumption, and additionally exports
them to the outside world.
## An example where include doesn't suffice
Suppose we wanted to write a function that could add a bunch of elements
to a set, something like:
```
(* [add_all l s] is the set [s] unioned with all the elements of [l] *)
let rec add_all lst set = match lst with
| [] -> set
| h::t -> add_all t (add h set)
```
(Of course, we could code that up more tersely with a fold function.)
One possibility would be to copy that code into both structures. That would
compile, but it's poor software engineering. If ever an improvement
needs to be made to that code (e.g., replacing it with a fold function),
we have to remember to do it in two places. So let's rule that out right
away as a non-solution.
So instead, after defining both set implementations above, suppose we try to enter
that code into utop outside of either implementation. We'll get an error:
```
# let rec add_all lst set = match lst with
| [] -> set
| h::t -> add_all t (add h set)
Error: Unbound value add
```
The problem is we either need to choose `ListSetDups.add` or `ListSetNoDups.add`.
If we pick the former, the code will compile, but it will be useful only
with that one implementation:
```
# let rec add_all lst set = match lst with
| [] -> set
| h::t -> add_all t (ListSetNoDups.add h set)
- : 'a list -> 'a ListSetNoDups.t -> 'a ListSetNoDups.t =
```
We could make the code parametric with respect to the `add` function:
```
let rec add_all' add lst set = match lst with
| [] -> set
| h::t -> add_all' add t (add h set)
let add_all_dups lst set = add_all' ListSetDups.add lst set
let add_all_nodups lst set = add_all' ListSetNoDups.add lst set
```
But this is annoying in a couple ways. First, we have to remember which
function name to call, whereas all the other operations that are part of
those modules have the same name, regardless of which module they're in.
Second, the `add_all` functions live outside either module, so clients
who open one of the modules won't automatically get the ability to name
those functions.
Let's try to use includes to solve this problem. First,
we write a module that contains the parameterized implementation
of `add_all'`:
```
module AddAll = struct
let rec add_all' add lst set = match lst with
| [] -> set
| h::t -> add_all' add t (add h set)
end
module ListSetNoDupsExtended : SetExtended = struct
include ListSetNoDups
include AddAll
let add_all lst set = add_all' add lst set
end
module ListSetDupsExtended : SetExtended = struct
include ListSetDups
include AddAll
let add_all lst set = add_all' add lst set
end
```
We've succeeded, partially, in achieving code reuse. The code that
implements `add_all'` has been factored out into a single location and
reused in the two structures. So we could now replace it with an
improved (?) version using a fold function:
```
module AddAll = struct
let add_all' add lst set =
let add' s x = add x s in
List.fold_left add' set lst
end
```
But we've partially failed. We still have to write an implementation
of `add_all` in both modules, and worse yet, those implementations
are identical. So there's still code duplication occurring.
Could we do better? Yes. And that leads us to functors...
## Functors
The problem we were having in the previous section was that we wanted
to add code to two different modules, but that code needed to
be parameterized on the details of the module to which it was being
added. It's that kind of parameterization that is enabled by
an OCaml language feature called *functors*.
The name is perhaps a bit itimidating, but **a functor is simply a
"function" from structures to structures.** The word "function" is in
quotation marks in that sentence only because it's a kind of function
that's not interchangeable with the rest of the functions we've already
seen. OCaml is *stratified*: structures are distinct from values, so
functions from structures to structures cannot be written or used in the
same way as functions from values to values. But conceptually, functors
really are just functions.
* * *
*
Why "functor"? In [category theory][intellectualterrorism], a *category*
contains *morphisms*, which are a generalization of functions as we
known them, and a *functor* is map between categories. Likewise, OCaml
structures contain functions, and OCaml functors map from structures
to structures. For more information about category theory,
bug Prof. Tate to teach CS 6117 again.
*
[intellectualterrorism]: https://en.wikipedia.org/wiki/Category_theory
* * *
First, let's write a simple signature; there's nothing new here:
```
module type X = sig
val x : int
end
```
Now, using that signature, here's a tiny example of a functor:
```
module IncX (M: X) = struct
let x = M.x + 1
end
```
The functor's name is `IncX`. It's a function from structures to structures.
As a function, it takes an input and produces an output. Its input
is named `M`, and the type of its input is `X`. Its output
is the structure that appears on the right-hand side of the equals sign:
`struct let x = M.x + 1`.
Another way to think about `IncX` is that it's a *parameterized structure*.
The parameter that it takes is named `M` and has type `X`. The structure itself
has a single value named `x` in it. The value that `x` has will depend
on the parameter `M`.
Since functors are functions, we *apply* them. Here's an example of applying
`IncX`:
```
# module A = struct let x = 0 end
# A.x
- : int = 0
# module B = IncX(A)
# B.x
- : int = 1
# module C = IncX(B)
# C.x
- : int = 2
```
Each time, we pass `IncX` a structure. When we pass it the structure bound
to the name `A`, the input to `IncX` is `struct let x = 0 end`. `IncX`
takes that input and produces an output `struct let x = A.x + 1 end`.
Since `A.x` is `0`, the result is `struct let x = 1 end`. So `B`
is bound to `struct let x = 1 end`. Similarly, `C` ends up being
bound to `struct let x = 2 end`.
Although the functor `IncX` returns a structure that is quite similar to
its input structure, that need not be the case. In fact, a functor can
return any structure it likes, perhaps something very different than its
input structure:
```
module MakeY (M:X) = struct
let y = 42
end
```
The structure returned by `MakeY` has a value named `y` but does not
have any value named `x`. In fact, `MakeY` completely ignores its
input structure.
## Functor syntax
In the functor syntax we've been using:
```
module F (M : S) = struct
...
end
```
the type annotation `: S` and the parentheses around it, `(M : S)` are required.
The reason why is that type inference of the signature of a functor input
is not supported.
Much like functions, functors can be written anonymously. The following
two syntaxes for functors are equivalent:
```
module F (M : S) = struct
...
end
module F = functor (M : S) -> struct
...
end
```
The second form uses the `functor` keyword to create an anonymous functor,
like how the `fun` keyword creates an anonymous function.
And functors can be parameterized on multiple structures:
```
module F (M1 : S1) ... (Mn : Sn) = struct
...
end
```
Of course, that's just syntactic sugar for a *higher-order functor* that takes
a structure as input and returns an anonymous functor:
```
module F = functor (M1 : S1) -> ... -> functor (Mn : Sn) -> struct
...
end
```
If you want to specify the output type of a functor, the syntax is again
similar to functions:
```
module F (M : Si) : So = struct
...
end
```
It's also possible to write the type annotation on the structure:
```
module F (M : Si) = (struct
...
end : So)
```
In that case, note that the parentheses around the anonymous structure are
required. It turns out that syntax parallels a similar syntax for
functions that we just haven't used before:
```
let f x = (x+1 : int)
```
The syntax for writing down the type of a functor is also much like the
syntax for writing down the type of a function. Here is the type of
a functor that takes a structure matching signature `Si` as input
and returns a structure matching `So`:
```
functor (M : Si) -> So
```
If you wanted to annotate a functor definition with a type you can
combine a couple of the syntaxes we've now seen:
```
module F : functor (M : Si) -> So =
functor (M : Si) -> struct ... end
```
The first occurrence of `functor` in that code means that what follows
is a functor type, and the second occurrence means that what follows
is an anonymous functor value.
## Using a functor to eliminate code duplication
Since functors are really just parameterized modules, we can use them to
produce functions that are parameterized on any structure that matches
a signature. Here's an example of doing that.
Recall our data structures for stacks:
```
module type StackSig = sig
type 'a t
val empty : 'a t
val push : 'a -> 'a t -> 'a t
val peek : 'a t -> 'a
end
module ListStack = struct
type 'a t = 'a list
let empty = []
let push x s = x::s
let peek = function [] -> failwith "empty" | x::_ -> x
end
(* called MyStack because the standard library already has a Stack *)
module MyStack = struct
type 'a t = Empty | Entry of 'a * 'a t
let empty = Empty
let push x s = Entry (x, s)
let peek = function Empty -> failwith "empty" | Entry(x,_) -> x
end
```
Suppose we wanted to write code that would test a `ListStack`:
```
assert (ListStack.(empty |> push 1 |> peek) = 1)
```
Unfortunately, to test a `MyStack`, we'd have to duplicate that code:
```
assert (MyStack.(empty |> push 1 |> peek) = 1)
```
And if we had other stack implementations, we'd have to duplicate
the test for them, too. That's not so horrible to contemplate if
it's just one test case for a couple implementations, but if it's
hundreds of tests for even a couple implementations, that's just
too much duplication to be good software engineering.
Functors offer a better solution. We can write a functor that
is parameterized on the stack implementation, and produces the
test for that implementation:
```
module StackTester (S:StackSig) = struct
assert (S.(empty |> push 1 |> peek) = 1)
end
module MyStackTester = StackTester(MyStack)
module ListStackTester = StackTester(ListStack)
```
Now we can factor out all our tests into the functor `StackTester`, and
when we apply that functor to a stack implementation, we get a set of
tests for that implementation. Of course, this would work with OUnit
as well as assertions.
## Back to the example where include didn't suffice
Earlier, we tried to add a function `add_all` to both `ListSetNoDups`
and `ListSetDups` without having any duplicated code, but we didn't
totally succeed. Now let's really do it right.
The problem we had earlier was that we needed to parameterize the
implementation of `add_all` on the `add` function in the set
data structure. We can accomplish that parameterization with
a functor.
Here is a functor that takes in a structure named `S` that matches the `Set`
signature, then produces a new structure having a single function named
`add_all` in it:
```
module AddAll(S:Set) = struct
let add_all lst set =
let add' s x = S.add x s in
List.fold_left add' set lst
end
```
Notice how the functor, in its body, uses `S.add`. It takes the implementation
of `add` from `S` and uses it to implement `add_all`, thus solving the
exact problem we had before when we tried to use includes.
When we apply `AddAll` to our set implementations, we get structures
containing an `add_all` function for each implementation:
```
# module AddAllListSetDups = AddAll(ListSetDups);;
module AddAllListSetDups :
sig
val add_all : 'a list -> 'a ListSetDups.t -> 'a ListSetDups.t
end
# module AddAllListSetNoDups = AddAll(ListSetNoDups);;
module AddAllListSetNoDups :
sig
val add_all : 'a list -> 'a ListSetNoDups.t -> 'a ListSetNoDups.t
end
```
So the functor has enabled the code reuse we couldn't get before:
we now can implement a single `add_all` function and from it derive
implementations for two different set structures.
But that's the **only** function those two structures contain. Really
what we want is a full set implementation that also contains the
`add_all` function. We can get that by combining includes with functors:
```
module ExtendSet(S:Set) = struct
include S
let add_all lst set =
let add' s x = S.add x s in
List.fold_left add' set lst
end
```
That functor takes a set structure as input, and produces a structure that contains
everything from that set structure (because of the `include`) as well as
a new function `add_all` that is implemented using the `add` function from the
set.
When we apply the functor, we get a very nice set data structure as a result:
```
# module ListSetNoDupsExtended = ExtendSet(ListSetNoDups);;
module ListSetNoDupsExtended :
sig
type 'a t = 'a ListSetNoDups.t
val empty : 'a t
val mem : 'a -> 'a t -> bool
val add : 'a -> 'a t -> 'a t
val elts : 'a t -> 'a list
val add_all : 'a list -> 'a t -> 'a t
end
```
Notice how the output structure records the fact that its type `t` is the
same type as the type `t` in its input structure. They share it because
of the `include`.
Stepping back, what we just did bears more than a passing resemblance
to what you're used to doing in CS 2110 with class extension in Java. We
created a base module and extended its functionality with new code
while preserving its old functionality. But whereas class extension
necessitates that the newly extended class is a subtype of the old,
and that it still has all the old functionality, OCaml functors
are more fine-grained in what they can accomplish. We can choose
whether they include the old functionality. And no subtyping relationships
are necessarily involved. Moreover, the functor we wrote can be used
to extend **any** set implementation with `add_all`, whereas class
extension applies to just a **single** base class. There are ways
of achieving something similar in Java with *mixins*, which weren't
supported before Java 1.5.
## Standard library Map
The standard library's Map module implements a dictionary data
structure using balanced binary trees. You can see the [implementation
of that module on GitHub][mapimplsrc] as well as its [interface][mapintsrc].
[mapintsrc]: https://github.com/ocaml/ocaml/blob/trunk/stdlib/map.mli
[mapimplsrc]: https://github.com/ocaml/ocaml/blob/trunk/stdlib/map.ml
The Map module defines a functor `Make` that creates a structure implementing
a map over a particular type of keys. That type is the input structure
to `Make`. The type of that input structure is `Map.OrderedType`, which are types that
support a `compare` operation:
```
module type OrderedType = sig
type t
val compare : t -> t -> int
end
```
The Map module needs ordering because balanced binary trees need to be able
to compare keys to determine whether one is greater than another.
According to the library's documentation, `compare` must satisfy this
specification:
```
(* This is a two-argument function [f] such that
* [f e1 e2] is zero if the keys [e1] and [e2] are equal,
* [f e1 e2] is strictly negative if [e1] is smaller than [e2],
* and [f e1 e2] is strictly positive if [e1] is greater than [e2].
* Example: a suitable ordering function is the generic structural
* comparison function [Pervasives.compare]. *)
val compare : t -> t -> int
```
Arguably this specification is a missed opportunity for good design: the library
designers could instead have defined a variant:
```
type order = LT | EQ | GT
```
and required the output type of `compare` to be `order`. But historically many
languages have used comparison functions with similar specifications, such
as the C standard library's [`strcmp` function][strcmp].
[strcmp]: http://www.gnu.org/software/libc/manual/html_node/String_002fArray-Comparison.html
The output of `Map.Make` is a structure whose type is (almost) `Map.S` and supports
all the usual operations we would expect from a dictionary:
```
module type S =
sig
type key
type 'a t
val empty: 'a t
val mem: key -> 'a t -> bool
val add: key -> 'a -> 'a t -> 'a t
val find: key -> 'a t -> 'a
...
end
```
There are two reasons why we say that the output is "almost" that type:
1. The Map module actually
specifies a *sharing constraint* (which we covered in the previous notes):
`type key = Ord.t`. That is, the output of `Map.Make` shares its `key` type
with the type `Ord.t`. That enables keys to be compared with `Ord.compare`.
The way that sharing constraint is specified is in the type of `Make` (which
can be found in `map.mli`, the interface file for the map compilation unit):
```
module Make : functor (Ord : OrderedType) -> (S with type key = Ord.t)
```
2. The Map module actually specifies something called a *variance* on the
representation type, writing `+'a t` instead of `'a t` as we did above.
We won't concern ourselves with what this means; it's [related to subtyping
and polymorphic variants][variance].
[variance]: https://blogs.janestreet.com/a-and-a/
The functor `Map.Make` itself (which can be found in `map.ml`, the implementation
file for the map compilation unit) is currently defined as follows, though of course
the library is free to change its internals in the future:
```
module Make(Ord: OrderedType) = struct
type key = Ord.t
type 'a t =
| Empty
| Node of 'a t * key * 'a * 'a t * int
(* left subtree * key * value * right subtree * height of node *)
let empty = Empty
let rec mem x = function
| Empty -> false
| Node(l, v, _, r, _) ->
let c = Ord.compare x v in
c = 0 || mem x (if c < 0 then l else r)
...
```
The `key` type is defined to be a synonym for the type `t` inside `Ord`, so
`key` values are comparable using `Ord.compare`. The `mem` function uses
that to compare keys and decide whether to recurse on the left subtree or right
subtree.
## Using the Map module
**A map for integer keys.**
To create a map, we have to pass a structure into `Map.Make`, and that structure
has to define a type `t` and `compare` function. The simplest way to do
that is to pass an anonymous structure into the functor:
```
# module IntMap = Map.Make(struct type t = int let compare = Pervasives.compare end);;
module IntMap :
sig
type key = int
type 'a t
val empty : 'a t
...
end
# open IntMap;;
# let m1 = add 1 "one" empty;;
val m1 : string t =
# find 1 m1;;
- : string = "one"
# mem 42 m1;;
- : bool = false
# find 42 m1;;
Exception: Not_found.
# bindings m1;;
- : (int * string) list = [(1, "one")]
# let m2 = add 1 1. empty;;
val m2 : float t =
# bindings m2;;
- : (int * float) list = [(1, 1.)]
```
Here are some things to note about the utop transcript above:
* We can write a structure on one line, even though until now we've always
used line breaks to keep them readable. When writing a structure on
on line (which we'll only do for really short structures) it can
be useful to use the double semicolon between definitions to enhance readability:
```
# module IntMap = Map.Make(struct type t = int;; let compare = Pervasives.compare end);;
```
This is an exception to the general style rule of avoiding double semicolon inside
source code.
If we didn't want to pass an anonymous structure, we could instead define a
module and pass it:
```
module Int = struct
type t = int
let compare = Pervasives.compare
end
module IntMap = Map.Make(Int)
```
* The signature of the structure returned by `Map.Make` records the fact that
keys are of type `int`. The type `'a t` is the name of the representation
type of an `IntMap`. The `'a` type variable in it is the type of values in
the map. Although in general the map could have any value type, once we
add a single value to a map, that "pins down" the value type of that
particular map. When we add the binding from key `1` to string `"one` above,
notice that the map value returned is of type `string t`.
* The `bindings` function of a map returns an association list of all the bindings
in the map. Association lists are, of course, another data structure
that implements a dictionary. But they are less efficient than the balanced
binary search tree implementation used by `Map`.
* The `mem` function tests whether a key is a __mem__ber of a map.
The `find` function finds the value associated with a key, and raises
the `Not_found` exception if the key is not bound in the map. That's
the same exception that `List.assoc` raises if a key is not bound
in an association list.
**A map for string keys.**
If a module already provides a type `t` that can be compared, we can
immediately use that module as an argument to `Map.Make`. Several
standard library modules are designed to be used in that way.
For example, the `String` module defines a type `t` and a `compare` function
that meet the specification of `Map.OrderedType`. So we can easily
create maps whose key type is `string`:
```
# module StringMap = Map.Make(String);;
module StringMap :
sig
type key = string
...
end
```
Now we could use the string map like we used the int map. This time, for sake
of example, let's not open the `StringMap` module:
```
# let m = StringMap.(add "one" 1 empty);;
# let m' = StringMap.(add "two" 2 m);;
# StringMap.bindings m';;
- : (string * int) list = [("one", 1); ("two", 2)]
# StringMap.bindings m;;
- : (string * int) list = [("one", 1)]
#
```
Note that maps are a functional data structure: adding a mapping to `m`
did not mutate `m`; rather, it produced a new map that we bound
to `m'`, and both the new map and old map remain available for use.
**A map for record keys.**
When the type of a key becomes more complicated than a built-in primitive
type, we might want to write our own custom comparison function. For
example, suppose we want a map in which keys are records representing names,
and in which names are sorted alphabetically by last name then by first name.
In the code below, we provide a module `Name` that can compare records that way:
```
type name = {first:string; last:string}
module Name = struct
type t = name
let compare {first=first1;last=last1}
{first=first2;last=last2} =
match Pervasives.compare last1 last2 with
| 0 -> Pervasives.compare first1 first2
| c -> c
end
```
The `Name` module can be used as input to `Map.Make` because it matches
the `Map.OrderedType` signature:
```
module NameMap = Map.Make(Name)
```
And now we could add some names to a map. Below, for sake of example,
we map some names to birth years, and we use the pipeline operator
to easily add multiple bindings one after another:
```
let k1 = {last="Kardashian"; first="Kourtney"}
let k2 = {last="Kardashian"; first="Kimberly"}
let k3 = {last="Kardashian"; first="Khloe"}
let k4 = {last="West"; first="Kanye"}
let nm = NameMap.(
empty |> add k1 1979 |> add k2 1980
|> add k3 1984 |> add k4 1977)
let lst = NameMap.bindings nm
```
The value of `lst` will be
```
[({first = "Khloe"; last = "Kardashian"}, 1984);
({first = "Kimberly"; last = "Kardashian"}, 1980);
({first = "Kourtney"; last = "Kardashian"}, 1979);
({first = "Kanye"; last = "West"}, 1977)]
```
Note how the order of keys in that list is not the same as the order in which
we added them. The list is sorted according to the `Name.compare` function we wrote.
Several of the other functions in the `Map.S` signature will also process map bindings
in that sorted order—for example, `map`, `fold`, and `iter`.
## Code reuse with Map
Stepping back from the mechanics of how to use `Map`, let's think about
how it achieves code reuse. The implementor of `Map` had a tricky problem
to solve: balanced binary search trees require a way to compare keys, but
the implementor can't know in advance all the different types of keys
that a client of the data structure will want to use. And each type of
key might need its own comparison function. Although the standard library's
`Pervasives.compare` *can* be used to compare any two values of the same
type, the result it returns isn't necessarily what a client will want. For
example, it's not guaranteed to sort names in the way we wanted above.
So the implementor of `Map` parameterized it on a structure that bundles
together the type of keys with a function that can be used to compare them.
It's the client's responsibility to implement that structure. Given it,
all the code in `Map` can be re-used by the client.
The Java Collections Framework solves a similar problem in the TreeMap class,
which has a [constructor that takes a Comparator][treemapcomparator]. There, the client has the
responsibility of implementing a class for comparisons, rather than a structure.
Though the language features are different, the idea is the same.
[treemapcomparator]: https://docs.oracle.com/javase/8/docs/api/java/util/TreeMap.html#TreeMap-java.util.Comparator-
## Summary
Functors are an advanced language feature in OCaml that might seem mysterious
at first. If so, keep in mind: they're really just a kind of function
that takes a structure as input and returns a structure as output. The
reason they don't behave quite like normal OCaml functions is that structures
are not first-class values in OCaml: you can't write regular functions
that take a structure as input or return a structure as output. But functors
can do just that.
Functors and includes enable code reuse. The kinds of code reuse
you learned to achieve in CS 2110 with object-oriented features can
also be achieved with functors and include. That's not to say that
functors and includes are exactly equivalent to those object-oriented features:
some kinds of code reuse might be easier to achieve with one set of features
than the other.
One way to think about this might be that class extension is a very
limited (but very useful) combination of functors and includes: extending
a class is like writing a functor that takes the base class as input,
includes it, then adds new functions. But functors provide more general
capability than class extension, because they can compute arbitrary functions
of their input structure, rather than being limited to just certain
kinds of extension.
Perhaps the most important idea to get out of studying the OCaml module
system is an appreciation for the aspects of modularity that transcend
any given language: namespaces, abstraction, and code reuse. Having
seen those ideas in a couple very different languages, you're equipped
to recognize them more clearly in the next language you learn.
## Terms and concepts
* code reuse
* functor
* include
* maintainability
* maps
* modularity
* open
* parameterized structure
* set representations
* signatures
* structures
## Further reading
* *Introduction to Objective Caml*, chapter 13
* *Real World OCaml*, chapter 9