Functors

# Functors

* * *

Topics:

* sets represented with lists
* code reuse
* includes
* functors
* the standard library `Map` module

* * *

In the previous lecture we began studying the OCaml module system, and we saw how it supports namespaces and abstraction. Today we'll look at how it supports code reuse, so that code need not be copied or rewritten.

## Sets

As a running example, let's consider a data structure for representing sets:

```
module type Set = sig
  type 'a t

  (* [empty] is the empty set *)
  val empty : 'a t

  (* [mem x s] holds iff [x] is an element of [s] *)
  val mem : 'a -> 'a t -> bool

  (* [add x s] is the set [s] unioned with the set containing exactly [x] *)
  val add : 'a -> 'a t -> 'a t

  (* [elts s] is a list containing the elements of [s].  No guarantee
   * is made about the ordering of that list. *)
  val elts : 'a t -> 'a list
end
```

There are many other operations a set data structure might be expected to support, but these will suffice for now.

Here's an implementation of that interface:

```
module ListSetNoDups : Set = struct
  type 'a t = 'a list
  let empty = []
  let mem = List.mem
  let add x s = if mem x s then s else x::s
  let elts s = s
end
```

Note how `add` ensures that the representation never contains any duplicates, so the implementation of `elts` is quite easy. Of course, that makes the implementation of `add` linear time, which is not ideal. But if we want high-performance sets, lists are not the right representation type anyway; there are much better data structures for sets, and you might see some later in this course or in an algorithms course.

Here's a second implementation:

```
module ListSetDups : Set = struct
  type 'a t = 'a list
  let empty = []
  let mem = List.mem
  let add x s = x::s
  let elts s = List.sort_uniq Pervasives.compare s
end
```

In that implementation, the `add` operation is now constant time, and the `elts` operation is linearithmic time, O(n log n), because it has to sort the list in order to remove duplicates. For some (most?)
workloads, this is a better choice than `ListSetNoDups`, especially if `add` is likely to be called more often than `elts`, and if sets aren't expected to contain many duplicates.

## Includes

Suppose we wanted to add a function `of_list : 'a list -> 'a t` to the `ListSetDups` module that could construct a set out of a list. If we had access to the source code of both `ListSetDups` and `Set`, and if we were permitted to modify it, this wouldn't be hard. But what if they were third-party libraries for which we didn't have source code?

In CS 2110, you will have learned about extending classes and inheriting methods of a superclass. Those object-oriented language features provide (among many other things) the ability to reuse code. For example, a subclass includes all the methods of its superclasses, though some might be overridden.

OCaml provides a language feature called *includes* that also enables code reuse. This feature is similar to the object-oriented example we just gave: it enables a structure to include all the values defined by another structure, or a signature to include all the names declared by another signature.

We can use includes to solve the problem of adding `of_list` to `ListSetDups`:

```
module ListSetDupsExtended = struct
  include ListSetDups
  let of_list lst = List.fold_right add lst empty
end
```

This code says that `ListSetDupsExtended` is a structure containing all the definitions of the `ListSetDups` structure, as well as a definition of `of_list`. We don't have to know the source code implementing `ListSetDups` to make this happen. (You might wonder why we can't simply write `let of_list lst = lst`. See the section on the semantics of includes, below, for the answer.)
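To try the `include` idea outside the toplevel, here is a minimal self-contained sketch. It repeats the `Set` signature and `ListSetDups` from above only so that the snippet compiles on its own, and it uses unqualified `compare` where the notes write `Pervasives.compare` (newer OCaml releases call that module `Stdlib`). The sample list is an arbitrary choice for illustration.

```ocaml
(* A copy of the Set signature from above, without the comments. *)
module type Set = sig
  type 'a t
  val empty : 'a t
  val mem : 'a -> 'a t -> bool
  val add : 'a -> 'a t -> 'a t
  val elts : 'a t -> 'a list
end

(* Stands in for a third-party module: it is sealed, so ['a t] is abstract. *)
module ListSetDups : Set = struct
  type 'a t = 'a list
  let empty = []
  let mem = List.mem
  let add x s = x :: s
  let elts s = List.sort_uniq compare s
end

module ListSetDupsExtended = struct
  include ListSetDups
  (* Built only from the included [add] and [empty]. *)
  let of_list lst = List.fold_right add lst empty
end

(* Illustrative input with a duplicate element. *)
let s = ListSetDupsExtended.of_list [3; 1; 3; 2]

let () = assert (ListSetDupsExtended.mem 2 s)
let () = assert (ListSetDupsExtended.elts s = [1; 2; 3])
```

The extended module never looks inside the representation: `of_list` is written entirely in terms of operations that the sealed module exports.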
If we want to provide a new implementation of one of the included functions we could do that too:

```
module ListSetDupsExtended = struct
  include ListSetDups
  let of_list lst = List.fold_right add lst empty
  (* The new [elts] removes duplicates by hand.  Because ['a t] is
     abstract here, it starts from the list returned by the included
     [elts] rather than pattern matching on the set itself. *)
  let elts s =
    let rec dedup = function
      | [] -> []
      | h::t -> if List.mem h t then dedup t else h::(dedup t)
    in
    dedup (elts s)
end
```

But that's actually a less efficient implementation of `elts`, so we probably shouldn't do that for real.

One misconception to watch out for is that the above example does not *replace* the original implementation of `elts`. If any code inside `ListSetDups` called that original implementation, it still would in `ListSetDupsExtended`. Why? Remember the semantics of modules: all definitions are evaluated from top to bottom, in order. So the new definition of `elts` above won't come into use until the very end of evaluation. This differs from what you might expect from object-oriented languages like Java, which use a language feature called [dynamic dispatch][dd] to figure out which implementation to invoke.

[dd]: https://en.wikipedia.org/wiki/Dynamic_dispatch

## Semantics of include

Includes can be used inside of structures and inside of signatures. Of course, when we include inside a signature, we must be including another signature. And when we include inside a structure, we must be including another structure.

**Including a structure.** Including a structure is like writing a local definition for each name defined in the module.
Writing `include ListSetDups` as we did above, for example, has an effect similar to writing the following:

```
module ListSetDupsExtended = struct
  (* BEGIN all the includes *)
  type 'a t = 'a ListSetDups.t
  let empty = ListSetDups.empty
  let mem = ListSetDups.mem
  let add = ListSetDups.add
  let elts = ListSetDups.elts
  (* END all the includes *)
  let of_list lst = List.fold_right add lst empty
end
```

But if the set of names defined inside `ListSetDups` ever changed, the `include` would reflect that change, whereas the static code we wrote above would not.

**Encapsulation and includes.** We mentioned above that you might wonder why we didn't write this simpler definition of `of_list`:

```
let of_list lst = lst
```

The reason is that includes must obey encapsulation, just like the rest of the module system. `ListSetDups` was sealed with the module type `Set`, thus making `'a t` abstract. So even `ListSetDupsExtended` is forbidden from knowing that `'a t` and `'a list` are synonyms.

A standard way to solve this problem is to rewrite the definitions as follows:

```
module ListSetDupsImpl = struct
  type 'a t = 'a list
  let empty = []
  let mem = List.mem
  let add x s = x::s
  let elts s = List.sort_uniq Pervasives.compare s
end

module ListSetDups : Set = ListSetDupsImpl

module ListSetDupsExtended = struct
  include ListSetDupsImpl
  let of_list lst = lst
end
```

The important change is that `ListSetDupsImpl` is not sealed, so its type `'a t` is not abstract. When we include it in `ListSetDupsExtended`, we can therefore exploit the fact that it's a synonym for `'a list`.

What we just did is effectively the same as what Java does to handle the visibility modifiers `public`, `private`, etc. The "private version" of a class is like the `Impl` version above: anyone who can see that version gets to see all the exposed "things" (fields in Java, types in OCaml), without any encapsulation.
The "public version" of a class is like the sealed version above: anyone who can see that version is forced to treat the "things" (fields in Java, types in OCaml) as abstract, hence encapsulated.

**Including a signature.** Signatures also support includes. For example, we could write:

```
module type SetExtended = sig
  include Set
  val of_list : 'a list -> 'a t
end
```

That would have an effect similar to writing the following:

```
module type SetExtended = sig
  (* BEGIN all the includes *)
  type 'a t
  val empty : 'a t
  val mem : 'a -> 'a t -> bool
  val add : 'a -> 'a t -> 'a t
  val elts : 'a t -> 'a list
  (* END all the includes *)
  val of_list : 'a list -> 'a t
end
```

And that module type would be suitable for `ListSetDupsExtended`:

```
module ListSetDupsExtended : SetExtended = struct
  include ListSetDupsImpl
  let of_list lst = lst
end
```

By sealing the module, we've again made `'a t` abstract, so no one outside that module gets to know that its representation type is actually `'a list`.

## Include vs. open

The `include` and `open` statements are quite similar, but they have a subtly different effect on a structure. Consider this code:

```
module M = struct
  let x = 0
end

module N = struct
  include M
  let y = x + 1
end

module O = struct
  open M
  let y = x + 1
end
```

If we enter that in the toplevel, we get the following response:

```
module M : sig val x : int end
module N : sig val x : int val y : int end
module O : sig val y : int end
```

Look closely at the values contained in each structure. `N` has both an `x` and `y`, whereas `O` has only a `y`. The reason is that `include M` causes all the definitions of `M` to also be included in `N`, so the definition of `x` from `M` is present in `N`. But `open M` only made those definitions available in the *scope* of `O`; it doesn't actually make them part of the *structure*. So `O` does not contain a definition of `x`, even though `x` is in scope during the evaluation of `O`'s definition of `y`.
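The same difference can be checked in a compiled file instead of the toplevel. The sketch below repeats `M`, `N`, and `O` and then seals them with two small signatures. The names `XY`, `Y`, `NChecked`, and `OChecked` are hypothetical and exist only for this illustration.

```ocaml
module M = struct
  let x = 0
end

module N = struct
  include M
  let y = x + 1
end

module O = struct
  open M
  let y = x + 1
end

(* Hypothetical signatures used only to check what each structure exports. *)
module type XY = sig
  val x : int
  val y : int
end

module type Y = sig
  val y : int
end

(* N re-exports [x], so it matches a signature that demands both names. *)
module NChecked : XY = N

(* O exports only [y].  Ascribing it to XY instead would be rejected by
   the type checker, because O has no [x] component. *)
module OChecked : Y = O

let () = assert (N.x = 0 && N.y = 1 && O.y = 1)
```

Both structures compute the same `y`; they differ only in which names a client can reach through them.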
A metaphor for understanding this difference might be: `open M` imports definitions from `M` and makes them available for local consumption, but they aren't exported to the outside world, whereas `include M` imports definitions from `M`, makes them available for local consumption, and additionally exports them to the outside world.

## An example where include doesn't suffice

Suppose we wanted to write a function that could add a bunch of elements to a set, something like:

```
(* [add_all l s] is the set [s] unioned with all the elements of [l] *)
let rec add_all lst set = match lst with
  | [] -> set
  | h::t -> add_all t (add h set)
```

(Of course, we could code that up more tersely with a fold function.)

One possibility would be to copy that code into both structures. That would compile, but it's poor software engineering. If ever an improvement needs to be made to that code (e.g., replacing it with a fold function), we have to remember to do it in two places. So let's rule that out right away as a non-solution.

So instead, after defining both set implementations above, suppose we try to enter that code into utop outside of either implementation. We'll get an error:

```
# let rec add_all lst set = match lst with
  | [] -> set
  | h::t -> add_all t (add h set);;
Error: Unbound value add
```

The problem is we either need to choose `ListSetDups.add` or `ListSetNoDups.add`. If we pick the latter, the code will compile, but it will be useful only with that one implementation:

```
# let rec add_all lst set = match lst with
  | [] -> set
  | h::t -> add_all t (ListSetNoDups.add h set);;
val add_all : 'a list -> 'a ListSetNoDups.t -> 'a ListSetNoDups.t = <fun>
```

We could make the code parametric with respect to the `add` function:

```
let rec add_all' add lst set = match lst with
  | [] -> set
  | h::t -> add_all' add t (add h set)

let add_all_dups lst set = add_all' ListSetDups.add lst set
let add_all_nodups lst set = add_all' ListSetNoDups.add lst set
```

But this is annoying in a couple ways.
First, we have to remember which function name to call, whereas all the other operations that are part of those modules have the same name, regardless of which module they're in. Second, the `add_all` functions live outside either module, so clients who open one of the modules won't automatically get the ability to name those functions.

Let's try to use includes to solve this problem. First, we write a module that contains the parameterized implementation of `add_all'`, and then we include it in an extended version of each set module:

```
module AddAll = struct
  let rec add_all' add lst set = match lst with
    | [] -> set
    | h::t -> add_all' add t (add h set)
end

module ListSetNoDupsExtended = struct
  include ListSetNoDups
  include AddAll
  let add_all lst set = add_all' add lst set
end

module ListSetDupsExtended = struct
  include ListSetDups
  include AddAll
  let add_all lst set = add_all' add lst set
end
```

We've succeeded, partially, in achieving code reuse. The code that implements `add_all'` has been factored out into a single location and reused in the two structures. So we could now replace it with an improved (?) version using a fold function:

```
module AddAll = struct
  let add_all' add lst set =
    let add' s x = add x s in
    List.fold_left add' set lst
end
```

But we've partially failed. We still have to write an implementation of `add_all` in both modules, and worse yet, those implementations are identical. So there's still code duplication occurring.

Could we do better? Yes. And that leads us to functors...

## Functors

The problem we were having in the previous section was that we wanted to add code to two different modules, but that code needed to be parameterized on the details of the module to which it was being added. It's that kind of parameterization that is enabled by an OCaml language feature called *functors*.
The name is perhaps a bit intimidating, but **a functor is simply a "function" from structures to structures.** The word "function" is in quotation marks in that sentence only because it's a kind of function that's not interchangeable with the rest of the functions we've already seen. OCaml is *stratified*: structures are distinct from values, so functions from structures to structures cannot be written or used in the same way as functions from values to values. But conceptually, functors really are just functions.

* * *

Why "functor"? In [category theory][intellectualterrorism], a *category* contains *morphisms*, which are a generalization of functions as we know them, and a *functor* is a map between categories. Likewise, OCaml structures contain functions, and OCaml functors map from structures to structures. For more information about category theory, bug Prof. Tate to teach CS 6117 again.

[intellectualterrorism]: https://en.wikipedia.org/wiki/Category_theory

* * *

First, let's write a simple signature; there's nothing new here:

```
module type X = sig
  val x : int
end
```

Now, using that signature, here's a tiny example of a functor:

```
module IncX (M: X) = struct
  let x = M.x + 1
end
```

The functor's name is `IncX`. It's a function from structures to structures. As a function, it takes an input and produces an output. Its input is named `M`, and the type of its input is `X`. Its output is the structure that appears on the right-hand side of the equals sign: `struct let x = M.x + 1 end`.

Another way to think about `IncX` is that it's a *parameterized structure*. The parameter that it takes is named `M` and has type `X`. The structure itself has a single value named `x` in it. The value that `x` has will depend on the parameter `M`.

Since functors are functions, we *apply* them.
Here's an example of applying `IncX`:

```
# module A = struct let x = 0 end
# A.x
- : int = 0

# module B = IncX(A)
# B.x
- : int = 1

# module C = IncX(B)
# C.x
- : int = 2
```

Each time, we pass `IncX` a structure. When we pass it the structure bound to the name `A`, the input to `IncX` is `struct let x = 0 end`. `IncX` takes that input and produces an output `struct let x = A.x + 1 end`. Since `A.x` is `0`, the result is `struct let x = 1 end`. So `B` is bound to `struct let x = 1 end`. Similarly, `C` ends up being bound to `struct let x = 2 end`.

Although the functor `IncX` returns a structure that is quite similar to its input structure, that need not be the case. In fact, a functor can return any structure it likes, perhaps something very different than its input structure:

```
module MakeY (M:X) = struct
  let y = 42
end
```

The structure returned by `MakeY` has a value named `y` but does not have any value named `x`. In fact, `MakeY` completely ignores its input structure.

## Functor syntax

In the functor syntax we've been using:

```
module F (M : S) = struct
  ...
end
```

the type annotation `: S` and the parentheses around it, `(M : S)` are required. The reason why is that type inference of the signature of a functor input is not supported.

Much like functions, functors can be written anonymously. The following two syntaxes for functors are equivalent:

```
module F (M : S) = struct
  ...
end

module F = functor (M : S) -> struct
  ...
end
```

The second form uses the `functor` keyword to create an anonymous functor, like how the `fun` keyword creates an anonymous function.

And functors can be parameterized on multiple structures:

```
module F (M1 : S1) ... (Mn : Sn) = struct
  ...
end
```

Of course, that's just syntactic sugar for a *higher-order functor* that takes a structure as input and returns an anonymous functor:

```
module F = functor (M1 : S1) -> ... -> functor (Mn : Sn) -> struct
  ...
end
```

If you want to specify the output type of a functor, the syntax is again similar to functions:

```
module F (M : Si) : So = struct
  ...
end
```

It's also possible to write the type annotation on the structure:

```
module F (M : Si) = (struct
  ...
end : So)
```

In that case, note that the parentheses around the anonymous structure are required. It turns out that syntax parallels a similar syntax for functions that we just haven't used before:

```
let f x = (x+1 : int)
```

The syntax for writing down the type of a functor is also much like the syntax for writing down the type of a function. Here is the type of a functor that takes a structure matching signature `Si` as input and returns a structure matching `So`:

```
functor (M : Si) -> So
```

If you wanted to annotate a functor definition with a type you can combine a couple of the syntaxes we've now seen:

```
module F : functor (M : Si) -> So = functor (M : Si) -> struct
  ...
end
```

The first occurrence of `functor` in that code means that what follows is a functor type, and the second occurrence means that what follows is an anonymous functor value.

## Using a functor to eliminate code duplication

Since functors are really just parameterized modules, we can use them to produce functions that are parameterized on any structure that matches a signature. Here's an example of doing that.
Recall our data structures for stacks:

```
module type StackSig = sig
  type 'a t
  val empty : 'a t
  val push : 'a -> 'a t -> 'a t
  val peek : 'a t -> 'a
end

module ListStack = struct
  type 'a t = 'a list
  let empty = []
  let push x s = x::s
  let peek = function [] -> failwith "empty" | x::_ -> x
end

(* called MyStack because the standard library already has a Stack *)
module MyStack = struct
  type 'a t = Empty | Entry of 'a * 'a t
  let empty = Empty
  let push x s = Entry (x, s)
  let peek = function Empty -> failwith "empty" | Entry(x,_) -> x
end
```

Suppose we wanted to write code that would test a `ListStack`:

```
assert (ListStack.(empty |> push 1 |> peek) = 1)
```

Unfortunately, to test a `MyStack`, we'd have to duplicate that code:

```
assert (MyStack.(empty |> push 1 |> peek) = 1)
```

And if we had other stack implementations, we'd have to duplicate the test for them, too. That's not so horrible to contemplate if it's just one test case for a couple implementations, but if it's hundreds of tests for even a couple implementations, that's just too much duplication to be good software engineering.

Functors offer a better solution. We can write a functor that is parameterized on the stack implementation, and produces the test for that implementation:

```
module StackTester (S:StackSig) = struct
  assert (S.(empty |> push 1 |> peek) = 1)
end

module MyStackTester = StackTester(MyStack)
module ListStackTester = StackTester(ListStack)
```

Now we can factor out all our tests into the functor `StackTester`, and when we apply that functor to a stack implementation, we get a set of tests for that implementation. Of course, this would work with OUnit as well as assertions.

## Back to the example where include didn't suffice

Earlier, we tried to add a function `add_all` to both `ListSetNoDups` and `ListSetDups` without having any duplicated code, but we didn't totally succeed. Now let's really do it right.
The problem we had earlier was that we needed to parameterize the implementation of `add_all` on the `add` function in the set data structure. We can accomplish that parameterization with a functor.

Here is a functor that takes in a structure named `S` that matches the `Set` signature, then produces a new structure having a single function named `add_all` in it:

```
module AddAll(S:Set) = struct
  let add_all lst set =
    let add' s x = S.add x s in
    List.fold_left add' set lst
end
```

Notice how the functor, in its body, uses `S.add`. It takes the implementation of `add` from `S` and uses it to implement `add_all`, thus solving the exact problem we had before when we tried to use includes.

When we apply `AddAll` to our set implementations, we get structures containing an `add_all` function for each implementation:

```
# module AddAllListSetDups = AddAll(ListSetDups);;
module AddAllListSetDups :
  sig val add_all : 'a list -> 'a ListSetDups.t -> 'a ListSetDups.t end

# module AddAllListSetNoDups = AddAll(ListSetNoDups);;
module AddAllListSetNoDups :
  sig val add_all : 'a list -> 'a ListSetNoDups.t -> 'a ListSetNoDups.t end
```

So the functor has enabled the code reuse we couldn't get before: we now can implement a single `add_all` function and from it derive implementations for two different set structures.

But that's the **only** function those two structures contain. Really what we want is a full set implementation that also contains the `add_all` function. We can get that by combining includes with functors:

```
module ExtendSet(S:Set) = struct
  include S

  let add_all lst set =
    let add' s x = S.add x s in
    List.fold_left add' set lst
end
```

That functor takes a set structure as input, and produces a structure that contains everything from that set structure (because of the `include`) as well as a new function `add_all` that is implemented using the `add` function from the set.
When we apply the functor, we get a very nice set data structure as a result:

```
# module ListSetNoDupsExtended = ExtendSet(ListSetNoDups);;
module ListSetNoDupsExtended :
  sig
    type 'a t = 'a ListSetNoDups.t
    val empty : 'a t
    val mem : 'a -> 'a t -> bool
    val add : 'a -> 'a t -> 'a t
    val elts : 'a t -> 'a list
    val add_all : 'a list -> 'a t -> 'a t
  end
```

Notice how the output structure records the fact that its type `t` is the same type as the type `t` in its input structure. They share it because of the `include`.

Stepping back, what we just did bears more than a passing resemblance to what you're used to doing in CS 2110 with class extension in Java. We created a base module and extended its functionality with new code while preserving its old functionality. But whereas class extension necessitates that the newly extended class is a subtype of the old, and that it still has all the old functionality, OCaml functors are more fine-grained in what they can accomplish. We can choose whether they include the old functionality. And no subtyping relationships are necessarily involved. Moreover, the functor we wrote can be used to extend **any** set implementation with `add_all`, whereas class extension applies to just a **single** base class. There are ways of achieving something similar in Java with *mixins*, which weren't supported before Java 1.8.

## Standard library Map

The standard library's Map module implements a dictionary data structure using balanced binary trees. You can see the [implementation of that module on GitHub][mapimplsrc] as well as its [interface][mapintsrc].

[mapintsrc]: https://github.com/ocaml/ocaml/blob/trunk/stdlib/map.mli
[mapimplsrc]: https://github.com/ocaml/ocaml/blob/trunk/stdlib/map.ml

The Map module defines a functor `Make` that creates a structure implementing a map over a particular type of keys. That type is the input structure to `Make`.
The type of that input structure is `Map.OrderedType`, which describes types that support a `compare` operation:

```
module type OrderedType = sig
  type t
  val compare : t -> t -> int
end
```

The Map module needs ordering because balanced binary trees need to be able to compare keys to determine whether one is greater than another. According to the library's documentation, `compare` must satisfy this specification:

```
(* This is a two-argument function [f] such that
 * [f e1 e2] is zero if the keys [e1] and [e2] are equal,
 * [f e1 e2] is strictly negative if [e1] is smaller than [e2],
 * and [f e1 e2] is strictly positive if [e1] is greater than [e2].
 * Example: a suitable ordering function is the generic structural
 * comparison function [Pervasives.compare]. *)
val compare : t -> t -> int
```

Arguably this specification is a missed opportunity for good design: the library designers could instead have defined a variant:

```
type order = LT | EQ | GT
```

and required the output type of `compare` to be `order`. But historically many languages have used comparison functions with similar specifications, such as the C standard library's [`strcmp` function][strcmp].

[strcmp]: http://www.gnu.org/software/libc/manual/html_node/String_002fArray-Comparison.html

The output of `Map.Make` is a structure whose type is (almost) `Map.S` and supports all the usual operations we would expect from a dictionary:

```
module type S = sig
  type key
  type 'a t
  val empty: 'a t
  val mem: key -> 'a t -> bool
  val add: key -> 'a -> 'a t -> 'a t
  val find: key -> 'a t -> 'a
  ...
end
```

There are two reasons why we say that the output is "almost" that type:

1. The Map module actually specifies a *sharing constraint* (which we covered in the previous notes): `type key = Ord.t`. That is, the output of `Map.Make` shares its `key` type with the type `Ord.t`. That enables keys to be compared with `Ord.compare`. The way that sharing constraint is specified is in the type of `Make` (which can be found in `map.mli`, the interface file for the map compilation unit):

   ```
   module Make : functor (Ord : OrderedType) -> (S with type key = Ord.t)
   ```

2. The Map module actually specifies something called a *variance* on the representation type, writing `+'a t` instead of `'a t` as we did above. We won't concern ourselves with what this means; it's [related to subtyping and polymorphic variants][variance].

[variance]: https://blogs.janestreet.com/a-and-a/

The functor `Map.Make` itself (which can be found in `map.ml`, the implementation file for the map compilation unit) is currently defined as follows, though of course the library is free to change its internals in the future:

```
module Make(Ord: OrderedType) = struct
  type key = Ord.t

  type 'a t =
    | Empty
    | Node of 'a t * key * 'a * 'a t * int
      (* left subtree * key * value * right subtree * height of node *)

  let empty = Empty

  let rec mem x = function
    | Empty -> false
    | Node(l, v, _, r, _) ->
        let c = Ord.compare x v in
        c = 0 || mem x (if c < 0 then l else r)
  ...
```

The `key` type is defined to be a synonym for the type `t` inside `Ord`, so `key` values are comparable using `Ord.compare`. The `mem` function uses that to compare keys and decide whether to recurse on the left subtree or right subtree.

## Using the Map module

**A map for integer keys.** To create a map, we have to pass a structure into `Map.Make`, and that structure has to define a type `t` and `compare` function. The simplest way to do that is to pass an anonymous structure into the functor:

```
# module IntMap = Map.Make(struct type t = int let compare = Pervasives.compare end);;
module IntMap :
  sig
    type key = int
    type 'a t
    val empty : 'a t
    ...
  end

# open IntMap;;

# let m1 = add 1 "one" empty;;
val m1 : string t = <abstr>

# find 1 m1;;
- : string = "one"

# mem 42 m1;;
- : bool = false

# find 42 m1;;
Exception: Not_found.

# bindings m1;;
- : (int * string) list = [(1, "one")]

# let m2 = add 1 1. empty;;
val m2 : float t = <abstr>

# bindings m2;;
- : (int * float) list = [(1, 1.)]
```

Here are some things to note about the utop transcript above:

* We can write a structure on one line, even though until now we've always used line breaks to keep them readable. When writing a structure on one line (which we'll only do for really short structures) it can be useful to use the double semicolon between definitions to enhance readability:

  ```
  # module IntMap = Map.Make(struct type t = int;; let compare = Pervasives.compare end);;
  ```

  This is an exception to the general style rule of avoiding double semicolon inside source code. If we didn't want to pass an anonymous structure, we could instead define a module and pass it:

  ```
  module Int = struct
    type t = int
    let compare = Pervasives.compare
  end

  module IntMap = Map.Make(Int)
  ```

* The signature of the structure returned by `Map.Make` records the fact that keys are of type `int`. The type `'a t` is the name of the representation type of an `IntMap`. The `'a` type variable in it is the type of values in the map. Although in general the map could have any value type, once we add a single value to a map, that "pins down" the value type of that particular map. When we add the binding from key `1` to string `"one"` above, notice that the map value returned is of type `string t`.

* The `bindings` function of a map returns an association list of all the bindings in the map. Association lists are, of course, another data structure that implements a dictionary. But they are less efficient than the balanced binary search tree implementation used by `Map`.

* The `mem` function tests whether a key is a member of a map. The `find` function finds the value associated with a key, and raises the `Not_found` exception if the key is not bound in the map. That's the same exception that `List.assoc` raises if a key is not bound in an association list.
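Because `find` raises `Not_found`, code that isn't sure a key is bound usually guards the lookup. The following is a minimal sketch of one way to do that. The helper name `lookup` is hypothetical, and unqualified `compare` is used where the transcript writes `Pervasives.compare` (the module is named `Stdlib` in newer OCaml releases).

```ocaml
module IntMap = Map.Make (struct
  type t = int
  (* The right-hand [compare] is the standard polymorphic comparison. *)
  let compare (a : int) (b : int) = compare a b
end)

let m = IntMap.(empty |> add 1 "one" |> add 2 "two")

(* [lookup k m] is [Some v] if [k] is bound to [v] in [m], and [None]
   otherwise.  It turns the [Not_found] exception into an option. *)
let lookup k m =
  try Some (IntMap.find k m) with Not_found -> None

let () = assert (lookup 1 m = Some "one")
let () = assert (lookup 42 m = None)

(* The standard library's [find_opt] (OCaml 4.05 and later) packages
   the same pattern. *)
let () = assert (IntMap.find_opt 2 m = Some "two")
```

Returning an option makes the "key is missing" case visible in the type, which some code bases prefer to catching an exception.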
**A map for string keys.** If a module already provides a type `t` that can be compared, we can immediately use that module as an argument to `Map.Make`. Several standard library modules are designed to be used in that way. For example, the `String` module defines a type `t` and a `compare` function that meet the specification of `Map.OrderedType`. So we can easily create maps whose key type is `string`:

```
# module StringMap = Map.Make(String);;
module StringMap :
  sig
    type key = string
    ...
  end
```

Now we could use the string map like we used the int map. This time, for sake of example, let's not open the `StringMap` module:

```
# let m = StringMap.(add "one" 1 empty);;

# let m' = StringMap.(add "two" 2 m);;

# StringMap.bindings m';;
- : (string * int) list = [("one", 1); ("two", 2)]

# StringMap.bindings m;;
- : (string * int) list = [("one", 1)]
```

Note that maps are a functional data structure: adding a mapping to `m` did not mutate `m`; rather, it produced a new map that we bound to `m'`, and both the new map and old map remain available for use.

**A map for record keys.** When the type of a key becomes more complicated than a built-in primitive type, we might want to write our own custom comparison function. For example, suppose we want a map in which keys are records representing names, and in which names are sorted alphabetically by last name then by first name. In the code below, we provide a module `Name` that can compare records that way:

```
type name = {first:string; last:string}

module Name = struct
  type t = name
  let compare {first=first1;last=last1} {first=first2;last=last2} =
    match Pervasives.compare last1 last2 with
    | 0 -> Pervasives.compare first1 first2
    | c -> c
end
```

The `Name` module can be used as input to `Map.Make` because it matches the `Map.OrderedType` signature:

```
module NameMap = Map.Make(Name)
```

And now we could add some names to a map.
Below, for sake of example, we map some names to birth years, and we use the pipeline operator to easily add multiple bindings one after another:

```
let k1 = {last="Kardashian"; first="Kourtney"}
let k2 = {last="Kardashian"; first="Kimberly"}
let k3 = {last="Kardashian"; first="Khloe"}
let k4 = {last="West"; first="Kanye"}

let nm = NameMap.(
  empty |> add k1 1979 |> add k2 1980
        |> add k3 1984 |> add k4 1977)

let lst = NameMap.bindings nm
```

The value of `lst` will be

```
[({first = "Khloe"; last = "Kardashian"}, 1984);
 ({first = "Kimberly"; last = "Kardashian"}, 1980);
 ({first = "Kourtney"; last = "Kardashian"}, 1979);
 ({first = "Kanye"; last = "West"}, 1977)]
```

Note how the order of keys in that list is not the same as the order in which we added them. The list is sorted according to the `Name.compare` function we wrote. Several of the other functions in the `Map.S` signature will also process map bindings in that sorted order: for example, `map`, `fold`, and `iter`.

## Code reuse with Map

Stepping back from the mechanics of how to use `Map`, let's think about how it achieves code reuse. The implementor of `Map` had a tricky problem to solve: balanced binary search trees require a way to compare keys, but the implementor can't know in advance all the different types of keys that a client of the data structure will want to use. And each type of key might need its own comparison function. Although the standard library's `Pervasives.compare` *can* be used to compare any two values of the same type, the result it returns isn't necessarily what a client will want. For example, it's not guaranteed to sort names in the way we wanted above.

So the implementor of `Map` parameterized it on a structure that bundles together the type of keys with a function that can be used to compare them. It's the client's responsibility to implement that structure. Given it, all the code in `Map` can be re-used by the client.
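To make that division of labor concrete, here is a small sketch in which the same `Map.Make` code is reused with two different orderings on the same key type. The `CaselessString` module and the sample keys are hypothetical and were chosen only for illustration. `String.lowercase_ascii` is a standard library function.

```ocaml
(* Hypothetical key module: strings compared without regard to ASCII case. *)
module CaselessString = struct
  type t = string
  (* The right-hand [compare] is the standard polymorphic comparison. *)
  let compare a b =
    compare (String.lowercase_ascii a) (String.lowercase_ascii b)
end

(* Same functor and same key type; only the ordering differs. *)
module StringMap = Map.Make (String)
module CaselessMap = Map.Make (CaselessString)

let sm = StringMap.(empty |> add "OCaml" 1 |> add "ocaml" 2)
let cm = CaselessMap.(empty |> add "OCaml" 1 |> add "ocaml" 2)

(* Under [String.compare] the two keys are different... *)
let () = assert (StringMap.cardinal sm = 2)

(* ...but under the caseless ordering they are the same key, so the
   second [add] replaced the first binding. *)
let () = assert (CaselessMap.cardinal cm = 1)
let () = assert (CaselessMap.find "OCAML" cm = 2)
```

The client wrote only a few lines of comparison code; the tree balancing, lookup, and insertion all come from the one implementation inside `Map`.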
The Java Collections Framework solves a similar problem in the TreeMap class, which has a [constructor that takes a Comparator][treemapcomparator]. There, the client has the responsibility of implementing a class for comparisons, rather than a structure. Though the language features are different, the idea is the same.

[treemapcomparator]: https://docs.oracle.com/javase/8/docs/api/java/util/TreeMap.html#TreeMap-java.util.Comparator-

## Summary

Functors are an advanced language feature in OCaml that might seem mysterious at first. If so, keep in mind: they're really just a kind of function that takes a structure as input and returns a structure as output. The reason they don't behave quite like normal OCaml functions is that structures are not first-class values in OCaml: you can't write regular functions that take a structure as input or return a structure as output. But functors can do just that.

Functors and includes enable code reuse. The kinds of code reuse you learned to achieve in CS 2110 with object-oriented features can also be achieved with functors and include. That's not to say that functors and includes are exactly equivalent to those object-oriented features: some kinds of code reuse might be easier to achieve with one set of features than the other.

One way to think about this might be that class extension is a very limited (but very useful) combination of functors and includes: extending a class is like writing a functor that takes the base class as input, includes it, then adds new functions. But functors provide more general capability than class extension, because they can compute arbitrary functions of their input structure, rather than being limited to just certain kinds of extension.

Perhaps the most important idea to get out of studying the OCaml module system is an appreciation for the aspects of modularity that transcend any given language: namespaces, abstraction, and code reuse.
Having seen those ideas in a couple very different languages, you're equipped to recognize them more clearly in the next language you learn.

## Terms and concepts

* code reuse
* functor
* include
* maintainability
* maps
* modularity
* open
* parameterized structure
* set representations
* signatures
* structures

## Further reading

* *Introduction to Objective Caml*, chapter 13
* *Real World OCaml*, chapter 9