# Modules and Printing We'll start today by learning about how printing works in OCaml. We deliberately put that off as long as possible, because printing involves side effects and we wanted you to code for as long as possible in a purely functional style. Then we'll use the OCaml module system to implement some queues and dictionaries. Finally, we'll see how compilation of modules interacts with utop and learn some new utop directives related to modules. ## Printing The `Pervasives` module has built-in printing functions for several of the built-in primitive types: `print_char`, `print_int`, `print_string`, and `print_float`. There's also a `print_endline` function, which is like `print_string` but also outputs a newline. ##### Exercise: hello world [&#10029;] Type the following in utop: ``` # print_endline "Hello world!";; # print_string "Hello world!";; ``` Notice the difference in output from each. &square; Let's look at the type of those functions: ``` # print_endline;; - : string -> unit = <fun> # print_string;; - : string -> unit = <fun> ``` They both take a string as input and return a value of type `unit`, which we haven't seen before. There is only one value of this type, which is written `()` and is also pronounced "unit". So `unit` is like `bool`, except there is one fewer value of type `unit` than there is of `bool`. Unit is therefore used when you need to take an argument or return a value, but there's no interesting value to pass or return. Unit is often used when you're writing or using code that has side effects. Printing is an example of a side effect: it changes the world and can't be undone. If you want to print one thing after another, you could sequence some print functions using nested let expressions: ``` let () = print_endline "THIS" in let () = print_endline "IS" in print_endline "3110" ``` The `()` in `let () =` is a pattern, the unit pattern, which matches only the unit value. The entire expression evaluates to whatever `print_endline 3110` returns, which is the unit value `()`, hence the entire expression has type `unit`. But the evaluation has the side effect of printing `THIS IS 3110` over multiple lines of output. The boilerplate of all the `let () = ... in` above is annoying to have to write! We don't really care about matching the unit value; after all, there's only one thing it could be. So there's a special syntax that can be used to chain together multiple functions who return unit. The expression `e1; e2` first evaluates `e1`, which should evaluate to `()`, then discards that value, and evaluates `e2`. So we could rewrite the above code as: ``` print_endline "THIS"; print_endline "IS"; print_endline "3110" ``` And that is far more idiomatic code. If `e1` does not have type `unit`, then `e1;e2` will give a warning, because you are discarding useful values. If that is truly your intent, you can call the built-in function `ignore : 'a -> unit` to convert any value to `()`: ``` # 3; 5;; Warning 10: this expression should have type unit. - : int = 5 # ignore 3; 5;; - : int = 5 ``` ##### Exercise: print int list rec [&#10029;&#10029;] Write a function `print_int_list : int list -> unit` that prints its input list, one number per line. For example, `print_int_list [1;2;3]` should result in this output: ``` 1 2 3 ``` Here is some code to get you started: ``` let rec print_int_list = function | [] -> () | h::t -> (* fill in here *); print_int_list t ``` &square; ##### Exercise: print int list iter [&#10029;&#10029;] Write a function `print_int_list' : int list -> unit` whose specification is the same as `print_int_list`. Do not use the keyword `rec` in your solution, but instead to use the [List module][list] function `List.iter`. Here is some code to get you started: ``` let print_int_list' lst = List.iter (fun x -> (* fill in here *)) lst ``` [list]: http://caml.inria.fr/pub/docs/manual-ocaml/libref/List.html &square; ## Queues Create a file named `myqueue.ml`. In it, put the following code: ``` module ListQueue = struct type 'a queue = 'a list let empty = [] let is_empty q = (q = []) let enqueue x q = q @ [x] let peek = function | [] -> failwith "Empty" | x::_ -> x let dequeue = function | [] -> failwith "Empty" | _::q -> q end ``` Launch utop and `#use "myqueue.ml"`. You will then be able to create and use `ListQueue`, for example: ``` # let q = ListQueue.enqueue 1 (ListQueue.enqueue 2 ListQueue.empty);; # ListQueue.peek q;; (* returns 2, because 2 was enqueued first *) ``` ##### Exercise: big list queue [&#10029;&#10029;] Use the following code to create `ListQueue`'s of exponentially increasing length: 10, 100, 1000, etc. How big of a queue can you create before there is a noticeable delay? How big until there's a delay of at least 10 seconds? (Note: you can abort utop computations with Ctrl-C.) ``` (* Creates a ListQueue filled with [n] elements. *) let fill_listqueue n = let rec loop n q = if n=0 then q else loop (n-1) (ListQueue.enqueue n q) in loop n ListQueue.empty ``` &square; Add the following code to `myqueue.ml`: ``` module TwoListQueue = struct type 'a queue = {front:'a list; back:'a list} let empty = {front=[]; back=[]} let is_empty = function | {front=[]; back=[]} -> true | _ -> false let norm = function | {front=[]; back} -> {front=List.rev back; back=[]} | q -> q let enqueue x q = norm {q with back=x::q.back} let peek = function | {front=[]; _} -> None | {front=x::_; _} -> Some x let dequeue = function | {front=[]; _} -> None | {front=_::xs; back} -> Some (norm {front=xs; back}) end ``` ##### Exercise: big two-list queue [&#10029;&#10029;] Use the following function to again create queues of exponentially increasing length: ``` let fill_twolistqueue n = let rec loop n q = if n=0 then q else loop (n-1) (TwoListQueue.enqueue n q) in loop n TwoListQueue.empty ``` Now how big of a queue can you create before there's a delay of at least 10 seconds? &square; If we compare `ListQueue` and `TwoListQueue`, it's hopefully clear that `ListQueue` is a simple and correct implementation of a queue data structure. It's probably far less clear that `TwoListQueue` is a correct implementation! For more details about these two implementations, look at the [code associated with this lecture and lab][mod-code]. One of the additional exercises at the end of this lab asks you to explore the efficiencies of these two implementation a bit further. [mod-code]: code.ml ## Dictionaries A *dictionary* maps *keys* to *values*. This data structure typically supports at least a lookup operation that allows you to find the value with a corresponding key, an insert operation that lets you create a new dictionary with an extra key included. And there needs to be a way of creating an empty dictionary. We might represent this in a `Dictionary` module type as follows: ``` module type Dictionary = sig type ('k, 'v) t (* The empty dictionary *) val empty : ('k, 'v) t (* [insert k v d] produces a new dictionary [d'] with the same mappings * as [d] and also a mapping from [k] to [v]. If [k] was already mapped * in [d], that mapping is replaced in [d'] with the new mpaping. *) val insert : 'k -> 'v -> ('k,'v) t -> ('k,'v) t (* [lookup k d] returns the value associated with [k] in [d]. * raises: [Not_found] if [k] is not mapped to any value in [d]. val lookup : 'k -> ('k,'v) t -> 'v end ``` Note how the type `Dictionary.t` is parameterized on two types, `'k` and `'v`, which are written in parentheses and separated by commas. Although `('k,'v)` might look like a pair of values, it is not: it is a syntax for writing multiple type variables. We have seen already in this class that an association list can be used as a simple implementation of a dictionary. For example, here is an association list that maps some well-known names to an approximation of their numeric value: ``` [("pi", 3.14); ("e", 2.718); ("phi", 1.618)] ``` Let's try implementing the `Dictionary` module type with a module called `AssocListDict`. ##### Exercise: AssocListDict [&#10029;&#10029;] Put the code for `Dictionary` above into a file named `dict.ml`. Put this code after it: ``` module AssocListDict = struct type ('k, 'v) t = ('k * 'v) list let empty = failwith "Unimplemented" let insert k v d = failwith "Unimplemented" let lookup k d = failwith "Unimplemented" end ``` Finish the implementation of each of those three definitions. For `lookup`, use the library function `List.assoc`. &square; ##### Exercise: AssocListDict encapsulated [&#10029;&#10029;] Launch utop and type: ``` # #use "dict.ml";; # open AssocListDict;; # let d = insert 1 "one" empty;; ``` The response you get should be: ``` val d : (int * string) list = [(1, "one")] ``` Now close utop. Change the first line of your implementation of `AssocListDict` in `dict.ml` to the following: ``` module AssocListDict : Dictionary = struct ``` Restart utop and repeat the three phrases above (use,open,let). What is the response you get now? Explain in your own words why the response changed. Your answer should incorporate "abstract type" and "encapsulation". &square; ## Modules and utop *There are several pragmatics with modules and the toplevel that you need to understand to use them together effectively. The following sequence of exercises, which should be done in order, is designed to help you navigate some of the common pitfalls and confusions that come up.* Compiling an OCaml file produces a module having the same name as the file (but with the first letter capitalized). These compiled modules can be loaded into the toplevel using `#load`. ##### Exercise: load [&#10029;] Create a file called `mods.ml`. Inside that file, put the following code: ``` let b = "bigred" let inc x = x+1 module M = struct let y = 42 end ``` At the command line, type `ocamlbuild mods.byte` to compile it. Type `ls _build` to see the files that `ocamlbuild` produced. One of them is `mods.cmo`: this is a <u>c</u>ompiled <u>m</u>odule <u>o</u>bject file, aka bytecode. Now run `utop`, and give it this directive (recall that the `#` character is required in front of a directive, it is not part of the prompt): ``` # #directory "_build";; ``` That tells utop to add the `_build` directory to the path in which it looks for compiled (and source) files. Now give utop this directive: ``` # #load "mods.cmo";; ``` There is now a module named `Mods` available to be used. Try these expressions in utop: ``` # Mods.b;; # Mods.M.y;; ``` Now try these: ``` # inc;; (* will fail *) # open Mods;; # inc;; ``` Finish by exiting utop. &square; If you are doing a lot of testing of a particular module, it can be annoying to have to type those directives (`#directory` and `#load`) every time you start utop. The solution is to create a file in the working directory and call that file `.ocamlinit`. Note that the `.` at the front of that filename is required and makes it a [hidden file][hidden] that won't appear in directory listings unless explicitly requested (e.g., with `ls -a`). Everything in `.ocamlinit` will be processed by utop when it loads. ##### Exercise: ocamlinit [&#10029;] Using your text editor, create a file named `.ocamlinit` in the same directory as `mods.ml`. In that file, put the following: ``` #directory "_build";; #load "mods.cmo";; open Mods ``` Now restart utop. All the names defined in `Mods` will already be in scope. For example, these will both succeed: ``` # inc;; # M.y;; ``` [hidden]: https://en.wikipedia.org/wiki/Hidden_file_and_hidden_directory &square; If you are building code that depends on third-party libraries, you can load those libraries with another directive: ##### Exercise: require [&#10029;] Add the following lines to the end of `mods.ml`: ``` open OUnit2 let test = "testb" >:: (fun _ -> assert_equal "bigred" b) ``` Try to recompile the module with `ocamlbuild mods.byte`. That will fail, because you need to tell the build system to include the third-party library OUnit. So try to recompile with `ocamlbuild -pkg oUnit mods.byte`. That will succeed. Now try to restart utop. If you look closely, there will be an error message: ``` File ".ocamlinit", line 1: Error: Reference to undefined global `OUnit2' ``` The problem is that the OUnit library hasn't been loaded into utop yet. Type the following directive: ``` #require "oUnit";; ``` Now you can successfully load your own module without getting an error. ``` #load "mods.cmo";; ``` Quit utop. Add the `#require` directive to `.ocamlinit` anywhere before the `#load` directive. Restart utop. Verify that `inc` is in scope. &square; When compiling a file, the build system automatically figures out which other files it depends on, and recompiles those as necessary. The toplevel, however, is not as sophisticated: you have to make sure to load all the dependencies of a file, as we'll see in the next exercise. ##### Exercise: loads [&#10029;] Create a file `mods2.ml`. In it, put this code: ``` open Mods let x = inc 0 ``` Run `ocamlbuild -pkg oUnit mods2.byte`. Notice that you don't have to name `mods.byte`, even though `mods2.ml` depends on the module `Mod`. The build system is smart that way. Go back to `.ocamlinit` and change it be to just the following: ``` #directory "_build";; #require "oUnit";; ``` Restart utop. Try this directive: ``` # #load "mods2.cmo";; Error: Reference to undefined global `Mods' ``` The toplevel did not automatically load the modules that `Mods2` depends upon. So you have to do it manually: ``` # #load "mods.cmo";; # #load "mods2.cmo";; ``` (You could put both of those in your `.ocamlinit` file if you were going to continue testing them in the toplevel, but let's not do that right now.) &square; There is a big difference between `#load`-ing a compiled module file and `#use`-ing an uncompiled source file. The former loads bytecode and makes it available for use. For example, loading `mods.cmo` caused the `Mod` module to be available, and we could access its members with expressions like `Mod.b`. The latter (`#use`) is *textual inclusion*: it's like typing the contents of the file directly into the toplevel. So using `mods.ml` does **not** cause a `Mod` module to be available, and the definitions in the file can be accessed directly, e.g., `b`. Let's try that out... ##### Exercise: load vs. use [&#10029;] Start a new toplevel, first making sure your `.ocamlinit` does not contain any `#load` directives. We already know from the previous exercise that we could load first `mods.cmo` then `mods2.cmo`. Let's try using the source code: ``` # #use "mods.ml" (* will succeed *) # b;; val b : string = "bigred" # #use "mods2.ml" (* will fail *) Error: Reference to undefined global `Mods' ``` The problem is that `#use`-ing `mods.ml` did not introduce a module into scope. Rather it directly put the definitions found in that source file into scope. &square; So when you're using the toplevel to experiment with your code, it's often better to work with `#load`, because this accurately reflects how your modules interact with each other and with the outside world, rather than `#use` them. ## Additional exercises ##### Exercise: queue efficiency [&#10029;&#10029;&#10029;] Compare the implementations of `enqueue` in `ListQueue` vs. `TwoListQueue`. Explain in your own words why the efficiency of `ListQueue.enqueue` is linear time in the length of the queue. *Hint: consider the `@` operator.* Then explain why adding \\(n\\) elements to the queue takes time that is quadratic in \\(n\\). Now consider `TwoListQueue.enqueue`. Suppose that the queue is in a state where it has never had any elements dequeued. Explain in your own words why `TwoListQueue.enqueue` is constant time. Then explain why adding \\(n\\) elements to the queue takes time that is linear in \\(n\\). *(Note: the enqueue and dequeue operations for `TwoListQueue` remain constant time even after interleaving them, but showing why that is so will require us to study *amortized analysis* later in the semester.)* &square; ##### Exercise: binary search tree dictionary [&#10029;&#10029;&#10029;] Write a module `BstDict` that implements the `Dictionary` module type using the `tree` type and `lookup` and `insert` functions that you defined in the lab on variants. &square; ##### Exercise: fraction [&#10029;&#10029;&#10029;&#10029;] Write a module that implements the `Fraction` module type below: ``` module type Fraction = sig (* A fraction is a rational number p/q, where q != 0.*) type t (* [make n d] is n/d. Requires d != 0. *) val make : int -> int -> t val numerator : t -> int val denominator : t -> int val toString : t -> string val toReal : t -> float val add : t -> t -> t val mul : t -> t -> t end ``` Your module should ensure these invariants for every value `v` of type `t` that it returns from `make`, `add`, and `mul`: 1. `v` is in *[reduced form][irreducible]* 2. the denominator of `v` is positive For the first invariant, you might find this implementation of Euclid's algorithm to be helpful: ``` (* [gcd x y] is the greatest common divisor of [x] and [y]. * requires: [x] and [y] are positive. *) let rec gcd (x:int) (y:int) : int = if x = 0 then y else if (x < y) then gcd (y - x) x else gcd y (x - y) ``` [irreducible]: https://en.wikipedia.org/wiki/Irreducible_fraction