Recitation: Programming with Async

Setting up utop and cs3110 for async
------------------------------------

When working with async, it is useful to configure both utop and the cs3110
tool to automatically load async.

You can cause `utop` to automatically load async by creating a file named
`.ocamlinit` in your current directory, and including the lines

```
#require "async";;
open Async.Std
```

`utop` automatically executes `.ocamlinit` every time it starts; this will
automatically load the async library and open the Async.Std module.

You can cause `cs3110 compile` to automatically include async by creating a
file called `.cs3110` in your current directory, and including the lines

```
compile.opam_packages=async
compile.thread=true
```

This lets you use `cs3110` to compile async programs without passing the `-t`
and `-p async` flags every time.

Note that files whose names start with `.` are hidden in unix; to include them
while listing files at the command line, you can type `ls -A` instead of `ls`.

Async documentation
-------------------

When working with Async, there are several important references that you
should familiarize yourself with.

- **Official async documentation** The official Async API documentation can
  be found [here](https://blogs.janestreet.com/ocaml-core/111.28.00/doc/).
  This is the authoritative documentation, and covers the full Async API.
- **CS 3110 async documentation** Async is a large, complex API. To help you
  focus on the parts of Async that are relevant to the course and the
  projects, we have written CS3110 specific documentation that covers a subset
  of the API. We have omitted many modules, functions, and optional parameters
  from the documentation. Last semester's version of the documentation is
  available
  [here](http://www.cs.cornell.edu/Courses/cs3110/2015sp/lectures/18/async/Async.html).
  We will release a new version of this documentation with A5, and will update
  these notes with a link when we do.
- **Utop** As discussed in a [previous recitation](../03-var/rec.html), you
  can print the contents of a module `M` in utop by typing
  `module type T = module type of M;;`. This can be a valuable method for
  quickly finding the function you are looking for, if you can guess the
  module it is in.
- **Real world OCaml**
  [Chapter 18](https://realworldocaml.org/v1/en/html/concurrent-programming-with-async.html)
  of Real World OCaml covers the basics of Async. It would be a good chapter
  to read as you're familiarizing yourself with the library.

A quick note: it is standard practice to `open Async.Std` whenever using
async. All of these references assume that you have done so. For example,
when the documentation discusses `Deferred.t`, it is really referring to
`Async.Std.Deferred.t`. Make sure you open Std!

**Exercise**: Find and read the documentation for `Writer.write` in the
official async documentation and in the 3110 async documentation. Compare the
two.

Programming with >>=
--------------------

As you may have gathered, programming with `bind` and `upon` can lead to code
that is difficult to read. Conceptually, a program might want to first read a
string from a file, then convert it to an integer n, then wait for n seconds,
then read a message from the network, and then print "done". In an imperative
language, this code would look something like

```
program () {
  s = read_file ();
  n = parse_int (s);
  wait (n);
  read_from_network ();
  print ("done");
}
```

In OCaml, without deferreds, this code might look like

```
let program () =
  let s = read_file () in
  let n = int_of_string s in
  let _ = wait n in
  let p = read_from_network () in
  print "done"
```

This simple structure becomes obscured when using `bind`, because each step
requires a new function, and that function then has to call `bind` to schedule
the next step.
You might end up writing code like:

```
let program () =
  let do_last_step p =
    print "done";
    return ()
  in
  let do_third_step () =
    bind (read_from_network ()) do_last_step
  in
  let do_second_step s =
    bind (wait (int_of_string s)) do_third_step
  in
  bind (read_file ()) do_second_step
```

This awkward style of writing code is often called "inversion of control", and
different asynchronous programming environments take different approaches to
avoid it.

In OCaml, we can simplify the code using bind by using anonymous functions:

```
bind (read_file ()) (fun s ->
  let n = int_of_string s in
  bind (wait n) (fun _ ->
    bind (read_from_network ()) (fun p ->
      print "done";
      return ()
    )
  )
)
```

This allows the code to be written "in the right order", but it still lacks
the clarity of the non-asynchronous OCaml version. The infix bind operator
`(>>=)`, combined with good indentation, solves this problem.

> **The secret to writing and reading async programs**:
> Think of a function of type `'a -> 'b Deferred.t` as being just like a
> function of type `'a -> 'b`, except that you might have to wait to get the
> result.
>
> Think of
> ```
> f x >>= fun x ->
> ```
> as being just like
> ```
> let x = f x in
> ```
> except that `>>=` waits for the result of `f x` to become available.
>
> Both expressions first execute `f x`, and when its value becomes available,
> that value is bound to `x` and then evaluation continues from the next line.
> The only difference is that the `(>>=)` version allows other parts of the
> program to run in between the execution of `f x` and the time when its value
> becomes available.
>
> Finally, where a synchronous function contains the final value to return,
> the asynchronous function should actually call `return` to wrap the value to
> be returned in a `Deferred.t`.
Let's apply this rule to the above example:

```
(* synchronous function *)               (* asynchronous function *)

(* let program () : unit =            *) let program () : unit Deferred.t =
(*   let s = read_file () in          *)   read_file () >>= fun s ->
(*   let n = int_of_string s in       *)   let n = int_of_string s in
(*   let _ = wait n in                *)   wait n >>= fun _ ->
(*   let p = read_from_network () in  *)   read_from_network () >>= fun p ->
(*   print "done";                    *)   print "done";
(*   ()                               *)   return ()
```

The way OCaml parses the asynchronous expression is

```
let program () : unit Deferred.t =
  read_file () >>= (fun s ->
    let n = int_of_string s in
    wait n >>= (fun _ ->
      read_from_network () >>= (fun p ->
        print "done";
        return ()
      )
    )
  )
```

which is the same as our `bind` version above. However, by omitting the
parentheses and indentation, we can think of the code as a sequence of `let`
expressions, and we can forget that there's a complex scheduling process going
on as this code executes.

**Exercise**: The file [sequence.ml](rec_code/sequence.ml) contains a comment
with a hypothetical synchronous function that prompts the user to enter some
input, then reads a line of input, then waits 3 seconds, then prints "done",
and finally exits the program. Convert the hypothetical synchronous version of
the code to a real asynchronous version. Note that the functions called in the
hypothetical code are the correct async functions. That is, you should use
`printf` to print, `Reader.read_line stdin` to get input, `after` to wait, and
`exit` to exit.

Just as you can use recursive functions to repeatedly process input in a
synchronous program, you can write recursive functions to repeatedly process
input in an asynchronous program.

**Exercise**: The file [loop.ml](rec_code/loop.ml) contains a hypothetical
recursive function that repeatedly prompts for input, and then reads the
input, waits for three seconds, and then prints the input. If the end of the
file is reached, then the program instead prints "done" and exits.
Complete the asynchronous implementation of this pseudocode. Note: while
typing at the console, you can send an "end of file" by pressing control+d.

**Exercise**: Another way to interpret the idea contained in `loop.ml` is to
schedule the output to be printed after three seconds, but to immediately
prompt for the next input. Complete this implementation in the function
`loop_prompt_immediately`. Compile and test your code. See what happens if you
type many lines in rapid succession.

**Exercise**: The file [input.txt](rec_code/input.txt) contains several lines;
each line is either blank or is a filename. In the file `createFiles.ml`,
write a program that creates a new blank file for each filename in
`input.txt`. Specifically, your program should

- include a helper function `create_file : string -> unit Deferred.t` that
  uses `Writer.open_file` and `Writer.close` to create a new empty file with
  the given filename.
- include a recursive helper function
  `create_all_files : Reader.t -> unit Deferred.t` that repeatedly reads a
  line from the file (using `Reader.read_line`), checks to see if the line is
  blank, and if not, calls `create_file` to create the file.
- use `Reader.open_file` to open the file and then call `create_all_files` to
  create the files. After `create_all_files` completes, your program should
  call `exit 0` to cause the program to terminate.

Compile and run your program to ensure that it works properly. Note:
`create_file` will raise an exception if the files already exist, so you
should delete them if you run `createFiles` multiple times.

As you've learned, many recursive functions can be replaced by good uses of
higher order functions like `map`, `fold`, and `filter`. The `Deferred.List`
module contains many versions of these functions adapted to work with
functions that return deferred values.
For example, without async, I might write a function that takes a list of line
numbers and returns the corresponding lines as follows:

```
let read_lines (f : file) (line_numbers : int list) : string list =
  List.map (get_line_of_file f) line_numbers
```

The analogous asynchronous program would be:

```
val get_line_of_file : file -> int -> string Deferred.t

let read_lines (f : file) (line_numbers : int list) : string list Deferred.t =
  Deferred.List.map line_numbers ~f:(get_line_of_file f)
```

Unfortunately, the order of the arguments to `Deferred.List.map` is the
opposite of the order for `List.map`: the list comes first, and the function is
passed with the label `~f`. But other than this small discrepancy, the
asynchronous version of the code is extremely similar to the synchronous
version.

**Exercise**: Create a second version of the `create_files` program that uses
`Reader.file_lines` and `Deferred.List.map` instead of a recursive helper
function.

Ivar introduction
-----------------

So far, the deferreds we've seen are all automatically determined when a given
event happens (e.g. time passes, or the bytes from a file become available, or
the deferred returned by a bound function becomes determined). Often, you will
want to create a deferred that you decide when to determine. An `Ivar.t`
contains a deferred value, which you can determine by calling `Ivar.fill`. See
the
[3110 Ivar documentation](http://www.cs.cornell.edu/Courses/cs3110/2015sp/lectures/18/async/Async.Std.Ivar.html)
for more details.

**Exercise**: Use `Ivar` to implement a function

```
either : 'a Deferred.t -> 'b Deferred.t -> [`Left of 'a | `Right of 'b] Deferred.t
```

The deferred returned from `either` should become determined when either of
the input deferreds becomes determined. The value of the result should contain
the result of either the first or the second input deferred.

Hint: first create a new `Ivar.t` and then use `upon` to schedule a function
on each of the two input deferreds.
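
As a warm-up for working with `Ivar`, here is a minimal sketch of the three basic operations: `Ivar.create` makes an empty ivar, `Ivar.read` returns the deferred it contains, and `Ivar.fill` determines that deferred. The names `delayed_value` and `main` are hypothetical helpers invented for this illustration; they are not part of Async or of the course code. The sketch assumes the same `open Async.Std` setup as the rest of these notes, plus `open Core.Std` for `sec` (newer Async releases use `open Core` and `open Async` instead).

```ocaml
open Core.Std
open Async.Std

(* Hypothetical helper, for illustration only: returns a deferred that
   becomes determined with [v] once [span] has elapsed. *)
let delayed_value (v : 'a) (span : Time.Span.t) : 'a Deferred.t =
  (* Create an empty ivar; its deferred is not yet determined. *)
  let ivar = Ivar.create () in
  (* When the timer fires, fill the ivar. This determines the deferred
     returned by [Ivar.read ivar]. *)
  upon (after span) (fun () -> Ivar.fill ivar v);
  (* Hand the caller the deferred side of the ivar right away. *)
  Ivar.read ivar

(* Hypothetical usage: wait for the value, then print it. *)
let main () : unit Deferred.t =
  delayed_value "hello" (sec 1.0) >>= fun s ->
  printf "%s\n" s;
  return ()
```

Note that `Ivar.fill` raises an exception if the ivar has already been filled, so code in which more than one callback may try to fill the same ivar needs to guard against that, for example by checking `Ivar.is_empty` first. To run `main` as a standalone program you would still need to call it and start the scheduler, as in the earlier exercises.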