Setting up utop and cs3110 for async
------------------------------------

When working with async, it is useful to configure both utop and the cs3110 tool to automatically load async.

You can cause utop to automatically load async by creating a file named `.ocamlinit` in your current directory, and including the lines

    #require "async";;
    open Async.Std

utop automatically executes `.ocamlinit` every time it starts; this will automatically load the async library and open the `Async.Std` module.

You can cause `cs3110 compile` to automatically include async by creating a file called `.cs3110` in your current directory, and including the lines

    compile.opam_packages=async
    compile.thread=true

This lets you use cs3110 to compile async programs without passing the `-t` and `-p async` flags every time.

Note that files whose names start with `.` are hidden in unix; to include them while listing files at the command line, you can type `ls -A` instead of `ls`.

Async documentation
-------------------

When working with Async, there are several important references that you should familiarize yourself with.

- **Official async documentation** The official Async API documentation can be found [here](https://blogs.janestreet.com/ocaml-core/111.28.00/doc/). This is the authoritative documentation, and covers the full Async API.

- **CS 3110 async documentation** Async is a large, complex API. To help you focus on the parts of Async that are relevant to the course and the projects, we have written CS3110 specific documentation that covers a subset of the API. We have omitted many modules, functions, and optional parameters from the documentation. Last semester's version of the documentation is available [here](http://www.cs.cornell.edu/Courses/cs3110/2015sp/lectures/18/async/Async.html). We will release a new version of this documentation with A5, and will update these notes with a link when we do.
- **Utop** As discussed in a [previous recitation](../03-var/rec.html), you can print the contents of a module `M` in utop by typing `module type T = module type of M;;`. This can be a valuable method for quickly finding the function you are looking for, if you can guess the module it is in.

- **Real world OCaml** [Chapter 18](https://realworldocaml.org/v1/en/html/concurrent-programming-with-async.html) of Real World OCaml covers the basics of Async. It would be a good chapter to read as you're familiarizing yourself with the library.

A quick note: it is standard practice to open `Async.Std` whenever using async. All of these references assume that you have done so. For example, when the documentation discusses `Deferred.t`, it is really referring to `Async.Std.Deferred.t`. Make sure you open `Std`!

**Exercise**: Find and read the documentation for `Writer.write` in the official async documentation and in the 3110 async documentation. Compare the two.

Programming with >>=
--------------------

As you may have gathered, programming with `bind` and `upon` can lead to code that is difficult to read. Conceptually, a program might want to first read a string from a file, then convert it to an integer `n`, then wait for `n` seconds, then read a message from the network, and then print "done".

In an imperative language, this code would look something like

    program () {
      s = read_file ();
      n = parse_int (s);
      wait (n);
      read_from_network ();
      print ("done");
    }

In OCaml, without deferreds, this code might look like

    let program () =
      let s = read_file () in
      let n = int_of_string s in
      let _ = wait n in
      let p = read_from_network () in
      print "done"

This simple structure becomes obscured when using `bind`, because each step requires a new function, and that function then has to call `bind` to schedule the next step.
You might end up writing code like:

    let program () =
      let do_last_step p =
        print "done";
        return ()
      in
      let do_third_step () =
        bind (read_from_network ()) do_last_step
      in
      let do_second_step s =
        bind (wait (int_of_string s)) do_third_step
      in
      bind (read_file ()) do_second_step

This awkward style of writing code is often called "inversion of control", and different asynchronous programming environments take different approaches to avoid it. In OCaml, we can simplify the code using `bind` by using anonymous functions:

    bind (read_file ()) (fun s ->
      let n = int_of_string s in
      bind (wait n) (fun _ ->
        bind (read_from_network ()) (fun p ->
          print "done";
          return ()
        )
      )
    )

This allows the code to be written "in the right order", but it still lacks the clarity of the non-asynchronous OCaml version. The infix bind operator (`>>=`), combined with good indentation, solves this problem.

> **The secret to writing and reading async programs**:
> Think of a function of type `'a -> 'b Deferred.t` as being just like a
> function of type `'a -> 'b`, except that you might have to wait to get the
> result.
>
> Think of
>
>     f x >>= fun x ->
>
> as being just like
>
>     let x = f x in
>
> except that `>>=` waits for the result of `f x` to become available.
>
> Both expressions first evaluate `f x`, and when its value becomes available,
> that value is bound to `x` and then evaluation continues from the next line.
> The only difference is that the `>>=` version allows other parts of the
> program to run in between the evaluation of `f x` and the time when its value
> becomes available.
>
> Finally, where a synchronous function contains the final value to return,
> the asynchronous function should actually call `return` to wrap the value to
> be returned in a `Deferred.t`.
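As an aside, the rule in the box can be tried out without Async at all. The following is a minimal sketch that uses only the standard library; it is a toy model and not how Async is implemented. The names `deferred`, `create`, `fill`, `pending_read`, `log` and `note` are hypothetical and exist only for this illustration. Unlike real Async, there is no scheduler here, so callbacks run immediately when a value is filled in.

```ocaml
(* Toy model of deferreds, standard library only. NOT the real Async
   implementation: there is no scheduler, and callbacks run immediately. *)
type 'a state =
  | Empty of ('a -> unit) list  (* callbacks waiting for the value *)
  | Full of 'a                  (* the value has been determined *)

type 'a deferred = { mutable state : 'a state }

let create () = { state = Empty [] }

(* Determine [d] with [v], then run waiting callbacks in registration order. *)
let fill d v =
  match d.state with
  | Full _ -> failwith "already determined"
  | Empty callbacks ->
    d.state <- Full v;
    List.iter (fun k -> k v) (List.rev callbacks)

(* Run [k] on the value of [d] once that value is available. *)
let upon d k =
  match d.state with
  | Full v -> k v
  | Empty callbacks -> d.state <- Empty (k :: callbacks)

let return v = { state = Full v }

(* [bind d f] is determined once [d] and then [f]'s result are determined. *)
let bind d f =
  let result = create () in
  upon d (fun v -> upon (f v) (fun w -> fill result w));
  result

let ( >>= ) = bind

(* Hypothetical stand-in for a file read that has not completed yet. *)
let pending_read : string deferred = create ()
let read_file () = pending_read

let log : string list ref = ref []
let note msg = log := msg :: !log

(* Reads like "let s = read_file () in ...", but waits for the value. *)
let program () : int deferred =
  read_file () >>= fun s ->
  let n = int_of_string s in
  note "continuation ran";
  return (n + 1)

let result = program ()               (* not determined yet *)
let () = note "other work ran first"  (* runs while the read is pending *)
let () = fill pending_read "41"       (* the "file" arrives; result becomes 42 *)
```

In this toy, the code after `>>=` runs only once `pending_read` is filled, and the `note` call in between stands for other parts of the program running while the value is not yet available.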
Let's apply this rule to the `program` example above:

    (* synchronous function *)               (* asynchronous function *)
    (* let program () : unit =            *) let program () : unit Deferred.t =
    (*   let s = read_file () in          *)   read_file () >>= fun s ->
    (*   let n = int_of_string s in       *)   let n = int_of_string s in
    (*   let _ = wait n in                *)   wait n >>= fun _ ->
    (*   let p = read_from_network () in  *)   read_from_network () >>= fun p ->
    (*   print "done";                    *)   print "done";
    (*   ()                               *)   return ()

The way OCaml parses the asynchronous expression is

    let program () : unit Deferred.t =
      read_file () >>= (fun s ->
        let n = int_of_string s in
        wait n >>= (fun _ ->
          read_from_network () >>= (fun p ->
            print "done";
            return ()
          )
        )
      )

which is the same as our `bind` version above. However, by omitting the parentheses and indentation, we can think of the code as a sequence of `let` expressions, and we can forget that there's a complex scheduling process going on as this code executes.

**Exercise**: The file [sequence.ml](rec_code/sequence.ml) contains a comment with a hypothetical synchronous function that prompts the user to enter some input, then reads a line of input, then waits 3 seconds, then prints "done", and finally exits the program. Convert the hypothetical synchronous version of the code to a real asynchronous version. Note that the functions called in the hypothetical code are the correct async functions. That is, you should use `printf` to print, `Reader.read_line stdin` to get input, `after` to wait, and `exit` to exit.

Just as you can use recursive functions to repeatedly process input in a synchronous program, you can write recursive functions to repeatedly process input in an asynchronous program.

**Exercise**: The file [loop.ml](rec_code/loop.ml) contains a hypothetical recursive function that repeatedly prompts for input, and then reads the input, waits for three seconds, and then prints the input. If the end of the file is reached, then the program instead prints "done" and exits.
Complete the asynchronous implementation of this pseudocode. Note: while typing at the console, you can send an "end of file" by pressing control+d.

**Exercise**: Another way to interpret the idea contained in loop.ml is to schedule the output to be printed after three seconds, but to immediately prompt for the next input. Complete this implementation in the function `loop_prompt_immediately`. Compile and test your code. See what happens if you type many lines in rapid succession.

**Exercise**: The file [input.txt](rec_code/input.txt) contains several lines; each line is either blank or is a filename. In the file createFiles.ml, write a program that creates a new blank file for each filename in input.txt. Specifically, your program should

- include a helper function `create_file : string -> unit Deferred.t` that uses `Writer.open_file` and `Writer.close` to create a new empty file with the given filename.

- include a recursive helper function `create_all_files : Reader.t -> unit Deferred.t` that repeatedly reads a line from the file (using `Reader.read_line`), checks to see if the line is blank, and if not, calls `create_file` to create the file.

- use `Reader.open_file` to open the file and then call `create_all_files` to create the files. After `create_all_files` completes, your program should call `exit 0` to cause the program to terminate.

Compile and run your program to ensure that it works properly. Note: `create_file` will raise an exception if the files already exist, so you should delete them if you run createFiles multiple times.

As you've learned, many recursive functions can be replaced by good uses of higher order functions like `map`, `fold`, and `filter`. The `Deferred.List` module contains many versions of these functions adapted to work with functions that return deferred values.
For example, without async, I might write a function that takes a list of line numbers and returns the corresponding lines as follows:

    let read_lines (f : file) (line_numbers : int list) : string list =
      List.map (get_line_of_file f) line_numbers

The analogous asynchronous program would be:

    val get_line_of_file : file -> int -> string Deferred.t

    let read_lines (f : file) (line_numbers : int list) : string list Deferred.t =
      Deferred.List.map line_numbers ~f:(get_line_of_file f)

Unfortunately, the order of the arguments to `Deferred.List.map` is the opposite of the order for `List.map`: the list comes first, and the function is passed as the labeled argument `~f`. But other than this small discrepancy, the asynchronous version of the code is extremely similar to the synchronous version.

**Exercise**: Create a second version of the create_files program that uses `Reader.file_lines` and `Deferred.List.map` instead of a recursive helper function.

Ivar introduction
-----------------

So far, the deferreds we've seen are all automatically determined when a given event happens (e.g. time passes, or the bytes from a file become available, or the deferred returned by a bound function becomes determined). Often, you will want to create a deferred that you decide when to determine. An `Ivar.t` contains a deferred value, which you can determine by calling `Ivar.fill`. See the [3110 Ivar documentation](http://www.cs.cornell.edu/Courses/cs3110/2015sp/lectures/18/async/Async.Std.Ivar.html) for more details.

**Exercise**: Use `Ivar` to implement a function

    either : 'a Deferred.t -> 'b Deferred.t -> [`Left of 'a | `Right of 'b] Deferred.t

The deferred returned from `either` should become determined when either of the input deferreds becomes determined. The value of the result should contain the result of either the first or the second input deferred. Hint: first create a new `Ivar.t` and then use `upon` to schedule a function on each of the two input deferreds.
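Separately from the exercise, the idea of "a deferred that you decide when to determine" can be sketched with a toy stand-in that uses only the standard library. This is not Async's `Ivar`: in real Async you would use `Ivar.create`, `Ivar.fill` and `Ivar.read`, and callbacks are run by the scheduler rather than immediately. The module `Toy_ivar` and the helper `after_n_ticks` are hypothetical names made up for this illustration.

```ocaml
(* Toy stand-in for an ivar, standard library only. NOT Async's Ivar:
   real Async separates Ivar.t from Deferred.t and uses a scheduler. *)
module Toy_ivar = struct
  type 'a t = {
    mutable value : 'a option;            (* None until filled *)
    mutable waiters : ('a -> unit) list;  (* callbacks awaiting the value *)
  }

  let create () = { value = None; waiters = [] }

  (* Fill the ivar exactly once and run the callbacks registered so far. *)
  let fill iv v =
    match iv.value with
    | Some _ -> failwith "ivar already full"
    | None ->
      iv.value <- Some v;
      let waiters = List.rev iv.waiters in
      iv.waiters <- [];
      List.iter (fun k -> k v) waiters

  (* Run [k] now if the ivar is full, otherwise when it gets filled. *)
  let upon iv k =
    match iv.value with
    | Some v -> k v
    | None -> iv.waiters <- k :: iv.waiters
end

(* Hypothetical helper: returns a [tick] function and an ivar that is
   filled with [n] once [tick] has been called [n] times. *)
let after_n_ticks n =
  let finished = Toy_ivar.create () in
  let count = ref 0 in
  let tick () =
    incr count;
    if !count = n then Toy_ivar.fill finished !count
  in
  (tick, finished)

let tick, finished = after_n_ticks 3
let seen : int option ref = ref None
let () = Toy_ivar.upon finished (fun c -> seen := Some c)
let () = tick (); tick ()  (* two ticks: [finished] is still empty *)
let () = tick ()           (* third tick fills the ivar; the callback runs *)
```

The point of the sketch is that nothing external determines `finished`; the program itself chooses the moment by calling `fill`, which is the role `Ivar.fill` plays in Async.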