# Recitation 27: Streams and Lazy Evaluation

## The if Construct

In OCaml, the value of the conditional test `if true then e1 else e2` is the value of `e1`, and `e2` is never evaluated. Similarly, the value of `if false then e1 else e2` is the value of `e2`, and `e1` is never evaluated.

```# List.hd [];;
Exception: Failure "hd".
# if true then 4 else List.hd [];;
- : int = 4
```

This is called lazy evaluation. We say that the conditional test `if x then e1 else e2` eagerly evaluates the test `x` but then lazily evaluates `e1` and `e2`, i.e. only when needed.

Not so with function applications. When a function is applied to arguments in OCaml, the arguments are evaluated eagerly, and the function is applied to the resulting values. So if we tried to define the conditional test in a straightforward way as an OCaml function, it would not work.

```# let ite x e1 e2 = match x with true -> e1 | false -> e2;;
val ite : bool -> 'a -> 'a -> 'a = <fun>
# ite true 4 (List.hd []);;
Exception: Failure "hd".
```

The language Haskell is based on the idea of lazy evaluation. It is very similar to OCaml, except that arguments are evaluated lazily.

It is possible to simulate lazy evaluation by using thunks. A thunk is a function of the form `fun () -> ...`. It takes advantage of the fact that the body of a function is not evaluated when the function is defined, but only when it is applied. Thus function bodies are evaluated lazily.

```# let f = fun () -> List.hd [];;
val f : unit -> 'a = <fun>
# f ();;
Exception: Failure "hd".
```

Now we can redefine `ite` using thunks as follows:

```# let ite x e1 e2 = match x with true -> e1() | false -> e2();;
val ite : bool -> (unit -> 'a) -> (unit -> 'a) -> 'a = <fun>
# ite true (fun () -> 4) (fun () -> List.hd []);;
- : int = 4
```

In OCaml, a function `fun x -> e` is considered a value, and no attempt is made to evaluate `e` until the function is applied. (Actually, this is not strictly true. Sometimes the compiler will attempt to perform some preevaluation of function bodies for optimization purposes when there is no danger in doing so. For example, `fun x -> 1 + 1` will almost surely be optimized to `fun x -> 2`).

## Call-by-Value and Call-by-Name

Restricting our attention to purely functional code, let's describe eager and lazy evaluation in terms of the substitution model for the `let` construct `let x = e1 in e2` and function application `(fun x -> e2) e1` (which are essentially the same thing).

In eager evaluation, the evaluation rules are:

• `let x = v in e2 => e2{v/x}`
• `(fun x -> e2) v => e2{v/x}`

where `v` is a value. Thus these rules cannot be applied to `let x = e1 in e2` and `(fun x -> e2) e1` until `e1` is evaluated. This is called call by value. Most programming languages, including OCaml, use this discipline.

On the other hand, in lazy evaluation, the rules are:

• `let x = e1 in e2 => e2{e1/x}`
• `(fun x -> e2) e1 => e2{e1/x}`

These rules can be applied to the unevaluated argument `e1`. This is called call by name. Haskell uses this discipline.

The difficulty with call by name is that it can be very inefficient. For example, what if `x` occurs ten times in `e2`? Will we then evaluate `e1` ten times? In Haskell, this problem is addressed by a thunk-like mechanism that memoizes (caches) the value the first time the expression is evaluated, so it is only evaluated once. However, note that this will only work for purely functional code; it does not work if there are side-effects.

## Streams

A stream is a possibly infinite list; for example,

• the stream of all natural numbers `[0; 1; 2; 3; 4; ...]`
• the stream of all Fibonacci numbers `[1; 1; 2; 3; 5; 8; 13; ...]`
• the stream of all primes `[2; 3; 5; 7; 11; 13; ...]`.

It is actually possible to create some infinite OCaml lists, but only regular (or ultimately periodic) ones, and they are not too useful. We mostly use only finite OCaml lists. Recall that if lists had not been built into OCaml, we might have defined them as

```type 'a list = Nil | Cons of 'a * 'a list
```

This is actually how it is done. Finite lists are built inductively from right to left, starting with `Nil` and `Cons`'ing a new head onto an already evaluated tail.

However, we can get infinite streams by deferring the creation of the tail using thunks. Thus we create the tail only when we need it.

```type 'a stream = Nil | Cons of 'a * (unit -> 'a stream)
```

Now we can define some infinite streams.

```(* an infinite stream of 1's *)
let rec (ones : int stream) = Cons (1, fun () -> ones)

(* the natural numbers *)
let rec from (n : int) : int stream =
Cons (n, fun () -> from (n + 1))
let naturals = from 0
```

What have we just created? The head of stream `ones` is 1 and its tail is itself, namely `ones`. Thus, an infinite stream of 1's. But where are all those 1's? The computer is finite. The answer is that they are not created yet. They will only be created when we need them.

Let's define some operations on streams.

```(* head of a stream *)
let hd (s : 'a stream) : 'a =
match s with
Nil -> failwith "hd"
| Cons (x, _) -> x

(* tail of a stream *)
let tl (s : 'a stream) : 'a stream =
match s with
Nil -> failwith "tl"
| Cons (_, g) -> g () (* get the tail by evaluating the thunk *)

(* n-th element of a stream *)
let rec nth (s : 'a stream) (n : int) : 'a =
if n = 0 then hd s else nth (tl s) (n - 1)

(* make a stream from a list *)
let from_list (l : 'a list) : 'a stream =
List.fold_right (fun x s -> Cons (x, fun () -> s)) l Nil

(* make a list from the first n elements of a stream *)
let rec take (s : 'a stream) (n : int) : 'a list =
if n <= 0 then [] else
match s with
Nil -> []
| _ -> hd s :: take (tl s) (n - 1)
```

Let's try these out.

```# hd ones;;
- : int = 1
# hd (tl ones);;
- : int = 1
# nth ones 10;;
- : int = 1
# nth ones 10000000;;
- : int = 1
# take ones 20;;
- : int list = [1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1]
# let five = from_list [1; 2; 3; 4; 5];;
val five : int stream = Cons (1, <fun>)
# take five 2;;
- : int list = [1; 2]
# take five 10;;
- : int list = [1; 2; 3; 4; 5]
# take naturals 10;;
- : int list = [0; 1; 2; 3; 4; 5; 6; 7; 8; 9]
```

Now we can operate on streams as if they existed in their entirety. For example, we can define the usual list operations `map` and `filter`:

```let rec map (f : 'a -> 'b) (s : 'a stream) : 'b stream =
match s with Nil -> Nil
| _ -> Cons (f (hd s), fun () -> map f (tl s))

let rec filter (f : 'a -> bool) (s : 'a stream) : 'a stream =
match s with Nil -> Nil
| Cons (x, g) ->
if f x then Cons (x, fun () -> filter f (g ()))
else filter f (g ())

let rec map2 (f: 'a -> 'b -> 'c)
(s : 'a stream) (t : 'b stream) : 'c stream =
match (s, t) with
(Nil, Nil) -> Nil
| (Cons (x, g), Cons (y, h)) ->
Cons (f x y, fun () -> map2 f (g ()) (h ()))
| _ -> failwith "map2"
```

Let's try these out.

```# let square n = n * n;;
val square : int -> int = <fun>
# take (map square naturals) 20;;
- : int list =
[0; 1; 4; 9; 16; 25; 36; 49; 64; 81; 100; 121; 144; 169; 196; 225; 256; 289;
324; 361]
# let even = fun n -> n mod 2 = 0;;
val even : int -> bool = <fun>
# take (filter even naturals) 20;;
- : int list =
[0; 2; 4; 6; 8; 10; 12; 14; 16; 18; 20; 22; 24; 26; 28; 30; 32; 34; 36; 38]
```

Now for something a little fancier:

```(* the Fibonacci sequence *)
let fib1 : int stream =
let rec fibgen (a : int) (b : int) : int stream =
Cons(a, fun () -> fibgen b (a + b))
in fibgen 1 1

# take fib1 20;;
- : int list =
[1; 1; 2; 3; 5; 8; 13; 21; 34; 55; 89; 144; 233; 377; 610; 987; 1597; 2584;
4181; 6765]
# nth fib1 43;;
- : int = 701408733

(* another version - this one is a lot slower *)
let rec fib2 : int stream =
let add = map2 (+) in
Cons (1, fun () -> Cons (1, fun () -> add fib2 (tl fib2)))

# take fib2 20;;
- : int list =
[1; 1; 2; 3; 5; 8; 13; 21; 34; 55; 89; 144; 233; 377; 610; 987; 1597; 2584;
4181; 6765]

(* delete multiples of p from a stream *)
let sift (p : int) : int stream -> int stream =
filter (fun n -> n mod p <> 0)

(* sieve of Eratosthenes *)
let rec sieve (s : int stream) : int stream =
match s with Nil -> Nil
| Cons (p, g) -> Cons (p, fun () -> sieve (sift p (g ())))

(* primes *)
let primes = sieve (from 2)

# take primes 20;;
- : int list =
[2; 3; 5; 7; 11; 13; 17; 19; 23; 29; 31; 37; 41; 43; 47; 53; 59; 61; 67; 71]
# nth primes 1000;;
- : int = 7927
```

Streams are actually useful in real life. Some applications:

• compilers reading source file from text
• network sockets
• audio and video signal processing
• voice recognition
• approximating solutions to equations using convergent series

One last example: merging and splitting streams.

```(* merge two streams into one, taking elements alternately *)
let rec merge (s : 'a stream) (t : 'a stream) : 'a stream =
match s with Nil -> t
| Cons (x, g) -> Cons (x, fun () -> merge t (g ()))

(* split a stream into two streams - inverse of merge *)
let rec split (s : 'a stream) : 'a stream * 'a stream =
match take s 2 with
[] -> (Nil, Nil)
| [x] -> (Cons (x, fun () -> Nil), Nil)
| x :: y :: _ ->
let t = tl (tl s) in
(Cons (x, fun () -> fst (split t)), Cons (y, fun () -> snd (split t)))
```