Lecture 23: Streams and Lazy Evaluation

The if Construct

In OCaml, the value of the conditional test if true then e1 else e2 is the value of e1, and e2 is never evaluated. Similarly, the value of if false then e1 else e2 is the value of e2, and e1 is never evaluated.

# List.hd [];;
Exception: Failure "hd".
# if true then 4 else List.hd [];;
- : int = 4

This is called lazy evaluation. We say that the conditional test if x then e1 else e2 eagerly evaluates the test x but then lazily evaluates e1 and e2, i.e. only when needed.

Not so with function applications. When a function is applied to arguments in OCaml, the arguments are evaluated eagerly, and the function is applied to the resulting values. So if we tried to define the conditional test in a straightforward way as an OCaml function, it would not work.

# let ite x e1 e2 = match x with true -> e1 | false -> e2;;
val ite : bool -> 'a -> 'a -> 'a = <fun>
# ite true 4 (List.hd []);;
Exception: Failure "hd".

The language Haskell is based on the idea of lazy evaluation. It is very similar to OCaml, except that arguments are evaluated lazily.

It is possible to simulate lazy evaluation by using thunks. A thunk is a function of the form fun () -> .... It takes advantage of the fact that the body of a function is not evaluated when the function is defined, but only when it is applied. Thus function bodies are evaluated lazily.

# let f = fun () -> List.hd [];;
val f : unit -> 'a = <fun>
# f ();;
Exception: Failure "hd".

Now we can redefine ite using thunks as follows:

# let ite x e1 e2 = match x with true -> e1() | false -> e2();;
val ite : bool -> (unit -> 'a) -> (unit -> 'a) -> 'a = <fun>
# ite true (fun () -> 4) (fun () -> List.hd []);;
- : int = 4

In OCaml, a function fun x -> e is considered a value, and no attempt is made to evaluate e until the function is applied. (Actually, this is not strictly true. Sometimes the compiler will attempt to perform some preevaluation of function bodies for optimization purposes when there is no danger in doing so. For example, fun x -> 1 + 1 will almost surely be optimized to fun x -> 2).

Call-by-Value and Call-by-Name

Restricting our attention to purely functional code, let's describe eager and lazy evaluation in terms of the substitution model for the let construct let x = e1 in e2 and function application (fun x -> e2) e1 (which are essentially the same thing).

In eager evaluation, the evaluation rules are:

where v is a value. Thus these rules cannot be applied to let x = e1 in e2 and (fun x -> e2) e1 until e1 is evaluated. This is called call by value. Most programming languages, including OCaml, use this discipline.

On the other hand, in lazy evaluation, the rules are:

These rules can be applied to the unevaluated argument e1. This is called call by name. Haskell uses this discipline.

The difficulty with call by name is that it can be very inefficient. For example, what if x occurs ten times in e2? Will we then evaluate e1 ten times? In Haskell, this problem is addressed by a thunk-like mechanism that memoizes (caches) the value the first time the expression is evaluated, so it is only evaluated once. However, note that this will only work for purely functional code; it does not work if there are side-effects.

Streams

A stream is a possibly infinite list; for example,

It is actually possible to create some infinite OCaml lists, but only regular (or ultimately periodic) ones, and they are not too useful. We mostly use only finite OCaml lists. Recall that if lists had not been built into OCaml, we might have defined them as

type 'a list = Nil | Cons of 'a * 'a list

This is actually how it is done. Finite lists are built inductively from right to left, starting with Nil and Cons'ing a new head onto an already evaluated tail.

However, we can get infinite streams by deferring the creation of the tail using thunks. Thus we create the tail only when we need it.

type 'a stream = Nil | Cons of 'a * (unit -> 'a stream)

Now we can define some infinite streams.

(* an infinite stream of 1's *)
let rec (ones : int stream) = Cons (1, fun () -> ones)

(* the natural numbers *)
let rec from (n : int) : int stream =
  Cons (n, fun () -> from (n + 1))
let naturals = from 0

What have we just created? The head of stream ones is 1 and its tail is itself, namely ones. Thus, an infinite stream of 1's. But where are all those 1's? The computer is finite. The answer is that they are not created yet. They will only be created when we need them.

Let's define some operations on streams.

(* head of a stream *)
let hd (s : 'a stream) : 'a =
  match s with
    Nil -> failwith "hd"
  | Cons (x, _) -> x

(* tail of a stream *)
let tl (s : 'a stream) : 'a stream =
  match s with
    Nil -> failwith "tl"
  | Cons (_, g) -> g () (* get the tail by evaluating the thunk *)

(* n-th element of a stream *)
let rec nth (s : 'a stream) (n : int) : 'a =
  if n = 0 then hd s else nth (tl s) (n - 1)

(* make a stream from a list *)
let from_list (l : 'a list) : 'a stream =
  List.fold_right (fun x s -> Cons (x, fun () -> s)) l Nil

(* make a list from the first n elements of a stream *)
let rec take (s : 'a stream) (n : int) : 'a list =
  if n <= 0 then [] else
  match s with
    Nil -> []
  | _ -> hd s :: take (tl s) (n - 1)

Let's try these out.

# hd ones;;
- : int = 1
# hd (tl ones);;
- : int = 1
# nth ones 10;;
- : int = 1
# nth ones 10000000;;
- : int = 1
# take ones 20;;
- : int list = [1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1]
# let five = from_list [1; 2; 3; 4; 5];;
val five : int stream = Cons (1, <fun>)
# take five 2;;
- : int list = [1; 2]
# take five 10;;
- : int list = [1; 2; 3; 4; 5]
# take naturals 10;;
- : int list = [0; 1; 2; 3; 4; 5; 6; 7; 8; 9]

Now we can operate on streams as if they existed in their entirety. For example, we can define the usual list operations map and filter:

let rec map (f : 'a -> 'b) (s : 'a stream) : 'b stream =
  match s with Nil -> Nil
  | _ -> Cons (f (hd s), fun () -> map f (tl s))

let rec filter (f : 'a -> bool) (s : 'a stream) : 'a stream =
  match s with Nil -> Nil
  | Cons (x, g) ->
      if f x then Cons (x, fun () -> filter f (g ()))
      else filter f (g ())

let rec map2 (f: 'a -> 'b -> 'c)
             (s : 'a stream) (t : 'b stream) : 'c stream =
  match (s, t) with
    (Nil, Nil) -> Nil
  | (Cons (x, g), Cons (y, h)) ->
       Cons (f x y, fun () -> map2 f (g ()) (h ()))
  | _ -> failwith "map2"

Let's try these out.

# let square n = n * n;;
val square : int -> int = <fun>
# take (map square naturals) 20;;
- : int list =
[0; 1; 4; 9; 16; 25; 36; 49; 64; 81; 100; 121; 144; 169; 196; 225; 256; 289;
 324; 361]
# let even = fun n -> n mod 2 = 0;;
val even : int -> bool = <fun>
# take (filter even naturals) 20;;
- : int list =
[0; 2; 4; 6; 8; 10; 12; 14; 16; 18; 20; 22; 24; 26; 28; 30; 32; 34; 36; 38]

Now for something a little fancier:

(* the Fibonacci sequence *)
let fib1 : int stream =
  let rec fibgen (a : int) (b : int) : int stream =
    Cons(a, fun () -> fibgen b (a + b))
  in fibgen 1 1

# take fib1 20;;
- : int list =
[1; 1; 2; 3; 5; 8; 13; 21; 34; 55; 89; 144; 233; 377; 610; 987; 1597; 2584;
 4181; 6765]
# nth fib1 43;;
- : int = 701408733

(* another version - this one is a lot slower *)
let rec fib2 : int stream =
  let add = map2 (+) in
  Cons (1, fun () -> Cons (1, fun () -> add fib2 (tl fib2)))

# take fib2 20;;
- : int list =
[1; 1; 2; 3; 5; 8; 13; 21; 34; 55; 89; 144; 233; 377; 610; 987; 1597; 2584;
 4181; 6765]

(* delete multiples of p from a stream *)
let sift (p : int) : int stream -> int stream =
  filter (fun n -> n mod p <> 0)

(* sieve of Eratosthenes *)
let rec sieve (s : int stream) : int stream =
  match s with Nil -> Nil
  | Cons (p, g) -> Cons (p, fun () -> sieve (sift p (g ())))

(* primes *)
let primes = sieve (from 2)

# take primes 20;;
- : int list =
[2; 3; 5; 7; 11; 13; 17; 19; 23; 29; 31; 37; 41; 43; 47; 53; 59; 61; 67; 71]
# nth primes 1000;;
- : int = 7927

Streams are actually useful in real life. Some applications:

One last example: merging and splitting streams.

(* merge two streams into one, taking elements alternately *)
let rec merge (s : 'a stream) (t : 'a stream) : 'a stream =
  match s with Nil -> t
  | Cons (x, g) -> Cons (x, fun () -> merge t (g ()))

(* split a stream into two streams - inverse of merge *)
let rec split (s : 'a stream) : 'a stream * 'a stream =
  match take s 2 with
    [] -> (Nil, Nil)
  | [x] -> (Cons (x, fun () -> Nil), Nil)
  | x :: y :: _ ->
     let t = tl (tl s) in
     (Cons (x, fun () -> fst (split t)), Cons (y, fun () -> snd (split t)))