A stream is an infinite list. Sometimes these are also called sequences, delayed lists, or lazy lists. We can try to define a stream by analogy to how we can define (finite) lists. Recall that definition:
type 'a mylist = | Nil | Cons of 'a * 'a mylist
We could try to convert that into a definition for streams:
(* doesn't actually work *) type 'a stream = | Cons of 'a * 'a stream
Note that we got rid of the
Nil constructor, because the empty list is finite,
but we want only infinite lists.
The problem with that definition is that it's really no better than the built-in
list in OCaml, in that we still can't define
# let rec from n = Cons (n, from (n+1));; # let nats = from 0;; Stack overflow during evaluation (looping recursion?).
As before, that definition attempts to go off and compute the entire infinite sequence of naturals.
What we need is a way to pause evaluation, so that at any point in time, only a finite approximation to the infinite sequence has been computed. Fortunately, we already know how to do that!
Consider the following definitions:
# let f1 = failwith "oops";; Exception: Failure "oops". # let f2 = fun x -> failwith "oops";; val f2 : 'a -> 'b = <fun> # f2 ();; Exception: Failure "oops".
The definition of
f1 immediately raises an exception, whereas the definition of
does not. Why? Because
f2 wraps the
failwith inside an anonymous function.
Recall that, according to the dynamic semantics of OCaml, functions are already
values. So no computation is done inside the body of the function until it is applied.
f2 () raises an exception.
We can use this property of evaluation—that functions delay evaluation—to our advantage in defining streams: let's wrap the tail of a stream inside a function. Since it doesn't really matter what argument that function takes, we might as well let it be unit. A function that is used just to delay computation, and in particular one that takes unit as input, is called a thunk.
(* An ['a stream] is an infinite list of values of type ['a]. * AF: [Cons (x, f)] is the stream whose head is [x] and tail is [f()]. * RI: none. *) type 'a stream = Cons of 'a * (unit -> 'a stream)
This definition turns out to work quite well. We can define
nats, at last:
# let rec from n = Cons (n, fun () -> from (n+1));; val from : int -> int stream = <fun> # let nats = from 0;; val nats : int stream = Cons (0, <fun>)
We do not get an infinite loop or a stack overflow. The evaluation of
paused. Only the first element of it,
0, has been computed. The remaining elements
will not be computed until they are requested. To do that, we can define functions
to access parts of a stream, similarly to how we can access parts of a list:
(* [hd s] is the head of [s] *) let hd (Cons (h, _)) = h (* [tl s] is the tail of [s] *) let tl (Cons (_, tf)) = tf () (* [take n s] is the list of the first [n] elements of [s] *) let rec take n s = if n=0 then  else hd s :: take (n-1) (tl s) (* [drop n s] is all but the first [n] elements of [s] *) let rec drop n s = if n = 0 then s else drop (n-1) (tl s)
It is informative to observe the types of those functions:
val hd : 'a stream -> 'a val tl : 'a stream -> 'a stream val take : int -> 'a stream -> 'a list val drop : int -> 'a stream -> 'a stream
Note how, in the definition of
tl, we must apply the function
() to obtain
the tail of the stream. That is, we must force the thunk to evaluate at that point,
rather than continue to delay its computation.
We can use
take to observe a finite prefix of a stream. For example:
# take 10 nats;; - : int list = [0; 1; 2; 3; 4; 5; 6; 7; 8; 9]