Section 25
Streams

The view of programming we will take is rather like signal processing. Think about a stereo system, and the flow of information through it:

Watch the signal / data flow through a processor/program.

CD Player --> Pre-amp --> Amp -->

We're going to do something similar, having information flowing through a collection of boxes.

We'll look at primitives (kinds of boxes)

Idea:

We've looked at some of these (e.g., map, fold) for lists. Streams will be similar. Initially, they'll just *be* lists. Then we'll let them be infinitely long. E.g., we'll have a stream of all the integers, which couldn't possibly fit in a list.

To begin with, let's look at some examples just involving lists.

Consider the problem of summing the squares of the odd integers between 1 and n.

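(* Helpers the code below relies on; these definitions are assumed. *)
fun sqr(x:int) = x * x
fun odd(k:int) = k mod 2 = 1

fun sumOddSquares(n:int) =
  let fun next(k:int) =
      if k > n then 0
      else if odd k then sqr(k) + next (k+1)
      else next(k+1)
  in
    next 1
  end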

There are four things going on here:

  1. ENUMERATE the numbers 1 ... n
  2. FILTER out the even numbers from that list
  3. MAP square on each of the selected numbers
  4. FOLD the result, using +, starting from 0.
(Draw the 4 boxes, taking n as the initial input)

This pattern is pretty hard to see from the code, though: Everything's going on at once. We're going to use STREAMS to capture this picture.

Initially, let's just see how to make this more comprehensible using lists:

fun cons(h:'a)(l:'a list):'a list = h::l
 
fun enumerateInterval(low:int)(high:int) =
  if (low > high) 
    then nil 
  else cons low (enumerateInterval (low+1) high)
 
fun filter (f:'a->bool)(l:'a list): 'a list =
  case l of
    nil => nil
| h::t => if f(h) then (cons h (filter f t))
          else filter f t
 
fun sumOddSquares(n:int) =
  foldl (op +) 0
  (map sqr (filter odd (enumerateInterval 1 n)))

The list primitives we used are: cons, nil, hd, tl, null (some of them we used implicitly). You all know their contracts, e.g. hd (cons a x) = a, etc. So, why use anything other than lists? Consider the question:

"What is the second prime between 42,000 and 42,000,000?"

hd(tl(filter prime (enumerateInterval 42000 42000000)))

This is massively inefficient! We end up having to build a list of almost 42,000,000 integers (41,958,001, to be exact), check them ALL for primality, and then pick the second one! That's a pretty impressive waste of work.

How do we do better? We use a common and very powerful idea: BE LAZY! -- but be lazy in a particular way. More specifically, at selected points in the code, we deliver a promise to do something rather than actually doing it. Maybe nobody will actually collect on it! Then we don't have to do the work!
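
To make the idea of a promise concrete, here is a minimal sketch. The names promise, force, expensive and the counter calls are made up for this illustration and are not part of the stream code in these notes; the counter is there only so we can see when the work actually happens.

```sml
(* A promise is just a function of unit: building it does no work. *)
type 'a promise = unit -> 'a

(* Collecting on a promise means calling the function. *)
fun force (p: 'a promise): 'a = p ()

(* Hypothetical counter, used only to observe when the work really runs. *)
val calls = ref 0

fun expensive (n: int): int = (calls := !calls + 1; n * n)

(* Delay: wrap the call in a closure instead of performing it. *)
val p: int promise = fn () => expensive 12

val callsBefore = !calls     (* 0: nothing has run yet *)
val result      = force p    (* runs expensive 12 now, giving 144 *)
val callsAfter  = !calls     (* 1 *)
```

If nobody ever calls force p, expensive never runs at all.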

Here are the operations on streams (basically, the signature for streams):

nilStream 
nullStream s
consStream x s
hdStream s
tlStream s
This looks a lot like lists, but there is a critical difference. The contract is the same as for lists, but evaluation happens at a different time:
  hdStream (consStream thing s) ==> thing  (* s not evaluated *)
  tlStream (consStream thing s) ==> s (* s is evaluated *)
The tail is a promise to evaluate the tail when asked to, not an actual object.

Implementing streams in ML

In a lazy language, this is really easy. In SML, we can do it by carefully using closures to delay evaluation of the tail.

exception Empty
 
datatype 'a stream = nilStream
                   | Cons of 'a * (unit -> 'a stream)
 
fun consStream (x: 'a) (y: unit -> 'a stream) = Cons(x, y)
 
fun nullStream (s: 'a stream): bool =
  case s of
    nilStream => true
  | _    => false
 
fun hdStream(s: 'a stream): 'a =
  case s of
    nilStream       => raise Empty
  | Cons(h, _) => h
 
fun tlStream(s: 'a stream): 'a stream =
  case s of
    nilStream       => raise Empty
  | Cons(_, t) => t()                   (* Force *)
 
fun mapStream (f: 'a -> 'b) (s: 'a stream): 'b stream =
  case s of
    nilStream  => nilStream
  | Cons(h, t) => (consStream (f(h)) (fn () => mapStream f (t()))) (* Delay *)
 
fun takeStream(s: 'a stream) (n: int): 'a list =
  case (s, n) of
    (_, 0)          => []
  | (nilStream, _)       => raise Empty
  | (Cons(h, t), n) => h :: (takeStream (t()) (n - 1))
 
fun filterStream (f: 'a -> bool) (s: 'a stream): 'a stream =
  case s of
    nilStream => nilStream
  | Cons(h, t) =>
      if f(h)
        then (consStream h  (fn () => filterStream f (t())))
      else filterStream f (t())
 
fun foldS (f:'a * 'b -> 'b) (base: 'b) (s: 'a stream):'b =
  case s of
    nilStream => base
  | Cons(h,t) => f(h,foldS f base (t())) (* Force *)
 
 
fun enumerateIntervalStream(low:int)(high:int) =
  if (low > high)
    then nilStream
  else consStream low
    (fn() => (print ""; enumerateIntervalStream (low+1) high))
 
 
fun sumOddSquaresS(n:int) =
  foldS (op +) 0
  (mapStream sqr (filterStream odd (enumerateIntervalStream 1 n)))
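
With streams in hand, we can go back to the "second prime" question. The following is a sketch, not a definitive solution: it repeats the stream definitions so that it stands on its own, and prime and secondPrime are assumed helpers (the notes never define prime; trial division is just one simple choice).

```sml
exception Empty

datatype 'a stream = nilStream
                   | Cons of 'a * (unit -> 'a stream)

fun hdStream s = case s of nilStream => raise Empty | Cons(h, _) => h
fun tlStream s = case s of nilStream => raise Empty | Cons(_, t) => t()

fun filterStream f s =
  case s of
    nilStream => nilStream
  | Cons(h, t) =>
      if f h then Cons(h, fn () => filterStream f (t()))
      else filterStream f (t())

fun enumerateIntervalStream low high =
  if low > high then nilStream
  else Cons(low, fn () => enumerateIntervalStream (low + 1) high)

(* Assumed helper: primality by trial division up to sqrt k. *)
fun prime (k: int): bool =
  let fun noDivisor d =
        d * d > k orelse (k mod d <> 0 andalso noDivisor (d + 1))
  in
    k >= 2 andalso noDivisor 2
  end

(* Only as many integers are generated and tested as it takes to reach
   the second prime; everything after that stays an unforced promise. *)
fun secondPrime (low: int) (high: int): int =
  hdStream (tlStream (filterStream prime (enumerateIntervalStream low high)))

val answer = secondPrime 42000 42000000
```

The expression has the same shape as the list version, but the work done depends on how far away the second prime is, not on the size of the interval.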

Some subtleties of streams

OK, let us now use streams for something we couldn't do before (with lists).

- val big = enumerateIntervalStream 1 10000000;
val big = Cons (1,fn) : int stream
- val bigodds = filterStream odd big;
val bigodds = Cons (1,fn) : int stream
- takeStream bigodds 5;
val it = [1,3,5,7,9] : int list
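
Nothing in the representation requires the stream to end. As a sketch of the "infinitely long" streams promised at the start of the section, here is one with no upper bound at all. The names integersFrom, naturals, evens and firstEvens are made up for this example, and the stream definitions are repeated so that it runs on its own.

```sml
exception Empty

datatype 'a stream = nilStream
                   | Cons of 'a * (unit -> 'a stream)

(* Hypothetical name: the stream n, n+1, n+2, ... with no upper bound.
   The recursive call sits inside the closure, so it only runs on demand. *)
fun integersFrom (n: int): int stream =
  Cons(n, fn () => integersFrom (n + 1))

fun filterStream f s =
  case s of
    nilStream => nilStream
  | Cons(h, t) =>
      if f h then Cons(h, fn () => filterStream f (t()))
      else filterStream f (t())

fun takeStream (s: 'a stream) (n: int): 'a list =
  case (s, n) of
    (_, 0) => []
  | (nilStream, _) => raise Empty
  | (Cons(h, t), n) => h :: takeStream (t()) (n - 1)

val naturals   = integersFrom 0
val evens      = filterStream (fn x => x mod 2 = 0) naturals
val firstEvens = takeStream evens 5     (* [0,2,4,6,8] *)
```

The corresponding list could never be built, but taking finitely many elements from the stream terminates, because only the promises we actually collect on are ever run.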
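
Let's think about
  hdStream(tlStream(filterStream odd (enumerateIntervalStream 10 100000)))
(enumerateIntervalStream 10 100000) is
  Cons(10,[promise to (enumerateIntervalStream 11 100000)])
10 isn't odd, so filterStream will ask for the tail, thus forcing the promise. So now we are doing
  filterStream odd (enumerateIntervalStream 11 100000)
where (enumerateIntervalStream 11 100000) is
  Cons(11,[promise to (enumerateIntervalStream 12 100000)])
11 is odd, so filterStream returns
  Cons(11,[promise to (filterStream odd (enumerateIntervalStream 12 100000))])
tlStream forces that promise; filterStream skips 12 and stops at 13, so hdStream returns 13. The integers from 14 to 100000 are never generated.
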
Note for section: streams have an asymmetry, namely the head is always forced. So for example
  filterStream (fn x => x > 1000000) (enumerateIntervalStream 1 10000000000)
runs for a long time: filterStream has to force a million tails before it can return the first cons cell.
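
One way to see this asymmetry is to count how many tails get forced before anybody asks for an element. This is only a sketch, on a smaller interval: forced and countedInterval are hypothetical names, and countedInterval is enumerateIntervalStream with a counter added to each promise.

```sml
exception Empty

datatype 'a stream = nilStream
                   | Cons of 'a * (unit -> 'a stream)

(* Hypothetical counter: how many tail promises have been forced. *)
val forced = ref 0

(* Like enumerateIntervalStream, but each promise bumps the counter
   when it is forced. *)
fun countedInterval (low: int) (high: int): int stream =
  if low > high then nilStream
  else Cons(low, fn () => (forced := !forced + 1;
                           countedInterval (low + 1) high))

fun filterStream f s =
  case s of
    nilStream => nilStream
  | Cons(h, t) =>
      if f h then Cons(h, fn () => filterStream f (t()))
      else filterStream f (t())

(* Nobody has asked for an element of s yet, but filterStream has
   already walked past 1 ... 1000 to find the head. *)
val s = filterStream (fn x => x > 1000) (countedInterval 1 1000000)
val forcedSoFar = !forced     (* 1000 *)
```

Just building s costs 1000 forced promises, because a Cons cell always holds an actual head, and filterStream cannot produce one without searching for it. The rest of the interval is still untouched.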