Lecture 16: Streams

Administrivia

Your solutions to PS4 only need to handle the same cases that optimize0 handles (no more). This will make your life much easier. More generally, be sure to read the new PS4 FAQ on the web.

For inlining, you should inline all functions in your code. You can assume that this optimization will only be called when inlining is semantically sound. Note that inlining essentially enforces lazy semantics, unless you explicitly introduce a let for the arguments.

Streams

We’re now moving towards introducing side effects into SML (actually, SML has always had them; we just never let you use them before).

One of the reasons that functional programs are so easy to reason about is that they have no notion of time. An expression has a value, and this is an unchanging truth. Programs with side effects are much harder to think about than functional programs, because the value of an expression can change over time. The information that changes over time is referred to in computer science as state, and it is a very important concept.

However, before we get there, we will look at an intermediate way to think about computation over time, namely streams. Streams are closely related to lazy evaluation, and give us a nice bridge to side effects.

The view of programming we will take is rather like signal processing. Think about a stereo system, and the flow of information through it:

Watch the signal / data flow through a processor/program.


 


>>> Draw some curves at each of the arrows, some notes at the sound

 

We're going to do something similar, having information flowing through a collection of boxes.

We'll look at primitives (kinds of boxes)

·       enumerate [generate a signal]

·       map [turn one signal into another]

·       filter [remove some signal values]

·       fold [turn a signal into a scalar]

Idea:

·       Stereos and sound-systems are great because they come in lots of boxes and you can hook them together in many many useful ways.

·       We'll try to do the same thing with boxes for streams.

We've looked at some of these (e.g., map, fold) for lists.  Streams will be similar.

·       Initially, they'll just *be* lists

·       But next time we'll let them be infinitely long.

-         We'll have a stream of all the integers.

-         That couldn't possibly fit as a list.

To begin with, let’s look at some examples just involving lists.

Consider the problem of summing the squares of the odd integers between 1 and n.

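(* Helpers the lecture code assumes; these definitions are a reasonable guess. *)
fun sqr(x:int) = x * x
fun odd(k:int) = k mod 2 = 1

fun sumOddSquares(n:int) =
  let fun next(k:int) =
      if k > n then 0 else
        (if odd k then sqr(k) + next (k+1) else next(k+1))
  in
    next 1
  end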

There are four things going on here:

  1. ENUMERATE      the numbers 1 ... n

  2. FILTER        that list, keeping only the odd numbers

  3. MAP             square on each of the selected numbers

  4. FOLD          the result, using +, starting from 0.

>>> Draw the 4 boxes, taking n as the initial input

This pattern is pretty hard to see from the code, though:

· Everything's going on at once.

We're going to use STREAMS to capture this picture.

Initially, let’s just see how to make this more comprehensible using lists:

fun cons(h:'a)(l:'a list):'a list = h::l
 
fun enumerateInterval(low:int)(high:int) =
  if (low > high) 
    then nil 
  else cons low (enumerateInterval (low+1) high)
 
fun filter (f:'a->bool)(l:'a list): 'a list =
  case l of
    nil => nil
| h::t => if f(h) then (cons h (filter f t))
          else filter f t
 
fun sumOddSquares(n:int) =
  foldl (op +) 0
  (map sqr (filter odd (enumerateInterval 1 n)))
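
To make the four boxes visible one at a time, here is a minimal sketch that names each intermediate result for n = 5. It assumes sqr and odd are defined as shown (the notes do not give their definitions), and it uses List.filter from the Basis Library in place of the hand-written filter above. The names enumerated, filtered, mapped and folded are purely illustrative.

```sml
(* Assumed helpers: the lecture uses sqr and odd without defining them. *)
fun sqr (x: int): int = x * x
fun odd (k: int): bool = k mod 2 = 1

fun enumerateInterval (low: int) (high: int): int list =
  if low > high then nil
  else low :: enumerateInterval (low + 1) high

(* One value per box in the signal-flow picture. *)
val enumerated = enumerateInterval 1 5        (* [1,2,3,4,5] *)
val filtered   = List.filter odd enumerated   (* [1,3,5]     *)
val mapped     = map sqr filtered             (* [1,9,25]    *)
val folded     = foldl (op +) 0 mapped        (* 35          *)
```

Each box consumes the complete list produced by the box before it, which is exactly the behavior streams will relax.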

 

The list primitives we used are: cons, nil, hd, tl, null (some of them we used implicitly). You all know their contracts, e.g. hd (cons a x) = a and tl (cons a x) = x.

So, why use anything other than lists? Consider the question:

  "What is the second prime between 42,000 and 42,000,000?"

hd(tl(filter prime (enumerateInterval 42000 42000000)))

This is massively inefficient!

We end up having to

  1. Create a list of roughly 41,958,000 integers,

  2. Check them ALL for primality,

  3. Pick the second one!

That's a pretty impressive waste of work.
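
The waste can be measured on a smaller interval. The sketch below is an illustration, not the lecture's code: it counts how often the predicate runs when we ask for the second prime between 10 and 1000 using lists. The counter calls, the wrapper countedPrime, and the trial-division prime are all assumptions introduced here, and the counter uses a ref cell, which the course has not formally introduced yet.

```sml
(* Hypothetical counter: how many times the predicate is called. *)
val calls = ref 0

fun enumerateInterval (low: int) (high: int): int list =
  if low > high then nil
  else low :: enumerateInterval (low + 1) high

(* Simple trial-division primality test, assumed for illustration. *)
fun prime (n: int): bool =
  let
    fun noDivisor d =
      d * d > n orelse (n mod d <> 0 andalso noDivisor (d + 1))
  in
    n >= 2 andalso noDivisor 2
  end

(* Same predicate, but it bumps the counter on every call. *)
fun countedPrime (n: int): bool = (calls := !calls + 1; prime n)

val secondPrime =
  hd (tl (List.filter countedPrime (enumerateInterval 10 1000)))
val totalCalls = !calls
(* secondPrime = 13, yet totalCalls = 991: every element was tested. *)
```

The answer is found after looking at four numbers (10 through 13), but the list pipeline tests all 991 of them.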

How do we do better?

We use a common and very powerful idea:

  BE LAZY! -- but be lazy in a particular way

Specifically,

At selected points in the code, we deliver a promise to do something rather than actually doing it.

Maybe nobody will actually collect on it!  Then we don't have to do the work!
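
One common way to represent such a promise in SML is a function that takes unit: building the function does no work, and calling it does. The following is a minimal sketch under that assumption. The names promise, force, evaluated, ranBeforeForce and ranAfterForce are hypothetical, and the ref cell is used only to observe when the body runs.

```sml
(* A promise is modeled as a function from unit; calling it does the work. *)
type 'a promise = unit -> 'a

fun force (p: 'a promise): 'a = p ()

(* Hypothetical flag recording whether the promised work has happened. *)
val evaluated = ref false

(* Building the promise does not run its body. *)
val p: int promise = fn () => (evaluated := true; 6 * 7)

val ranBeforeForce = !evaluated   (* false: nothing has run yet   *)
val answer         = force p      (* 42                            *)
val ranAfterForce  = !evaluated   (* true: forcing ran the body    *)
```

If nobody ever calls force p, the multiplication never happens.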

Here are the operations on streams (basically, the signature for streams):

nilStream 
nullStream s
consStream x s
hdStream s
tlStream s

This looks a lot like lists, but there is a critical difference.

The difference between streams and lists is just this:

· With a stream, the tail is *not* evaluated when you *MAKE* the stream

· It is only evaluated when you *USE* it.

- tlStream evaluates (forces) the tail

- consStream doesn't

Contrast this with a list. The contract is the same, but evaluation happens at a different time:

  hdStream (consStream thing s) ==> thing  (* s not evaluated *)
  tlStream (consStream thing s) ==> s (* s is evaluated *)

The tail is a promise to evaluate the tail when asked to, not an actual object.

Implementing streams in ML

In a lazy language, this is really easy. In SML, we can do it by carefully wrapping the tail in a closure (a function taking unit), which delays its evaluation until the closure is called.

exception Empty
 
datatype 'a stream = nilStream
                   | Cons of 'a * (unit -> 'a stream)
 
fun consStream (x: 'a) (y: unit -> 'a stream) = Cons(x, y)
 
fun nullStream (s: 'a stream): bool =
  case s of
    nilStream => true
  | _    => false
 
fun hdStream(s: 'a stream): 'a =
  case s of
    nilStream       => raise Empty
  | Cons(h, _) => h
 
fun tlStream(s: 'a stream): 'a stream =
  case s of
    nilStream       => raise Empty
  | Cons(_, t) => t()                   (* Force *)
 
fun mapStream (f: 'a -> 'b) (s: 'a stream): 'b stream =
  case s of
    nilStream  => nilStream
  | Cons(h, t) => (consStream (f(h)) (fn () => mapStream f (t()))) (* Delay *)
 
fun takeStream(s: 'a stream) (n: int): 'a list =
  case (s, n) of
    (_, 0)          => []
  | (nilStream, _)       => raise Empty
  | (Cons(h, t), n) => h :: (takeStream (t()) (n - 1))
 
fun filterStream (f: 'a -> bool) (s: 'a stream): 'a stream =
  case s of
    nilStream => nilStream
  | Cons(h, t) =>
      if f(h)
        then (consStream h  (fn () => filterStream f (t())))
      else filterStream f (t())
 
fun foldS (f:'a * 'b -> 'b) (base: 'b) (s: 'a stream):'b =
  case s of
    nilStream => base
  | Cons(h,t) => f(h,foldS f base (t())) (* Force *)
 
 
fun enumerateIntervalStream(low:int)(high:int) =
  if (low > high)
    then nilStream
  else consStream low
    (fn() => (print "<force>"; enumerateIntervalStream (low+1) high))
 
 
fun sumOddSquaresS(n:int) =
  foldS (op +) 0
  (mapStream sqr (filterStream odd (enumerateIntervalStream 1 n)))
 

Some subtleties of streams

OK, let us now use streams for something we couldn’t do before (with lists).

- val big = enumerateIntervalStream 1 10000000;
val big = Cons (1,fn) : int stream
- val bigodds = filterStream odd big;
val bigodds = Cons (1,fn) : int stream
- takeStream bigodds 5;
<force><force><force><force><force><force><force><force><force><force>val it = [1,3,5,7,9] : int list
 

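Let’s think about

hdStream(tlStream(filterStream odd (enumerateIntervalStream 10 100000)))

The inner expression (enumerateIntervalStream 10 100000) is

Cons(10,[promise to (enumerateIntervalStream 11 100000)])

10 isn’t odd so filterStream will ask for the tail, thus forcing the promise. So now we are doing

filterStream odd (enumerateIntervalStream 11 100000)

whose argument is

Cons(11,[promise to (enumerateIntervalStream 12 100000)])

11 is odd, so filterStream stops here and returns Cons(11,[promise to (filterStream odd (enumerateIntervalStream 12 100000))]). tlStream forces that promise, which skips 12 and stops at 13, and hdStream returns 13. Only three promises are ever forced; the rest of the interval is never generated.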
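
This evaluation can be checked by counting forces instead of printing them. The sketch below repeats the stream definitions so that it stands alone, and replaces the print with a counter. The counter forces, the helper prime, and the result names are assumptions added for illustration.

```sml
exception Empty

datatype 'a stream = nilStream
                   | Cons of 'a * (unit -> 'a stream)

(* Hypothetical counter standing in for print "<force>". *)
val forces = ref 0

fun enumerateIntervalStream (low: int) (high: int): int stream =
  if low > high then nilStream
  else Cons (low, fn () => (forces := !forces + 1;
                            enumerateIntervalStream (low + 1) high))

fun filterStream (f: 'a -> bool) (s: 'a stream): 'a stream =
  case s of
    nilStream => nilStream
  | Cons (h, t) =>
      if f h then Cons (h, fn () => filterStream f (t ()))
      else filterStream f (t ())

fun hdStream (s: 'a stream): 'a =
  case s of nilStream => raise Empty | Cons (h, _) => h

fun tlStream (s: 'a stream): 'a stream =
  case s of nilStream => raise Empty | Cons (_, t) => t ()

fun odd (k: int): bool = k mod 2 = 1

val secondOdd =
  hdStream (tlStream (filterStream odd (enumerateIntervalStream 10 100000)))
val forcesUsed = !forces
(* secondOdd = 13 and forcesUsed = 3: only 11, 12 and 13 were generated. *)

(* Assumed trial-division test, used to revisit the earlier prime question. *)
fun prime (n: int): bool =
  let
    fun noDivisor d =
      d * d > n orelse (n mod d <> 0 andalso noDivisor (d + 1))
  in
    n >= 2 andalso noDivisor 2
  end

val secondPrime =
  hdStream (tlStream
    (filterStream prime (enumerateIntervalStream 42000 42000000)))
```

The last expression is the stream version of the "second prime" question from earlier. It stops as soon as the second prime is found rather than building a list of tens of millions of integers.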

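Note for section: streams have an asymmetry, namely the head is always forced. So for example

filterStream (fn x => x > 1000000) (enumerateIntervalStream 1 10000000000)

runs for a long time, because filterStream must force a million tails to find the first matching head before it can return anything.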
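
The asymmetry shows up clearly at a smaller scale. This sketch (again self-contained, with a hypothetical forces counter in place of print) builds a filtered stream whose first match is 1001 and checks how much work happened before anyone asked for an element. The final definition shows one possible workaround, delaying the whole stream behind a closure; the notes do not prescribe it.

```sml
exception Empty

datatype 'a stream = nilStream
                   | Cons of 'a * (unit -> 'a stream)

(* Hypothetical counter standing in for print "<force>". *)
val forces = ref 0

fun enumerateIntervalStream (low: int) (high: int): int stream =
  if low > high then nilStream
  else Cons (low, fn () => (forces := !forces + 1;
                            enumerateIntervalStream (low + 1) high))

fun filterStream (f: 'a -> bool) (s: 'a stream): 'a stream =
  case s of
    nilStream => nilStream
  | Cons (h, t) =>
      if f h then Cons (h, fn () => filterStream f (t ()))
      else filterStream f (t ())

fun hdStream (s: 'a stream): 'a =
  case s of nilStream => raise Empty | Cons (h, _) => h

(* Merely building the stream already searches for its head. *)
val s = filterStream (fn x => x > 1000) (enumerateIntervalStream 1 2000)
val forcedAtBuild = !forces   (* 1000 forces before any element is requested *)
val first = hdStream s        (* 1001 *)

(* Possible workaround: delay the entire stream, not just its tail. *)
val () = forces := 0
val delayed =
  fn () => filterStream (fn x => x > 1000) (enumerateIntervalStream 1 2000)
val forcedWhileDelayed = !forces   (* 0: nothing runs until delayed () is called *)
```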