CS312 Lecture 22: Side Effects (Continued). Streams.

Prelim 2

The prelim is coming up. Everything we have covered in class, sections, and homeworks is fair game. Material that was covered by the first prelim will not be emphasized, but you should have a good understanding of it, as the rest of the material depends on it. You will not be asked for an explicit substitution model application, for example, but you will need to use your implicit understanding of the substitution model to solve problems. Also, you will not be required to produce an induction proof, but you might need to use some kind of induction argument when arguing about the complexity of an algorithm.

In particular, the material for the exam includes the following:

We will have a review session on Monday, in section.

References (Continued)

From our earlier discussions you know that we define recursive functions using fun, and that a simple val can declare only regular (non-recursive) functions. (Recursive functions can in fact be declared using the val rec construct, but we will ignore this feature.) Instead, we'll illustrate how references can be used to circumvent this limitation of val:

- val x: (int -> int) ref = ref (fn _: int => 1);
val x = ref fn : (int -> int) ref
- val f = (fn (n:int):int => if n = 0 then 1 else n * (!x)(n - 1));
val f = fn : int -> int
- f(5);
val it = 5 : int
- x := f;
val it = () : unit
- f(5);
val it = 120 : int

Let us now assume that we want to set up a linear list using the data type definition given below:

datatype 'a mylist = Nil | Cons of 'a * 'a mylist ref;

In such a data structure one could manipulate the references that link list elements, so that a loop is formed. The need for such loops arises naturally in certain applications. Given an 'a mylist, how can we detect that the data structure contains a loop?

We could, for example, start with the head of the list and follow the links to get to the next cell. Each cell that is visited could be tagged. If we reach a Nil cell, we know for sure that the list does not contain a loop. If, following a link, we visit a cell that has already been tagged, then the list must contain a loop. Sooner or later one of these outcomes will occur.
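As a rough sketch of the tagging idea, the function below keeps the "tags" in a separate list of the references already followed, rather than inside the cells themselves. The name hasCycleTagged and this representation of the tags are assumptions made for illustration; the datatype is the one defined above, repeated so that the sketch runs on its own.

```sml
(* Same datatype as above, repeated so the sketch stands alone. *)
datatype 'a mylist = Nil | Cons of 'a * 'a mylist ref

(* hasCycleTagged is a hypothetical name. seen holds every tail
   reference followed so far; = on references tests whether two
   references are the same ref cell. *)
fun hasCycleTagged (l: 'a mylist): bool =
  let
    fun visit (cell: 'a mylist, seen: 'a mylist ref list): bool =
      case cell of
        Nil           => false               (* reached the end: no loop *)
      | Cons(_, next) =>
          if List.exists (fn r => r = next) seen
          then true                          (* this link was followed before *)
          else visit(!next, next :: seen)
  in
    visit(l, [])
  end
```

The seen list grows by one entry per cell visited, which is the extra memory cost of tagging. Because each step also searches that list, this particular sketch takes time quadratic in the number of cells.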

While tagging allows us to detect loops at the earliest possible time, it has the drawback of consuming additional memory proportional to the length of the list. In the following we will focus on an alternative approach that might perform more steps (i.e. follows more links) before it recognizes a loop, but it does not need additional memory for storing tag information (assuming that the compiler/interpreter transforms tail-recursive functions into loops).

The idea is to move two references along the list; the first one will make one step at a time, while the second one will make two. If the data structure contains a loop, sooner or later both references will enter it. Once this happens, the "faster" reference will - in time - overtake the slower one. If a loop does not exist, then the (faster) reference will find the end of the list (a Nil cell).

fun tl(l: 'a mylist): 'a mylist ref = 
  case l of
    Nil        => raise Empty
  | Cons(_, t) => t

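fun hascycle (l: 'a mylist): bool =
let
  (* one advances by one link per call, two by two links. Comparing
     references with = tests whether they are the same ref cell. *)
  fun move(one: 'a mylist ref, two: 'a mylist ref): bool =
  let
    val one  = tl(!one)     (* slow reference: one step *)
    val two' = tl(!two)     (* fast reference: first step *)
    val two  = tl(!two')    (* fast reference: second step *)
  in
    (* the fast reference can meet the slow one only inside a loop *)
    if (one = two) orelse (one = two') then true
                                       else move(one, two)
  end
in
  (* tl raises Empty at a Nil cell, i.e. at the end of a loop-free list *)
  move (ref l, tl l)
  handle Empty => false
end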


- val dummy = ref (Nil: int mylist);
val dummy = ref Nil : int mylist ref
- hascycle(!dummy);
val it = false : bool


- val v = Cons(5, dummy);
val v = Cons (5,ref Nil) : int mylist
- dummy:=v;
val it = () : unit
- hascycle(v);
val it = true : bool


- val dummy = ref (Nil: int mylist);
val dummy = ref Nil : int mylist ref
- val dummy2 = ref (Cons(2, ref (Cons(3, ref (Cons(4, ref (Cons(5, dummy))))))));
val dummy2 = ref (Cons (2,ref (Cons #))) : int mylist ref
- val v2 = Cons(0, ref (Cons(1, dummy2)));
val v2 = Cons (0,ref (Cons (#,#))) : int mylist
- hascycle(v2);
val it = false : bool
- dummy := !dummy2;
val it = () : unit
- hascycle(v2);
val it = true : bool

Streams

We have discussed streams in the context of lazy/eager evaluators. As you will recall, a stream is a (possibly infinite) sequence of values. Because of its potentially infinite length, it does not make sense to attempt to compute all the values in a stream. Instead of storing all their values, our streams store the first value, together with a method of computing (generating) the rest of the stream.

Streams can model a large class of objects important in applications. These could be abstract entities, like the set of natural numbers, but also concrete ones, like the sequence of transactions at a bank's ATMs.

We argued then that streams are easy to implement in a lazy language. By cleverly exploiting the fact that the body of an anonymous function in SML is not evaluated until the function is applied, we can implement full-fledged streams in an eager language as well. Here is one such implementation:

exception Empty

datatype 'a stream = Null
                   | Cons of 'a * (unit -> 'a stream)

fun cons (x: 'a) (y: unit -> 'a stream) = Cons(x, y)

fun isEmpty(s: 'a stream): bool =
  case s of
    Null => true
  | _    => false

fun hd(s: 'a stream): 'a =
  case s of 
    Null       => raise Empty
  | Cons(h, _) => h

fun tl(s: 'a stream): 'a stream =
  case s of
    Null       => raise Empty
  | Cons(_, t) => t()

fun map (f: 'a -> 'b) (s: 'a stream): 'b stream =
  case s of
    Null  => Null
  | Cons(h, t) => Cons(f(h), (fn () => map f (t())))

fun take(s: 'a stream) (n: int): 'a list =
  case (s, n) of
    (_, 0)          => []
  | (Null, _)       => raise Empty
  | (Cons(h, t), n) => h :: (take (t()) (n - 1))

fun filter (f: 'a -> bool) (s: 'a stream): 'a stream =
  case s of
    Null => Null
  | Cons(h, t) => if f(h) then Cons(h, fn () => filter f (t()))
                          else filter f (t())
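The implementation above depends on the body of fn () => ... staying unevaluated until the function is applied. A minimal sketch that makes this delay observable with a counter; the names evaluations and delayed are hypothetical and do not appear elsewhere in these notes.

```sml
(* Counts how many times the delayed body has actually run. *)
val evaluations = ref 0

(* Defining the function does not run its body. *)
val delayed: unit -> int =
  fn () => (evaluations := !evaluations + 1; 6 * 7)

val countBeforeCall  = !evaluations   (* 0: nothing has run yet *)
val answer           = delayed ()     (* 42: the body runs now *)
val countAfterCall   = !evaluations   (* 1 *)
val again            = delayed ()     (* the body runs a second time *)
val countAfterSecond = !evaluations   (* 2: results are not cached *)
```

The last two lines show that applying the function again recomputes its body. The same holds for the tail of a stream built this way: every call of t() recomputes it.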

To avoid shadowing the usual definitions of functions like hd and tl, we should encapsulate the definition of streams in a structure.
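A sketch of what such an encapsulation might look like. The structure name Stream is an assumption, and only part of the interface is repeated to keep the example short. It is meant to be loaded in a fresh session, so that the unqualified hd is still the list version.

```sml
structure Stream =
struct
  exception Empty

  datatype 'a stream = Null
                     | Cons of 'a * (unit -> 'a stream)

  fun hd (s: 'a stream): 'a =
    case s of
      Null       => raise Empty
    | Cons(h, _) => h

  fun tl (s: 'a stream): 'a stream =
    case s of
      Null       => raise Empty
    | Cons(_, t) => t()

  fun take (s: 'a stream) (n: int): 'a list =
    case (s, n) of
      (_, 0)          => []
    | (Null, _)       => raise Empty
    | (Cons(h, t), n) => h :: (take (t()) (n - 1))
end

(* Stream operations are reached through the structure name ... *)
fun nats (n: int): int Stream.stream =
  Stream.Cons(n, fn () => nats(n + 1))

val third = Stream.hd (Stream.tl (Stream.tl (nats 0)))   (* 2 *)

(* ... while the unqualified hd is still the list version
   (in a fresh session). *)
val listHead = hd [7, 8, 9]                               (* 7 *)
```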

We are now ready to define a series of simple streams:

fun const(c: 'a) = cons c (fn() => const(c));
fun nats(n: int) = cons n (fn () => nats(n + 1));
fun fibo(a: int, b: int) = cons a (fn () => fibo(b, a + b));

The expression const("repeat") generates a stream consisting of the infinite repetition of the string "repeat". Ignoring issues related to integer overflow, nats(0) generates the stream of natural numbers. In turn, fibo(0, 1) generates the stream of Fibonacci numbers.
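To see these three streams in action, one can take a finite prefix of each. The sketch below repeats the needed definitions from above so that it runs on its own; the value names repeats, firstNats, and firstFibs are hypothetical.

```sml
(* Definitions repeated from above so the sketch runs on its own. *)
exception Empty

datatype 'a stream = Null
                   | Cons of 'a * (unit -> 'a stream)

fun cons (x: 'a) (y: unit -> 'a stream) = Cons(x, y)

fun take (s: 'a stream) (n: int): 'a list =
  case (s, n) of
    (_, 0)          => []
  | (Null, _)       => raise Empty
  | (Cons(h, t), n) => h :: (take (t()) (n - 1))

fun const(c: 'a) = cons c (fn () => const(c))
fun nats(n: int) = cons n (fn () => nats(n + 1))
fun fibo(a: int, b: int) = cons a (fn () => fibo(b, a + b))

(* Finite prefixes of the three infinite streams. *)
val repeats   = take (const "repeat") 3   (* ["repeat","repeat","repeat"] *)
val firstNats = take (nats 0) 5           (* [0,1,2,3,4] *)
val firstFibs = take (fibo(0, 1)) 10      (* [0,1,1,2,3,5,8,13,21,34] *)
```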

One of the simplest and most efficient methods of generating the list of all prime numbers is the algorithm known as the sieve of Eratosthenes. Here is code that uses this idea to generate the stream of all prime numbers.

fun sift (p: int) (s: int stream): int stream =
  filter (fn n => n mod p <> 0) s

fun sieve (s: int stream): int stream =
  case s of
    Null       => Null
  | Cons(s, t) => Cons(s, fn () => sieve(sift s (t())))

val primes = sieve(nats(2));

Now we can, for example, generate the list of the first 10 prime numbers:

- take primes 10;
val it = [2,3,5,7,11,13,17,19,23,29] : int list

A stream need not be infinite. Given a list of values, let's make it into a stream:

fun fromList(l: 'a list): 'a stream =
  case l of
    []   => Null
  | h::t => Cons(h, fn () => fromList t)

- take (fromList [9, 7, 5, 3, 1]) 4;
val it = [9,7,5,3] : int list
- take (fromList [9, 7, 5, 3, 1]) 10;

uncaught exception Empty
  raised at: stdIn:140.30-140.35

A finite list of values can be made into an infinite stream by circularly reusing the given elements:

fun circular(l: 'a list): 'a stream =
  case l of 
    []   => Null
  | h::t => Cons(h, fn () => circular (t @ [h]))

- take (circular [1, 2, 3]) 10;
val it = [1,2,3,1,2,3,1,2,3,1] : int list

None of the streams that we have defined so far "has memory." Take a look at the code below:

- val v = nats(0);
val v = Cons (0,fn) : int stream
- take v 10;
val it = [0,1,2,3,4,5,6,7,8,9] : int list
- take v 10;
val it = [0,1,2,3,4,5,6,7,8,9] : int list

It does not matter that we have already taken 10 values out of v: the next time we request 10 elements, we get the same list. The stream has no recollection of the values it provided. We could fix this by defining a drop function that returns a stream with a given number of leading elements removed. Then an expression like take (drop v 10) 10 would return the list [10,11,12,13,14,15,16,17,18,19]. Try to write drop yourselves!
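For comparison with your own attempt, here is one possible drop. It is only a sketch: raising Empty when the stream is too short mirrors take and is an assumption, as is the requirement that n be non-negative. The stream definitions are repeated from above so that the block runs on its own.

```sml
(* Definitions repeated from above so the sketch runs on its own. *)
exception Empty

datatype 'a stream = Null
                   | Cons of 'a * (unit -> 'a stream)

fun take (s: 'a stream) (n: int): 'a list =
  case (s, n) of
    (_, 0)          => []
  | (Null, _)       => raise Empty
  | (Cons(h, t), n) => h :: (take (t()) (n - 1))

fun nats (n: int): int stream = Cons(n, fn () => nats(n + 1))

(* One possible drop; assumes n >= 0. Raising Empty on a stream
   that is too short is an assumption that mirrors take. *)
fun drop (s: 'a stream) (n: int): 'a stream =
  case (s, n) of
    (_, 0)          => s
  | (Null, _)       => raise Empty
  | (Cons(_, t), n) => drop (t()) (n - 1)

val v = nats 0
val nextTen = take (drop v 10) 10   (* [10,11,12,13,14,15,16,17,18,19] *)
```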

Alternatively, we can use side effects to "remember" the values that the stream already provided. Try to think how you would implement such a stream.
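One possible reading of this idea, offered only as a sketch: keep the not-yet-consumed part of the stream in a reference and update it on every request. The name remember and the one-element-at-a-time interface are assumptions made for illustration. The stream definitions are repeated from above so that the block runs on its own.

```sml
(* Definitions repeated from above so the sketch runs on its own. *)
exception Empty

datatype 'a stream = Null
                   | Cons of 'a * (unit -> 'a stream)

fun nats (n: int): int stream = Cons(n, fn () => nats(n + 1))

(* remember is a hypothetical name. It returns a function that hands
   out the next element on each call; the reference rest records
   which part of the stream has not been handed out yet. *)
fun remember (s: 'a stream): unit -> 'a =
  let
    val rest = ref s
  in
    fn () =>
      case !rest of
        Null       => raise Empty
      | Cons(h, t) => (rest := t(); h)
  end

val next = remember (nats 0)
val a = next ()   (* 0 *)
val b = next ()   (* 1 *)
val c = next ()   (* 2 *)
```

Unlike take v 10, successive calls of next never return the same element twice, because the assignment to rest is a side effect that survives between calls.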


CS312  © 2002 Cornell University Computer Science