CS 312 Lecture 14
Mutable Data Structures

Last time we saw the use of refs for mutable data.  Today we are going to look at some common data structures using imperative style (mutable data), contrasting with the purely functional data structures we have seen thus far in the course.

First, however, a word from our sponsor...  bad uses of imperative language features.

Consider the following:

  fun imp_fact (n: int): int =
    let
      val result = ref 1
      val i = ref 0
      fun loop () =
        if !i = n then ()
        else (i := !i + 1;
              result := !result * !i;
              loop ())
    in
      (loop (); !result)
    end
This is a bad use of assignment: factorial is really a pure function, and it needs neither storage nor mutation.
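For contrast, the pure version (a one-line sketch) needs no refs at all:

```sml
(* Pure factorial: no storage, no assignment. *)
fun fact (n: int): int =
  if n = 0 then 1 else n * fact (n - 1)
```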

Imperative style programming should be used where local state is needed, such as in the simple example of an "account object" from last lecture.  It is also useful for abstract data types that can be implemented more efficiently, or can have simpler interfaces, than using a purely functional style.

In any event, however, remember to "package up" any mutable variables so that they are accessible only in the smallest possible scope to get the job done. 

Mutable Stacks

Let's consider a simple example of a mutable stack built using refs.  This is an example of a data structure where the functional form can be as efficient and as useful as the mutable form, so it mainly serves to illustrate differences that are visible at the level of the signature.  That is, a mutable data type often has a different interface, not just a different implementation.  Recall that for functional stacks, any implementation must return an "updated stack" whenever push or pop is called; thus in the signature these functions always returned a stack.  Pop returned a pair of values so that it could return both the value being popped and the resulting stack.  With a mutable stack we can refer to an object that changes, and thus do not need to return the object: push has only an effect, while pop has both an effect and a value.

A useful way of thinking about operations on mutable data is to divide them into three categories:

Creators - which produce new mutable objects (effect)
Observers - which return information about the contents of a mutable object (value)
Mutators - which change the contents of a mutable object (effect)

In mutable stacks, singleton is a creator, push is a mutator, and pop is both a mutator and an observer.

    signature MUTABLE_STACK = 
      sig
	  (* An 'a mstack is a mutable stack of 'a elements *)
         type 'a mstack
         (* singleton(x) is a new stack with one item, x *)
         val singleton : 'a -> 'a mstack
         (* Effects: push(m,x) pushes x onto m *)
         val push : 'a mstack * 'a -> unit
         (* pop(m) is SOME of the head of m, or NONE
          * if m is empty.
          * Effects: removes the head of m, if any. *)
         val pop : 'a mstack -> 'a option
      end

    structure Mutable_Stack :> MUTABLE_STACK =
      struct
         (* A mutable stack is a reference
          * to the list of values, with the top
          * of the stack at the head. *)
         type 'a mstack = ('a list) ref
         fun singleton(x:'a):'a mstack = ref([x])
         fun push(s:'a mstack, x:'a):unit = 
             s := x::(!s)
         fun pop(s:'a mstack):'a option = 
             case (!s) of
               [] => NONE
             | hd::tl => (s := tl; SOME(hd))
       end

Looking at the implementation, we see that we don't really gain any advantage over the functional form.  In fact, it is quite similar to the functional form, simply with the additional dereferencing.

Note the minor awkward issue that the creator is singleton rather than the empty stack.  This is because we need to know what type of items a stack object will contain.
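A short interaction with the structure above shows the imperative interface in action; note that the stack s itself is never rebound:

```sml
val s = Mutable_Stack.singleton 1
val () = Mutable_Stack.push (s, 2)
val () = Mutable_Stack.push (s, 3)
val top  = Mutable_Stack.pop s   (* SOME 3 *)
val next = Mutable_Stack.pop s   (* SOME 2 *)
```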

Priority Queues

Priority queues are a kind of queue in which the elements are dequeued in priority order.

Priority queues are broadly useful: for schedulers in operating systems and real-time games, for event-based simulators (with priority = simulated time), and for searching, routing, compression via Huffman coding, ...

signature IMP_PRIOQ =
  sig
    (* An 'a prioq is a mutable priority queue.
     * Abstractly, it is a possibly empty sequence of
     * elements [a1,...,an] sorted in priority order.
     * The operations destructively update the data
     * structure. *)
    type 'a prioq

    (* Error raised when attempting to extract from an
     * empty prioq *)
    exception EmptyQueue

    (* create(cmp) is a new, empty priority queue ordered by cmp *)
    val create : ('a * 'a -> order) -> 'a prioq

    (* insert q a inserts a into q in priority order. *)
    val insert : 'a prioq -> 'a -> unit

    (* extract_min(q) removes and returns the first
     * element in the queue.
     * Raises EmptyQueue if the queue is empty. *)
    val extract_min : 'a prioq -> 'a

    (* empty(p) is true iff p has no elements *) 
    val empty : 'a prioq -> bool
  end

We can see from this signature that there is a creator called create, a mutator called insert, a combined mutator/observer called extract_min, and an observer called empty.

Note that create takes a comparator function as an argument, which defines the ordering of the elements. Thus we have chosen a design where the elements themselves define their priorities.  An alternative design is to use explicit priorities and store pairs of priority and value. (Note the comparator in turn constrains the type of values that can be stored in a priority queue object, because they must be handled by the comparator function).

Even without any implementation of this signature, we can see that it is useful for common operations such as sorting.  To sort a list of numbers, we can simply insert each one into a priority queue and then repeatedly extract the minimum element.  The running time of this sorting algorithm depends on the running time of our priority queue implementation, but it is O(n(i+e)), where n is the number of elements in the list, i is the insert time, and e is the extract time.  Good implementations of priority queues have O(log n) insert and extract times, yielding an O(n log n) sorting algorithm.
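As a sketch (this functor is not part of the lecture code; it assumes only the IMP_PRIOQ signature above), the sorting idea looks like this:

```sml
(* Priority-queue sort over any implementation of IMP_PRIOQ. *)
functor PQSort (P : IMP_PRIOQ) =
  struct
    fun sort (cmp: 'a * 'a -> order) (xs: 'a list): 'a list =
      let
        val q = P.create cmp
        val () = List.app (P.insert q) xs  (* n inserts *)
        (* n extractions, accumulated in reverse *)
        fun drain acc =
          if P.empty q then List.rev acc
          else drain (P.extract_min q :: acc)
      in
        drain []
      end
  end
```

Instantiated with the list implementation below (structure S = PQSort(ListPrioq)), sort is an O(n^2) insertion sort; instantiated with a heap implementation, it becomes an O(n log n) heapsort.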

Here is a simple implementation of mutable priority queues using lists:

structure ListPrioq : IMP_PRIOQ =
  (* Represents the priority queue as a list ordered by key,
   * and min element at head. *)
  struct
    type 'a prioq = {compare: 'a * 'a -> order,
                     elements: 'a list ref}

    fun create (c:'a*'a->order) = {compare=c, elements=ref []}
    fun empty({compare, elements}: 'a prioq) = null(!elements)
    fun insert ({compare,elements}: 'a prioq) (x:'a): unit =
      let fun ins [] = [x]
            | ins (hd::tl) =
                (case compare(hd,x) of
                   LESS => hd::(ins tl) 
                 | _ => x::(hd::tl))
      in
        elements := ins(!elements)
      end
    exception EmptyQueue
    fun extract_min ({compare,elements}:'a prioq):'a =
      case (!elements) of
        [] => raise EmptyQueue
      | hd::tl => (elements := tl; hd)
  end

This implementation uses a record to store the comparison function and a ref to the list of elements.  Thus, like the mutable stack above, mutability is not being used for much in this implementation.  How might one do better while still using lists?  This is a good exercise.

The main work is in the insert function, which walks down the list until the first element is no longer less than the one to be inserted, then tacks the new element onto the remainder of the list, rebuilding the prefix on the way back out of the recursion.

What is the asymptotic running time of the operations for this implementation?  O(n) insert, O(1) extract.  What else could you use that would have better asymptotic time?  Red-black trees, with O(log n) insert and extract.

Red-black trees are somewhat overkill for implementing priority queues: the asymptotic running time is good, but the constant factors are not that great.  Priority queues are often used where every bit of time matters, both in an asymptotic sense and in terms of low constant factors.  The preferred implementation of priority queues uses binary heaps.  This leads us to arrays, which are, along with refs, the other basic mutable data type in SML.  An array is a sequence of mutable locations rather than a single location; one can think of a ref as an array of size 1.
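A quick tour of the Array operations that the heap code below relies on; these are standard SML Basis Library functions:

```sml
val a = Array.array (4, 0)       (* four mutable cells, all initialized to 0 *)
val () = Array.update (a, 2, 7)  (* a now holds 0, 0, 7, 0 *)
val x  = Array.sub (a, 2)        (* 7 *)
val n  = Array.length a          (* 4 *)
```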

Binary Heaps

A binary heap (often just referred to as a heap) is a special kind of balanced binary tree.  The tree satisfies two invariants:

Order invariant - each node's element is less than or equal to the elements of its children (so the minimum element is at the root)
Shape invariant - every level of the tree is full except possibly the bottom one, which is filled in from left to right

Suppose the priorities are just numbers. Here is a possible heap:

              3
             / \
            /   \
           5     9
          / \   /
         12  6 10

Obviously we can find the minimum element in O(1) time. Extracting it while maintaining the heap invariant will take O(lg n) time. Inserting a new element and establishing the heap invariant will also take O(lg n) time. So asymptotic performance is the same as for red-black trees but constant factors are better for heaps.

The key observation is that we can represent a heap as an array.

The root of the tree is at location 0 in the array, and the children of the node stored at position i are at locations 2i+1 and 2i+2. This means that the array corresponding to the tree contains all the elements of the tree, read across row by row. The representation of the tree above is:

[3 5 9 12 6 10]

Given an element at index i, we can compute where the children are stored, and conversely we can go from a child at index j to its parent at index floor((j-1)/2).
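These index computations are easy to sanity-check on the example array (the same helper functions appear in the heap code below):

```sml
fun parent j = (j - 1) div 2   (* floor((j-1)/2) *)
fun left i   = 2 * i + 1
fun right i  = 2 * i + 2
(* For [3, 5, 9, 12, 6, 10]: the children of index 1 (the 5)
 * are indices 3 and 4 (the 12 and the 6), and the parent of
 * index 4 is again index 1. *)
```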

The rep invariant for heaps in this representation is actually simpler than when in tree form:

Rep invariant for heap a (the partial ordering property):

a[i] ≤ a[2i+1] whenever 2i+1 < n, and a[i] ≤ a[2i+2] whenever 2i+2 < n
(where n is the number of elements in the heap)

Now let's see how to implement the priority queue operations using heaps: 
 

insert

  1. Put the element at first missing leaf. (Extend array by one element.)
  2. Switch it with its parent if its parent is larger: "bubble up"
  3. Repeat #2 as necessary.
     

Example: inserting 4 into previous tree.

              3
             / \
            /   \
           5     9        [3 5 9 12 6 10 4]
          / \   / \
         12  6 10  4

              3
             / \
            /   \
           5     4        [3 5 4 12 6 10 9]
          / \   / \
         12  6 10  9

This operation requires only O(lg n) time -- the tree has depth
floor(lg n), and we do a bounded amount of work on each level.

extract_min

extract_min works by returning the element at the root.

The trick is this:

Original heap to delete top element from (leaves two subheaps)

              3
             / \
            /   \
           5     4        [3 5 4 12 6 10 9]
          / \   / \
         12  6 10  9

copy last leaf to root

              9
             / \
            /   \
           5     4        [9 5 4 12 6 10]
          / \   /
         12  6 10

"push down"

              4
             / \
            /   \
           5     9        [4 5 9 12 6 10]
          / \   /
         12  6 10


Again an O(lg n) operation.

The following code implements priority queues as binary heaps, using SML arrays:

structure Heap : IMP_PRIOQ =
  struct
    type 'a heap = {compare : 'a*'a->order,
                    next_avail: int ref,
                    values : 'a option Array.array
                    }
    type 'a prioq = 'a heap

(* We embed a binary tree in the array 'values', where the
 * left child of value i is at position 2*i+1 and the right
 * child of value i is at position 2*i+2.
 *
 * Invariants:
 *
 * (1) !next_avail is the next available position in the array
 * of values.
 * (2) values[i] is SOME(v) (i.e., not NONE) for 0 <= i < !next_avail,
 * and NONE for i >= !next_avail.
 * (3) the elements satisfy the partial ordering property with
 * respect to compare : 'a * 'a -> order. *)

(* get_elt(values, p) is the pth element of values. Checks
 * that the value there is not NONE. *)
fun get_elt(values:'a option Array.array, p:int):'a =
  valOf(Array.sub(values,p))

val max_size = 500000
fun create(cmp: 'a*'a -> order):'a heap =
  {compare = cmp,
   next_avail = ref 0,
   values = Array.array(max_size,NONE)}
fun empty({compare,next_avail,values}:'a heap) = (!next_avail) = 0

exception FullHeap
exception InternalError
exception EmptyQueue

fun parent(n) = (n-1) div 2
fun left_child(n) = 2*n + 1
fun right_child(n) = 2*n + 2

(* Insert a new element "me" in the heap.  We do so by placing me
 * at a "leaf" (i.e., the first available slot) and then to
 * maintain the invariants, bubble me up until I'm <= all of my
 * parent(s).  If there's no room left in the heap, then we raise
 * the exception FullHeap.
 *)
fun insert({compare,next_avail,values}:'a heap) (me:'a): unit =
  if (!next_avail) >= Array.length(values) then
    raise FullHeap
  else
    let fun bubble_up(my_pos:int):unit =
      (* no parent if position is 0 -- we're done *)
      if my_pos = 0 then ()
      else
        let (* else get the parent *)
          val parent_pos = parent(my_pos);
          val parent = get_elt(values, parent_pos)
        in
          (* compare my parent to me *)
          case compare(parent, me) of
            GREATER =>
              (* my parent is larger than me: swap and continue *)
              (Array.update(values,my_pos,SOME parent);
               Array.update(values,parent_pos,SOME me);
               bubble_up(parent_pos))
          | _ => () (* otherwise we're done *)
        end
        (* start off at the next available position *)
        val my_pos = !next_avail
    in
      next_avail := my_pos + 1;
      Array.update(values,my_pos,SOME me);
      (* and then bubble me up *)
      bubble_up(my_pos)
    end

(* Remove the least element in the heap and return it, raising
 * the exception EmptyQueue if the heap is empty.  To maintain
 * the invariants, we move a leaf to the root and then start
 * pushing it down, swapping with the lesser of its children.
 *)
fun extract_min({compare,next_avail,values}:'a heap):'a =
  if (!next_avail) = 0 then raise EmptyQueue
  else (* first element in values is always the least *)
    let val result = get_elt(values,0)
      (* get the last element so that we can put it at position 0 *)
      val last_index = (!next_avail) - 1
      val last_elt = get_elt(values, last_index)
      (* min_child(p) is (c,v), where c is the child of p at
       * which the minimum element is stored and v is the value
       * at that position. Requires: p has a child. *)
      fun min_child(my_pos): int*'a =
        let
          val left_pos = left_child(my_pos)
          val right_pos = right_child(my_pos)
          val left_val = get_elt(values, left_pos)
        in
          if right_pos >= last_index then (left_pos, left_val)
          else
            let val right_val = get_elt(values, right_pos) in
              case compare(left_val, right_val)
                of GREATER => (right_pos, right_val)
                 | _ => (left_pos, left_val)
            end
        end
      (* Push "me" down until I'm no longer greater than my
       * children. When swapping with a child, choose the
       * smaller of the two.
       * Requires: get_elt(values, my_pos) = my_val
       *)
      fun bubble_down(my_pos:int, my_val: 'a):unit =
        if left_child(my_pos) >= last_index then () (* done *)
        else let val (swap_pos, swap_val) = min_child(my_pos) in
          case compare(my_val, swap_val)
            of GREATER =>
              (Array.update(values,my_pos,SOME swap_val);
               Array.update(values,swap_pos,SOME my_val);
               bubble_down(swap_pos, my_val))
             | _ => () (* no swap needed *)
        end
    in
      Array.update(values,0,SOME last_elt);
      Array.update(values,last_index,NONE);
      next_avail := last_index;
      bubble_down(0, last_elt);
      result
    end
  end
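Finally, a small interaction with the finished structure (assuming the code above compiles; Int.compare is the Basis Library comparison on ints):

```sml
val q = Heap.create Int.compare
val () = List.app (Heap.insert q) [9, 3, 12, 6, 5, 10]
val a = Heap.extract_min q   (* 3 *)
val b = Heap.extract_min q   (* 5 *)
val c = Heap.extract_min q   (* 6 *)
```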