CS212 Notes for Lecture 15
March 14, 2000

Outline for the day:

Destructive list operations
Stacks and queues
Priority queues & heaps

Destructive list operations

Using set!, one can modify list structure. Recall that a list with 3 elements internally looks like this:

(define mylist (list 1 2 3))

     mylist: --------> o
                      / \
                     /   \
                    1     o
                         / \
                        /   \
                       2     o
                            / \
                           /   \
                          3    ()

The objects marked by "o" are just small blocks of memory containing two pointers, a head pointer and a tail pointer. These small blocks of memory are traditionally called cons cells.

One can change pointers with set! as follows. Suppose you wish to link a new element 4 into this list between 1 and 2. One can first create a list containing just 4:

(let ((new (list 4))) ...

         new: ------------> o
                           / \
                          /   \
                         4    ()

Now the tail pointer of this object, which currently points to the empty list, can be reset to point to where the tail pointer of the first cons cell of mylist points:

(set! (tail new) (tail mylist))

     new: ----------> o
                     / \
                    /   \
                   4     \
                          \
     mylist: --------> o  |
                      / \ |
                     /   \|
                    1     o
                         / \
                        /   \
                       2     o
                            / \
                           /   \
                          3    ()

Then the tail of mylist can be reset to point to new:

(set! (tail mylist) new)

     mylist: --------> o
                      / \
                     /   \
                    1     o <----- new
                         / \
                        /   \
                       4     o
                            / \
                           /   \
                          2     o
                               / \
                              /   \
                             3    ()

All together:

(define mylist (list 1 2 3))
(let ((new (list 4)))
  (set! (tail new) (tail mylist))
  (set! (tail mylist) new))

mylist ==> (1 4 2 3)

Note that when manipulating pointers, the order in which we do things is important! If we do the two set! statements in the opposite order, we get a circular list structure, because by the time we read (tail mylist) it already points to new:

(define mylist (list 1 2 3))
(let ((new (list 4)))
  (set! (tail mylist) new)
  (set! (tail new) (tail mylist)))

mylist ==> (1 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 ...)
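
For readers trying this in a standard Scheme, the correct splice might be sketched as follows. This assumes an implementation where the tail of a pair is changed with set-cdr! rather than (set! (tail ...) ...), and where car and cdr play the roles of head and tail. The name insert-after! is hypothetical and not part of the course library.

```scheme
;; Sketch in standard Scheme: car/cdr/set-cdr! stand in for
;; head/tail/(set! (tail ...) ...) as used in these notes.
;; insert-after! is a hypothetical helper name.
(define (insert-after! cell x)
  (let ((new (list x)))
    ;; First point the new cell at the rest of the list ...
    (set-cdr! new (cdr cell))
    ;; ... and only then redirect the old cell to the new one.
    (set-cdr! cell new)))

(define mylist (list 1 2 3))   ; (list ...) builds fresh, mutable cells
(insert-after! mylist 4)
;; mylist is now (1 4 2 3)
```

Swapping the two set-cdr! calls inside insert-after! would reproduce the circular structure shown above.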

Stacks and Queues

Data structures are the basic tools of the trade. Choosing the right structures (and abstractions around them) makes the difference between easy and hard.

Two of the basic data structures of computer science show up everywhere, and you have probably seen them:

stacks (last-in-first-out, LIFO)

queues (first-in-first-out, FIFO)

A stack is a data structure supporting the following operations:

  (make-stack) -------- Make a new empty stack
  (push thing stack) -- Return a new stack with thing on top of `stack'.
  (pop stack)  -------- Return a stack like `stack', but without its top element.
  (top stack) --------- Return the top element of the stack.
  (empty? stack) ------ Is there anything on the stack?

They obey the contract:

  (top (push thing stack)) = thing
  (pop (push thing stack)) = stack
  (empty? (make-stack)) = #t
  (empty? (push thing stack)) = #f

We can implement a stack with lists:

  push = cons
  pop = tail
  top = head
  empty? = null?

This implementation is quite efficient: the operations all take O(1) time, independent of the size of the stack.
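
As a rough illustration, the list implementation and a spot check of the contract might look like this in standard Scheme (car and cdr in place of head and tail). The check covers one example stack only, so it illustrates the contract rather than proving it; the names s and contract-holds? are made up for the example.

```scheme
;; Sketch: a stack as a plain list. Every operation is O(1).
(define (make-stack) '())
(define (push thing stack) (cons thing stack))
(define (pop stack) (cdr stack))
(define (top stack) (car stack))
(define (empty? stack) (null? stack))

;; Spot-check the four contract equations on one example stack.
(define s (push 1 (make-stack)))
(define contract-holds?
  (and (equal? (top (push 2 s)) 2)
       (equal? (pop (push 2 s)) s)
       (empty? (make-stack))
       (not (empty? (push 2 s)))))
;; contract-holds? is #t
```

Because push returns a new list and never modifies its argument, the old stack s is still usable after (push 2 s).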

A queue supports a similar set of operations, but they do slightly different things. It's like a line of people at a cafeteria--you can join at the back end of the line, but you leave at the front end.

FIFO -- first in-first out (maintains order)

Nobody gets preempted
You come out in the same order you go in.

(make-queue) ---------- make a new empty queue
(insert thing queue) -- put thing at the tail end of the queue
(delete queue) -------- delete the head of the queue
(head queue) ---------- return the head of the queue
(empty? queue) -------- test emptiness

We could implement a queue as a list with the head (oldest element) at the front.

head = head
delete = tail
empty? = null?

but then to insert:

(lambda (thing (q <list>)) (append q (list thing)))

which is O(n) time. We have to walk all the way down the list. This is expensive.

We could also do it backwards--the head at the last element--but that wouldn't help either: insert would be O(1), but delete and head would be O(n).

By being a little more clever, we can have both the head and the tail available in constant time.

Keep pointers to both ends of the list.

When you insert something, add it to the list
When you delete something, physically remove it from the list
The old queue is NO LONGER AVAILABLE

Here's how to implement a queue with constant time insertions and deletions.

Use a list structure with pointers to the beginning and end of the list. We'll read from the front of the list, and add to the rear.

(define <queue> <pair>)

(define (empty-queue? (q <queue>))
  (null? (head q)))

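;; Build a fresh pair each time. A quoted constant such as '(()) must not
;; be mutated, and every queue would share the same cell.
(define (make-queue) (cons '() '()))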

(define (queue-head (q <queue>))
  (if (empty-queue? q)
      (error "Cannot take head of empty queue")
      (head (head q))))

(define (insert x (q <queue>))
  (let ((new (list x)))
    (if (empty-queue? q)
        (set! (head q) new)
        (set! (tail (tail q)) new))
    (set! (tail q) new))
  q)

(define (delete (q <queue>))
  (if (empty-queue? q)
      (error "Cannot delete from empty queue")
      (set! (head q) (tail (head q))))
  q)
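
The following is a sketch of the same queue in standard Scheme, assuming set-car! and set-cdr! in place of (set! (head ...) ...) and (set! (tail ...) ...), together with a short trace. The trace also empties the queue and inserts again, which exercises the case where the rear pointer is stale. The variable names in the trace are made up for the example.

```scheme
;; A queue is one pair: its car points to the first cell of the element
;; list and its cdr points to the last cell.
(define (make-queue) (cons '() '()))
(define (empty-queue? q) (null? (car q)))

(define (queue-head q)
  (if (empty-queue? q)
      (error "Cannot take head of empty queue")
      (car (car q))))

(define (insert x q)
  (let ((new (list x)))
    (if (empty-queue? q)
        (set-car! q new)           ; first element: set the front pointer
        (set-cdr! (cdr q) new))    ; otherwise link after the old last cell
    (set-cdr! q new)               ; the rear pointer always moves to new
    q))

(define (delete q)
  (if (empty-queue? q)
      (error "Cannot delete from empty queue")
      (set-car! q (cdr (car q))))  ; drop the first cell
  q)

;; Trace: elements come out in the order they went in.
(define q (make-queue))
(insert 'a q)
(insert 'b q)
(define first-out (queue-head q))    ; a
(delete q)
(define second-out (queue-head q))   ; b
(delete q)                           ; queue is empty; rear pointer is stale
(insert 'c q)                        ; insert resets both pointers
(define third-out (queue-head q))    ; c
```

After the last delete the cdr of q still points at the removed cell. That is harmless because empty-queue? looks only at the front pointer and insert on an empty queue overwrites both.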

Priority Queues

A priority queue is a data structure that maintains a collection of elements, each with a key, which is some value chosen from a totally ordered set (such as the natural numbers). A priority queue allows efficient insertion of a new element and extraction of the element with the minimum key. We will see an example of a priority queue in action in PS4. In summary:

each element comes with a key or priority, an element of a totally ordered set (usually a number)
higher priority things come out first, even if they were added later
convention: smaller number = higher priority

Operations on a priority queue:

(make-prioq)      -- return empty structure
(insert! e pq)    -- put entry e in pq
(extract-min! pq) -- Remove highest priority entry in pq and return it

Priority Queues and Lists -- First Implementation

One possible implementation of a priority queue (by no means the only one) is a list in which the elements are ordered by increasing key value. To insert a new element, we find the appropriate place to insert it so as to maintain the order. The element with the minimum key is then always at the head of the list.

(defstruct <entry>
  (key <integer>)
  (data <top>))

(define <priority-queue> <list>)

We can make a new queue entry with key n and data d by calling

(make-entry n d)

Here are two ways to insert a new element, one which does not use destructive list operations, one which does.

(define (insert-1 (e <entry>) (q <priority-queue>))
  (cond ((null? q) (list e))
        ((< (entry-key e) (entry-key (head q))) (cons e q))
        (else (cons (head q) (insert-1 e (tail q))))))

As this procedure comes back out of the recursive calls, it allocates a new cons cell in the cons operation in the last line. If nothing else refers to the old list, the old cons cells that were copied are no longer in use and can be garbage-collected.

We can then insert a new entry e into a global priority queue *q* as follows:

(set! *q* (insert-1 e *q*))

Note that you cannot change the binding of *q* inside the insert-1 procedure if *q* is passed as a parameter, since (set! q ...) inside the procedure will change the local binding, not the global one.

Here is an alternative method that uses destructive list operations.

(define (insert-2 (e <entry>) (q <priority-queue>))
  ;first check if new element should go at head of list 
  (if (or (null? q) (< (entry-key e) (entry-key (head q))))
      (cons e q)
      ;if not, find element that it goes immediately after
      (letrec
        ((find-place (lambda ((q <priority-queue>))
           (if (or (null? (tail q)) (< (entry-key e) (entry-key (second q))))
               q
               (find-place (tail q))))))
        (let*
          ((pq (find-place q))
           (le (cons e (tail pq))))
          ;link new element in
          (set! (tail pq) le)
          ;return original list
          q))))

As above, to insert a new entry e into *q*, do

(set! *q* (insert-2 e *q*))

Version 2 has the minor advantage that it does not waste cons cells. This is the version you would use if programming in C, since C does not do garbage collection. However, since Scheme does garbage collection, this advantage is far outweighed by the simplicity of version 1.

Don't confuse a priority queue with its implementation as a list. A priority queue is just the data abstraction specified by its contract. There are many ways to implement it; the list implementation above is just one. We will see a different implementation using lists and a more efficient implementation using heaps below.

Analysis

insert: O(n) (bubble new element into its rightful place in the sorted list)
extract-min: O(1) (just remove first element of list)

Priority Queues and Lists -- Second Implementation

do not bother to keep list sorted
just add new elements at head
search list to find minimum and extract it
insert: O(1)
extract-min: O(n)

Which is better? Neither, in general: the first implementation wins on extract-min's but loses on inserts, and the second vice versa.
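
The second implementation is not spelled out in these notes, so here is one way it might look in standard Scheme. As an assumption for the sketch, an entry is a pair (key . data) rather than the <entry> struct used above, and the names insert-unsorted, find-min, and remove-first are hypothetical.

```scheme
;; Assumption: an entry is a pair (key . data).
(define (entry-key e) (car e))

;; O(1): put the new entry at the front; no order is maintained.
(define (insert-unsorted e q) (cons e q))

;; O(n): scan the whole (non-empty) list for the smallest key.
(define (find-min q)
  (let loop ((best (car q)) (rest (cdr q)))
    (cond ((null? rest) best)
          ((< (entry-key (car rest)) (entry-key best))
           (loop (car rest) (cdr rest)))
          (else (loop best (cdr rest))))))

;; O(n): copy the list, leaving out the first cell that holds entry e.
(define (remove-first e q)
  (cond ((null? q) '())
        ((eq? (car q) e) (cdr q))
        (else (cons (car q) (remove-first e (cdr q))))))

;; Example: three entries inserted out of key order.
(define q
  (insert-unsorted (cons 5 'e)
    (insert-unsorted (cons 2 'b)
      (insert-unsorted (cons 9 'z) '()))))
(define m (find-min q))               ; the entry with key 2
(define q-rest (remove-first m q))    ; entries with keys 5 and 9 remain
```

This version is non-destructive: extracting the minimum means taking m as the result and continuing with q-rest as the new queue.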

Side note: set! on parameters passed to functions doesn't always do what you want:

(define mylist (list 1 2 3))
(define (alter1 (s <list>))
  (set! s '(4 5 6))
  s)

Then

(alter1 mylist) ==> (4 5 6)
mylist ==> (1 2 3)

mylist doesn't change, because you are setting the binding of the parameter s, not the argument mylist.

BUT:

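(define (alter2 (l <list>))
  (set! (tail l) '(4 5 6))
  l)

Then

(alter2 mylist) ==> (1 4 5 6)
mylist ==> (1 4 5 6)

Here mylist does change, because alter2 modifies the cons cell that both l and mylist point to.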

Warning!

Mutating lists can make things fast, but it is dangerous.
But just because something uses mutators doesn't make it fast!
It might be dangerous and slow. You could give a non-mutating implementation of prioq's that looks basically the same and has running times of the same order, by copying the list structure (which is storage inefficient).

Priority Queues and Heaps

For list implementations above, either insert! or extract-min! is O(n).

This is more expensive than it needs to be. We can implement priority queues more efficiently with a heap: a tree containing data entries at the nodes such that

1. The key of each node is no larger than the keys of its children.
2. The node with minimum key is on top (root) of the tree (this property is actually a consequence of property 1).

Property 1 is called heap order. Heaps will give

insert: O(log n)
extract-min: O(log n)

Let's ignore data for a bit. Numbers shown are just keys (priorities)

     3
    / \
   /   \
  5     9
 / \   / \
12 6  10 15

Heaps are easily represented as vectors (one-dimensional arrays, like arrays in Java), provided the tree is filled level by level from left to right with no gaps.

The root of the tree is at location 1 in the array, and the children of the node stored at position i are at locations 2i and 2i+1.

[3 5 9 12 6 10 15]

(Read across the tree, row by row.)

Heap order then translates to: for 1 <= i <= floor(n/2),

A[i] <= A[2i]
A[i] <= A[2i+1]   (whenever 2i+1 <= n)

All operations on heaps will maintain heap order.
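
The condition can be checked mechanically. The sketch below, in standard Scheme, assumes the keys sit in slots 1 through n of a vector and that slot 0 is ignored. It compares every node i >= 2 with its parent at (quotient i 2), which is equivalent to the two inequalities above. The name heap-ordered? is hypothetical.

```scheme
;; Returns #t if slots 1..n of vec satisfy heap order, #f otherwise.
(define (heap-ordered? vec n)
  (let loop ((i 2))
    (cond ((> i n) #t)                          ; every node checked
          ((< (vector-ref vec i)                ; child smaller than ...
              (vector-ref vec (quotient i 2)))  ; ... its parent: violation
           #f)
          (else (loop (+ i 1))))))

;; Slot 0 is a placeholder; the keys are those of the example tree.
(define good (vector 'unused 3 5 9 12 6 10 15))
(define bad  (vector 'unused 3 5 9 12 4 10 15))   ; 4 sits below 5

(define good-ok? (heap-ordered? good 7))   ; #t
(define bad-ok?  (heap-ordered? bad 7))    ; #f
```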

Crash course on Scheme vectors:

(make-vector k)         -- space for k things, indices 0 to k-1
(make-vector k e)       -- same, but initialize all entries to e
(vector e1 ...)         -- evaluate e1 ... and put them into a vector (analogous to list)
(vector-ref vec i)      -- get i'th element in O(1) time
(vector-set! vec i e)   -- put e at location i of vec
(vector-length vec)     -- length of vec

vector-ref, vector-set!, and vector-length require constant time, the other operations require linear time (time proportional to the length of the vector), not counting the time required to evaluate the arguments.

We'll give each heap a fixed capacity: the maximum number of elements the prioq can hold at once. The prioq will use only part of its vector at any time.

It's expensive to change the size of a vector,

so we'll need one extra cell to record how much of the vector is being used.

Index 0 isn't being used for anything, so let's use that.

Call it last-used.

(define <prioq> <vector>)

(define (make-prioq (size <integer>))
  (make-vector (+ size 1) 0))

prioq-insert! takes some work:

Put the new element at the next free leaf (the first unused position in the vector)
Switch it with its parent if its parent is larger, and so on up the tree, to maintain heap order

       3
      / \
     /   \
    5     9    [3 5 9 12 6 10 15 4]
   / \   / \
  12  6 10 15
 /
4

       3
      / \
     /   \
    5     9    [3 5 9 4 6 10 15 12]
   / \   / \
  4   6 10 15
 /
12

       3
      / \
     /   \
    4     9    [3 4 9 5 6 10 15 12]
   / \   / \
  5   6 10 15
 /
12

This operation requires only O(log n) time -- the tree has depth floor(log n) (logarithm base 2), and we do a constant amount of work on each level.

Finding your parent is easy: If you're node i>1, then your parent is floor(i/2) = (quotient i 2)

So the code on the handout does the following:

Check for full queue

last-used = (vector-length pq) - 1, since index 0 holds last-used rather than an element

Increment last-used
Store new element there
Bubble it up till (key parent) <= (key child)
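
The handout is not reproduced here, so the following is only a sketch of these steps in standard Scheme, and the handout's code may differ in detail. It stores bare keys with no data, keeps last-used in slot 0 as described above, and uses the name prioq-insert! from the text.

```scheme
(define (make-prioq size)
  (make-vector (+ size 1) 0))                ; slot 0 = last-used = 0

(define (prioq-insert! key pq)
  (let ((n (+ (vector-ref pq 0) 1)))         ; slot where the key starts
    (if (>= n (vector-length pq))
        (error "Priority queue is full"))
    (vector-set! pq 0 n)                     ; increment last-used
    (vector-set! pq n key)                   ; store new element at a leaf
    ;; Bubble up: swap with the parent while the parent is larger.
    (let loop ((i n))
      (let ((parent (quotient i 2)))
        (if (and (> i 1)
                 (< (vector-ref pq i) (vector-ref pq parent)))
            (let ((tmp (vector-ref pq i)))
              (vector-set! pq i (vector-ref pq parent))
              (vector-set! pq parent tmp)
              (loop parent)))))))

;; Rebuild the example: seven keys in heap order, then insert 4.
(define pq (make-prioq 8))
(for-each (lambda (k) (prioq-insert! k pq))
          '(3 5 9 12 6 10 15 4))
;; Slots 1..8 of pq now hold 3 4 9 5 6 10 15 12, and slot 0 holds 8.
```

The test (> i 1) stops the loop at the root, so the counter in slot 0 is never compared with a key.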

extract-min! works by returning the element at the root.

Guaranteed to be the most important (smallest key) by the partial ordering property.
Now we have the two subtrees to put right, though.
Copy the last element to the root (first element)
If its key is larger than one of the children, bubble it down to maintain heap order
Always swap with the child with the smaller key (why?)

Here's what the code does (see handout):

Save the minimum element; it's the return value
Move the last used element to the first position
Decrement the last-used counter
Bubble the new top down until its key is no larger than the keys of both children, or until it becomes a leaf

Example: Delete the top element from

        3
       / \
      /   \
     4     9    [3 4 9 5 6 10 15 12]
    / \   / \
   5   6 10  15
  /
12

This leaves two subheaps. Copy last element to root:

     12
    /  \
   /    \
  4      9    [12 4 9 5 6 10 15]
 / \    / \
5   6  10 15

Bubble it down:

     4
    / \
   /   \
 12     9    [4 12 9 5 6 10 15]
 / \   / \
5   6 10 15

      4
     / \
    /   \
   5     9    [4 5 9 12 6 10 15]
  / \   / \
12   6 10 15

Again an O(log n) operation because the tree is always balanced.
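
As with insertion, the handout's code is not shown here, so this is only a sketch of the four steps in standard Scheme. It stores bare keys, keeps last-used in slot 0, and starts from a vector already in heap order. The name prioq-extract-min! is an assumption modeled on extract-min! in the text.

```scheme
(define (prioq-extract-min! pq)
  (let ((n (vector-ref pq 0)))
    (if (= n 0)
        (error "Priority queue is empty"))
    (let ((min-key (vector-ref pq 1))         ; save the return value
          (m (- n 1)))                        ; number of elements left
      (vector-set! pq 1 (vector-ref pq n))    ; move last element to root
      (vector-set! pq 0 m)                    ; decrement last-used
      ;; Bubble down: swap with the smaller child while it is smaller.
      (let loop ((i 1))
        (let* ((left (* 2 i))
               (right (+ left 1))
               ;; index of the child with the smaller key, #f at a leaf
               (child (cond ((> left m) #f)
                            ((and (<= right m)
                                  (< (vector-ref pq right)
                                     (vector-ref pq left)))
                             right)
                            (else left))))
          (if (and child
                   (< (vector-ref pq child) (vector-ref pq i)))
              (let ((tmp (vector-ref pq i)))
                (vector-set! pq i (vector-ref pq child))
                (vector-set! pq child tmp)
                (loop child)))))
      min-key)))

;; The example above: slot 0 holds last-used = 8.
(define pq (vector 8 3 4 9 5 6 10 15 12))
(define smallest (prioq-extract-min! pq))     ; 3
;; Slots 1..7 of pq now hold 4 5 9 12 6 10 15, and slot 0 holds 7.
```

Slot 8 still contains the old 12 after the call. It lies beyond last-used, so it is ignored until a later insert overwrites it.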

Heapsort

We can use a priority queue to sort n numbers:

Insert them in the queue, with the number as both priority and data
Then take them out in priority (= numerical) order.

With the first list implementation (sorted list), this would take

n insertions taking O(n) each, or O(n²) total
n deletions taking O(1) each, or O(n) total

With the second list implementation (unsorted list), this would take

n insertions taking O(1) each, or O(n) total
n deletions taking O(n) each, or O(n²) total

In each case the total is O(n²).

With the heap implementation,

n insertions taking O(log n) each, or O(n log n) total
n deletions taking O(log n) each, or O(n log n) total

Thus, O(n log n) total cost.

This algorithm is called heapsort, and it is a reliable, standard sorting method.

If you have to sort by doing comparisons only, this is as fast as possible (up to a constant factor).

There are plenty of other O(n log n) sorting algorithms with different properties, for example:
smaller constant factor
very fast if the list is already sorted

Today's concepts:

Mutable data

Stacks, Queues, O(1) insert/delete

Priority queue

insert
extract-min

Heaps

partially ordered tree
vector representation

Vectors in Scheme

Heapsort: O(n log n)