DATA ABSTRACTION:

This is probably the most important single programming technique you'll learn. Ever.

So far we've used only built-in primitive types of objects.

But suppose you want some other data structure, e.g., a stack or a queue?


What's important about a type?
* There are some operations on it which do the right thing. That is, there's a specification or contract (API) about how the type behaves.
* Anything meeting that contract is OK as an implementation of the data type

We have already seen the concepts of abstraction and specification for procedures:
- e.g., different multiplication procedures times-1, times-2, fast-times, etc. meet the same contract (have the same INPUT/OUTPUT behavior):

a ---> +---------+
       |  TIMES  |----> ab
b ---> +---------+

That's WHAT they do.

HOW they do it is totally different, but that doesn't matter as long as they meet the specification.

Contract/Specification = WHAT the program does
* Black box description
Implementation = HOW the program does it


We're going to do the same for data:
* Give a specification
* Hide the implementation.

This gives us two BIG advantages:
* We can think about the data clearly.
* We can change the implementation if we ever need to.

This is a real win:
* We can throw together a nice simple (and inefficient) implementation of a datatype
- Fast programming
- Get the rest of the program working
- Find out where the slow spots are
* When we need to, we can replace it with a more complicated but faster one.

It's called an ABSTRACTION BARRIER:
* A few things are visible outside
- You (and others) can use them freely.
* The rest is hidden
- Nobody depends on it (just the external stuff) so you can change it freely.

Note: many Win32 miseries (and general compatibility problems) are due to people depending upon internals!!


We're going to start with a simple abstract data type, rational numbers:

1/2 + 3/4 = 5/4
2/3 * 3/4 = 1/2

Note: 5/4 is NOT the same as 1.25
* Different types (intuitively).
* 1/3 is very different from 0.33333333.
- Multiply by 3:
(* 1/3 3) is 1,
(* 0.33333333 3) is 0.99999999, which isn't quite 1.
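
The difference can be checked directly at the Scheme prompt. This is a small sketch, assuming an implementation with exact rational arithmetic (most have it); the names third-exact, third-inexact, exact-product, and inexact-product are made up for illustration:

```scheme
;; Exact rational 1/3 versus an 8-digit decimal approximation.
(define third-exact 1/3)
(define third-inexact 0.33333333)

(define exact-product (* third-exact 3))      ; exact arithmetic
(define inexact-product (* third-inexact 3))  ; floating-point arithmetic

(display (= exact-product 1))    ; does the exact product equal 1?
(newline)
(display (= inexact-product 1))  ; does the inexact product equal 1?
(newline)
```

The first test succeeds and the second fails, which is the intuition behind treating rationals as their own type.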

The rules for adding and multiplying rationals are familiar:

>>> Keep these on the board <<<

a/x + b/y = (ay + bx)/xy
a/x * b/y = ab / xy

We will define an abstract data type called <rat> which represents rational numbers and supports some operations and tests.

CONSTRUCTOR
(make-rat n d) given n,d <integer>s, d not = 0, returns a <rat>
ACCESSORS
(numer r) takes a <rat>, returns an <integer>
(denom r) takes a <rat>, returns an <integer>

with the following specification:

(numer (make-rat n d)) n
---------------------- = ---
(denom (make-rat n d)) d

with the usual rule for equality of rational numbers:

n1/d1 = n2/d2 if n1*d2 = n2*d1.

Note the specification does NOT say (numer (make-rat n d)) = n or (denom (make-rat n d)) = d.

What operations and tests do we typically want to do with rational numbers?

ADDITION (rat-add r1 r2), given two <rat>s returns a <rat>
MULTIPLICATION (rat-mul r1 r2), given two <rat>s returns a <rat>
EQUALITY TEST (rat-eq r1 r2), given two <rat>s returns a boolean
INEQUALITY TEST (rat-leq r1 r2), given two <rat>s returns a boolean

with specifications

(rat-eq (make-rat n1 d1) (make-rat n2 d2)) => #t
if the rational numbers n1/d1 and n2/d2 are equal, that is,
if n1*d2=n2*d1, => #f otherwise

(rat-leq (make-rat n1 d1) (make-rat n2 d2)) => #t
if n1/d1 <= n2/d2 as rational numbers, => #f otherwise

(rat-eq (rat-add (make-rat n1 d1)
(make-rat n2 d2))
(make-rat n3 d3))
=> #t if n1/d1 + n2/d2 = n3/d3 as rational numbers,
=> #f otherwise
(Note: this does NOT say that d3 = d1*d2 and n3 = n1*d2 + n2*d1 !)

(rat-eq (rat-mul (make-rat n1 d1)
(make-rat n2 d2))
(make-rat n3 d3))
=> #t if n1/d1 * n2/d2 = n3/d3 as rational numbers,
=> #f otherwise

This specification gives us some flexibility in the implementation. We'll see a few different implementations. But we don't need to know the implementation to work with the data type <rat>--it's enough to know the specification.

It makes perfectly good sense to write a SPECIFICATION or CONTRACT that you don't know how to implement.
* Get used to it.
* We'll do it repeatedly.
* And eventually you'll write large programs using that method.

Let's get back to earth and actually implement the abstract data type.


We'll implement <rat>s using pairs, which in turn we can implement using cons cells, lambdas, etc.
* Basically an ordered pair a la mathematics.

CONSTRUCTOR: cons
ACCESSORS: head, tail a.k.a. car, cdr

The specification is:

(head (cons v1 v2)) => v1
(tail (cons v1 v2)) => v2
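
To see why only the contract matters, here is a sketch of pairs built from nothing but lambdas. The names lambda-cons, lambda-head, and lambda-tail are hypothetical (chosen so they don't clash with the real cons, head, and tail), and this is one possible implementation, not necessarily what your Scheme does internally:

```scheme
;; A pair is a procedure that remembers v1 and v2 and hands them
;; to whatever selector procedure it is given.
(define (lambda-cons v1 v2)
  (lambda (selector) (selector v1 v2)))

;; head asks the pair for its first component.
(define (lambda-head p)
  (p (lambda (v1 v2) v1)))

;; tail asks the pair for its second component.
(define (lambda-tail p)
  (p (lambda (v1 v2) v2)))

(define p (lambda-cons 3 4))
(display (lambda-head p))  ; first component
(newline)
(display (lambda-tail p))  ; second component
(newline)
```

Anything that satisfies the two equations above is a legitimate pair, no matter what it is made of.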
----------------------------------------------------------------------
We can represent rationals as pairs of integers.

(define (make-rat n d)
  (if (and (number? n) (number? d) (not (= d 0)))
      (cons n d)
      (error "make-rat expects numbers with denom not zero")))

The error function here terminates the program because of some exceptional situation -- in this case, make-rat was called with a zero denominator or with something that isn't a number.
(In practice, we should really check that n and d are integers, as the specification requires.)

You could use cons directly, i.e.

(define make-rat cons)

but this has serious disadvantages, as we'll see.

Similarly, we can define numerator and denominator:

(define (numer r) (head r))
(define (denom r) (tail r))

or just

(define numer head)
(define denom tail)

It's easy to see that this meets the spec (for numbers x and y, with y not zero):

(numer (make-rat x y)) =>
(numer (if (and (number? x) (number? y) (not (= y 0))) (cons x y) (error ..))) =>
(numer (if (and #t (number? y) (not (= y 0))) (cons x y) (error ..))) =>
(numer (if (and #t #t (not (= y 0))) (cons x y) (error ..))) =>
(numer (if (and #t #t #t) (cons x y) (error ..))) =>
(numer (if #t (cons x y) (error ..))) =>
(numer (cons x y)) =>
(head (cons x y)) =>
x

Similarly, (denom (make-rat x y)) evaluates to y.

So,

(numer (make-rat x y))   x
---------------------- = ---
(denom (make-rat x y))   y

as the specification demanded.
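
The same check can be run mechanically. This is a sketch, assuming head and tail are just car and cdr (they are defined here so the block runs on its own); meets-spec? is a hypothetical helper, not part of the <rat> interface:

```scheme
(define head car)   ; assumption: head/tail are other names for car/cdr
(define tail cdr)

(define (make-rat n d)
  (if (and (number? n) (number? d) (not (= d 0)))
      (cons n d)
      (error "make-rat expects numbers with denom not zero")))

(define numer head)
(define denom tail)

;; The spec says numer/denom of (make-rat n d) equals n/d as rationals,
;; i.e. (numer r) * d = n * (denom r).
(define (meets-spec? n d)
  (let ((r (make-rat n d)))
    (= (* (numer r) d) (* n (denom r)))))

(display (meets-spec? 10 8))
(newline)
(display (meets-spec? -3 7))
(newline)
```

Because the test cross-multiplies instead of comparing components, it should keep working unchanged if make-rat is later altered to store the fraction differently.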

Implemented this way, our rationals are really of type cons-cell rather than of their own type, <rat>. In general it is better to use DEFSTRUCT when defining abstract data types, because it creates a distinct type. Then numer and denom can check that the thing passed in is a <rat> and not just any old pair. And since make-rat is the only way to make a <rat>, we can assume that the elements are numbers and the denominator is non-zero. Some languages don't support this. Scheme does, and we'll cover it soon.
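
In the meantime, one common workaround is to tag the pair by hand. This is a sketch of that idea only, not DEFSTRUCT itself; the tag symbol rat and the predicate rat? are assumptions made for illustration:

```scheme
;; Represent a <rat> as (rat n . d) so it can be told apart from other pairs.
(define (make-rat n d)
  (if (and (number? n) (number? d) (not (= d 0)))
      (cons 'rat (cons n d))
      (error "make-rat expects numbers with denom not zero")))

;; True for values built by make-rat (or for anything imitating its tag).
(define (rat? x)
  (and (pair? x) (eq? (car x) 'rat)))

(define (numer r)
  (if (rat? r)
      (car (cdr r))
      (error "numer expects a <rat>")))

(define (denom r)
  (if (rat? r)
      (cdr (cdr r))
      (error "denom expects a <rat>")))

(define r (make-rat 3 4))
(display (numer r))
(newline)
(display (rat? (cons 3 4)))  ; a plain pair of numbers is not a <rat>
(newline)
```

A hand-made tag can still be forged by anyone who calls cons directly, which is one reason a real distinct type is preferable.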


Here's how we might implement the arithmetic operations and tests.

(define (rat-add r1 r2)
  (let ((n1 (numer r1))
	(d1 (denom r1))
	(n2 (numer r2))
	(d2 (denom r2)))
    (make-rat (+ (* n1 d2)
		 (* n2 d1))
	      (* d1 d2))))

 

(define (rat-mul r1 r2)
  (let ((n1 (numer r1))
        (d1 (denom r1))
        (n2 (numer r2))
        (d2 (denom r2)))
    (make-rat (* n1 n2)
              (* d1 d2))))

(define (rat-eq r1 r2)
  (let ((n1 (numer r1))
        (d1 (denom r1))
        (n2 (numer r2))
        (d2 (denom r2)))
    (= (* n1 d2) (* n2 d1))))

(define (rat-leq r1 r2)
  (let ((n1 (numer r1))
        (d1 (denom r1))
        (n2 (numer r2))
        (d2 (denom r2)))
    (if (>= (* d1 d2) 0) ; if d1 and d2 have the same sign
        (<= (* n1 d2) (* n2 d1))
        (>= (* n1 d2) (* n2 d1)))))

Note how rat-add and rat-mul tear down their arguments using the ACCESSORS, then build up their result using the CONSTRUCTOR. Also note how rat-eq and rat-leq tear down their arguments using the ACCESSORS, then apply the appropriate tests on the constituent parts. These implementation details are hidden in the definitions, and users do not have to know how they work.


Now

(rat-eq (make-rat 10 8) (make-rat 5 4)) => #t

and this is correct, since 10/8 = 5/4. But

(numer (make-rat 10 8)) => 10
(denom (make-rat 10 8)) => 8

Suppose we don't like this and want to represent rationals in lowest terms. Doing this would allow us to save time in the equality test; we could just compare numerators and denominators.

We could have rat-eq reduce to lowest terms. But that would be just as inefficient. We could have rat-add, rat-mul, etc. do it after every operation.
* That's a lot of work
* What if we forget somewhere?

If we want to reduce to lowest terms, the right place to do it is in make-rat, since that's the only place <rat>s get created. We only do it once for each <rat> we create.

(define (make-rat n d)
  (if (and (number? n) (number? d) (not (= d 0)))
      (let ((g (gcd n d)))
	(cons (/ n g) (/ d g)))
      (error "...")))

 

Note that this still satisfies the spec:

(numer (make-rat x y))   x/g   x
---------------------- = --- = ---
(denom (make-rat x y))   y/g   y

Now

(numer (make-rat 10 8)) => 5
(denom (make-rat 10 8)) => 4

With the new def of make-rat, we can always depend on <rat>s being in lowest terms. You can even make it part of the specification if you like. Then equality testing becomes simpler:

(define (rat-eq r1 r2)
  (and (= (numer r1) (numer r2))
       (= (denom r1) (denom r2))))

 

Or, we can just leave rat-eq as is for now and change it later if we like, since the spec is still satisfied. The point is: ABSTRACTION ALLOWED US TO MAKE THIS CHANGE EASILY.


We could simplify rat-leq if we knew the denominator would always be positive. Let's make it so. Again, the best place to make the change is in make-rat:

(define (make-rat n d)
  (if (and (number? n) (number? d) (not (= d 0)))
      (let* ((n2 (if (> d 0) n (- n)))
             (d2 (if (> d 0) d (- d)))
             (g (gcd n2 d2)))
        (cons (/ n2 g) (/ d2 g)))
      (error "...")))

 

Now since denominators are always guaranteed to be positive,
we can simplify rat-leq:

(define (rat-leq r1 r2)
  (let ((n1 (numer r1))
	(d1 (denom r1))
	(n2 (numer r2))
	(d2 (denom r2)))
    (<= (* n1 d2) (* n2 d1))))

 

We don't need to change anything else, because the spec is still satisfied.
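
A quick sanity check of the normalizing make-rat together with the simplified tests might look like this. It is a sketch that assumes head and tail are car and cdr and that the arguments are integers; the test values are arbitrary and the error message is a stand-in:

```scheme
(define head car)   ; assumption: head/tail are other names for car/cdr
(define tail cdr)

(define (make-rat n d)
  (if (and (number? n) (number? d) (not (= d 0)))
      (let* ((n2 (if (> d 0) n (- n)))   ; move the sign to the numerator
             (d2 (if (> d 0) d (- d)))   ; denominator is now positive
             (g (gcd n2 d2)))
        (cons (/ n2 g) (/ d2 g)))        ; store in lowest terms
      (error "make-rat expects numbers with denom not zero")))

(define numer head)
(define denom tail)

;; Valid only because every <rat> is in lowest terms with positive denom.
(define (rat-eq r1 r2)
  (and (= (numer r1) (numer r2))
       (= (denom r1) (denom r2))))

;; Valid only because denominators are always positive.
(define (rat-leq r1 r2)
  (<= (* (numer r1) (denom r2))
      (* (numer r2) (denom r1))))

(define r (make-rat 3 -6))             ; 3/-6 is stored as -1/2
(display (numer r)) (newline)
(display (denom r)) (newline)
(display (rat-eq r (make-rat -10 20))) (newline)
(display (rat-leq r (make-rat 1 3)))   (newline)
```

Both simplified tests lean on invariants that only make-rat establishes, which is why it matters that nothing else constructs a <rat>.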


Suppose that we had explicitly used `cons' instead of `make-rat'.
* We would also have used `cons' for everything else Scheme uses it for
- Which is a lot
* Then we'd have to go look at every single use of `cons' in the program,
- See if it looks like a make-rat
- Add a gcd computation
* We'd surely miss some or get some that aren't make-rats, and it would be a COSMIC HORROR.

But ABSTRACTION made it easy to make these changes. The trick:

* Build your program with layers of abstraction.
* HIDE the implementation.
* Then your life will be a LOT easier later when you need to change it.
* You just change it in one place.

Think of rat-add and rat-mul as if they were Scheme primitives
-- they don't look any different, anyways
-- and use 'em freely.
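
For instance, client code can be written entirely in terms of the interface. The sketch below repeats a compact version of the pair-based implementation so that it runs on its own (using car and cdr directly for the accessors); rat-sum is a hypothetical client procedure, not part of the <rat> interface:

```scheme
(define (make-rat n d)
  (if (and (number? n) (number? d) (not (= d 0)))
      (let* ((n2 (if (> d 0) n (- n)))
             (d2 (if (> d 0) d (- d)))
             (g (gcd n2 d2)))
        (cons (/ n2 g) (/ d2 g)))
      (error "make-rat expects numbers with denom not zero")))
(define numer car)
(define denom cdr)

(define (rat-add r1 r2)
  (make-rat (+ (* (numer r1) (denom r2))
               (* (numer r2) (denom r1)))
            (* (denom r1) (denom r2))))

(define (rat-eq r1 r2)
  (= (* (numer r1) (denom r2))
     (* (numer r2) (denom r1))))

;; Client code: adds up a list of <rat>s using only make-rat and rat-add.
;; The car and cdr here walk the list; they are never applied to a <rat>.
(define (rat-sum rats)
  (if (null? rats)
      (make-rat 0 1)
      (rat-add (car rats) (rat-sum (cdr rats)))))

(define total
  (rat-sum (list (make-rat 1 2) (make-rat 1 3) (make-rat 1 6))))
(display (rat-eq total (make-rat 1 1)))  ; is 1/2 + 1/3 + 1/6 equal to 1?
(newline)
```

Since rat-sum never looks inside a <rat>, it would not need to change if the representation behind make-rat, numer, and denom were swapped out.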


Today's words and concepts: