CS410, Summer 1998
Dan Grossman
Lecture 9 Outline

Goals:
* B-trees
* Variations on Homework 2

* There's a generalization of 2-3 trees called B-Trees.  It turns out
  everything still works just fine by making 2 => k and 3 => 2k (or 2k-1
  if you prefer).  Now the height is between log_2k n and log_k n. 

  Other minor changes:
  * Use an array of child pointers (for convenience or maybe for binary search).
  * Move minimum values to the parent (to avoid following child pointers -- 
    could have done this for 2-3 trees too.  In B-tree applications it's more
    important.  We'll see why in a minute.)

  Now search is about log_2 k * log_k n (binary search within each of the
  log_k n nodes on the path); insert and delete are basically k log_k n.
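
  A minimal sketch of the per-node step of search, assuming each node keeps
  a sorted array holding the minimum key under each child.  The class and
  method names (BTreeNodeSketch, childIndexFor) are made up for
  illustration and are not from the course code.

```java
import java.util.Arrays;

// Hypothetical node layout: minKeys[i] is the minimum key stored under
// child i, copied up into the parent so no child has to be visited.
class BTreeNodeSketch {
    final int[] minKeys;   // sorted; only the first count entries are in use
    final int count;       // number of children currently in this node

    BTreeNodeSketch(int[] minKeys, int count) {
        this.minKeys = minKeys;
        this.count = count;
    }

    // Index of the child whose subtree could contain key: the last i with
    // minKeys[i] <= key, or 0 if key is below every minimum.
    // Binary search makes this about log_2(count) comparisons.
    int childIndexFor(int key) {
        int pos = Arrays.binarySearch(minKeys, 0, count, key);
        if (pos >= 0) {
            return pos;                    // key is exactly some child's minimum
        }
        int insertionPoint = -pos - 1;     // first index with minKeys[i] > key
        return Math.max(insertionPoint - 1, 0);
    }
}
```

  Repeating this step once per level, from the root down, gives the
  log_2 k * log_k n search cost.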

  What is k? It's chosen so one node fits exactly on one page of disk.
  (This is also why we moved the minimum values to the parents.)
  Computation is so much cheaper than bringing in pages from disk,
  that we should minimize pages.  (So things like binary search on the
  array are completely irrelevant.)
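
  A rough worked example of choosing k, under assumed sizes: 4096-byte
  pages, 8-byte keys, and 8-byte child pointers.  None of these numbers come
  from the lecture, and per-node bookkeeping is ignored.  The helper names
  are hypothetical.

```java
class BTreeSizingSketch {
    // Largest number of (key, child pointer) pairs that fit in one page.
    // This plays the role of 2k, the maximum number of children per node.
    static int maxChildren(int pageBytes, int keyBytes, int pointerBytes) {
        return pageBytes / (keyBytes + pointerBytes);
    }

    // Smallest h with branching^h >= n, i.e. the ceiling of log_branching(n).
    // Integer arithmetic avoids floating-point rounding surprises.
    static int levelsNeeded(long n, int branching) {
        int h = 0;
        long reach = 1;
        while (reach < n) {
            reach *= branching;
            h++;
        }
        return h;
    }
}
```

  With these assumed sizes 2k = 256 and k = 128, so a billion keys need
  between levelsNeeded(1000000000L, 256) = 4 and
  levelsNeeded(1000000000L, 128) = 5 levels, that is 4 or 5 page reads
  per search, against about 30 levels for a balanced binary tree.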

  This is great for B-trees -- we have the space, we just don't want to touch
  it.  Red-black trees don't generalize to k, and they do touch more nearby
  nodes when rebalancing.

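  As a result, B-trees are very common in databases.  Well, actually the
  variant described here, with all data at the leaves and only guide values
  at internal nodes, is usually called a B+ tree.  In a plain B-tree some
  data is also stored at internal nodes.  This makes insertion and deletion
  a bit more complicated and we will not discuss it further.

  (Warning: what CLR calls B-trees are the plain kind, with data stored at
  internal nodes.)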

* Consider the postfix evaluator you wrote for your homework.  It
  probably looked something like this (some error-checking left out).

postfix:
  while (not end of file) {
    if plus
       s.push(s.pop() + s.pop())
    else if minus
       right = s.pop()               // top of stack is the right operand
       s.push(s.pop() - right)
    else if mult
       s.push(s.pop() * s.pop())
    else if divide
       denom = s.pop()
       if (denom == 0)
           throw new CalculatorDivideByZeroException();
       s.push(s.pop() / denom)
    else if negation
       s.push(0 - s.pop());
    else
       try {
          s.push(Integer.parseInt(token))
       } catch (NumberFormatException e) {
          throw new CalculatorIllegalInputException(); // maybe token
       }
    read next token
  }
  return s.pop();

Comments:
* It's better style to try integer last -- the exception is then thrown
  only in the exceptional case (genuinely bad input), not for every
  operator token.
* All of our binary operations look the same -- what if they really took
  30 lines of code?  How could we abstract them?  Although it may seem like
  overkill here, you should recognize the pattern -- all that changes is the
  operation.
 
  Here's what I want:
        doBinaryOp(op) {
            s.push(op(s.pop(), s.pop()))
        }
        if plus
            doBinaryOp(+);
        else if minus
            doBinaryOp(function (int a, int b) {
                           return b - a;
                       });
        else if mult
            doBinaryOp(*);
        else if divide
            doBinaryOp(function (int a, int b) {
                           if (a == 0)
                               throw new CalculatorDivideByZeroException();
                           return b / a;
                       });
        ...
  Do the same for unary op (you might have more unary operations later).
  
  Unfortunately, Java won't let me do this, although many languages will.
  But we can fake it in Java with an object-oriented approach (the
  throws declarations are explained below):

  abstract class BinaryOp {
      void doStackOp(Stack s) throws CDivideByZeroException {
          // Arguments are evaluated left to right, so first is the top of
          // the stack (the right-hand operand) and second is the one below.
          s.push(doOperation(s.pop(), s.pop()));
      }
      abstract int doOperation(int first, int second)
          throws CDivideByZeroException;
  }

  class AddOp extends BinaryOp {
      int doOperation(int first, int second) throws CDivideByZeroException {
          return first + second;
      }
  }
  class SubOp extends BinaryOp {
      int doOperation(int first, int second) throws CDivideByZeroException {
          return second - first;
      }
  }
  class MultOp extends BinaryOp {
      int doOperation(int first, int second) throws CDivideByZeroException {
          return first * second;
      }
  }

  class DivOp extends BinaryOp {
      int doOperation(int first, int second) throws CDivideByZeroException {
          if (first == 0)
              throw new CDivideByZeroException();
          return second / first;
      }
  }

  
  "throws CDivideByZeroException" must be added to the abstract
  doOperation because one of its realizations requires it.
  The same method in AddOp, SubOp, and MultOp may repeat the declaration,
  as above, but it does not have to: Java lets an overriding method declare
  fewer checked exceptions than the method it overrides.

  We should do the same thing with unary operations.  After all, all
  unary operations have the same form, and some others might exist later:

  abstract class UnaryOp {
      void doStackOp(Stack s) throws CDivideByZeroException {
          s.push(doOperation(s.pop()));
      }
      abstract int doOperation(int arg);
  }

  class NegateOp extends UnaryOp {
	int doOperation (int arg) {
		return 0 - arg;
	}
  }

  The throws clause here isn't necessary, since no unary operation throws.
  But now I realize that UnaryOp objects and BinaryOp objects have in common
  that they have operations which manipulate a Stack.  So let's have them
  extend a more abstract class, whose doStackOp must declare the exception
  because BinaryOp's version can throw it:

  abstract class Op {
	abstract void doStackOp(Stack s) throws CDivideByZeroException;
  }

  The definitions of BinaryOp and UnaryOp should be changed so they extend Op.
  Now we have an inheritance hierarchy of:
                     Op
              /              \
         UnaryOp           BinaryOp
            |         /    |      |     \
        NegateOp   AddOp SubOp MultOp DivOp
  
  This hierarchy nicely captures our thoughts -- an Op is something that somehow
  manipulates the Stack.  A UnaryOp is something that computes based on the
  top of the Stack and pushes the result.  A BinaryOp is something that 
  computes based on the top two items of the Stack and pushes the result.
  The concrete classes do the computations we expect.  Now let's see how
  this cleans up our evaluator:
 
postfix:
    while (not end of file) {
        Op op = null;
        if plus
           op = new AddOp();
        else if minus
           op = new SubOp();
        else if mult
           op = new MultOp();
        else if divide
           op = new DivOp();
        else if negation
           op = new NegateOp();

        if (op == null)
           try { s.push(Integer.parseInt ...) }
           catch {...}
        else
           op.doStackOp(s);

        read next token
    }
    return s.pop();

Notice how every Op has a doStackOp which does just what we want.
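
A minimal runnable sketch of part of this hierarchy, so the operand order
can be checked on a real stack.  java.util.ArrayDeque stands in for the
course's Stack class, CDivideByZeroException is declared here as a checked
exception, and the helper apply is made up for the demonstration; all
three are assumptions rather than the course's code.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Stand-in for the course's exception class (assumed to be checked).
class CDivideByZeroException extends Exception {}

abstract class Op {
    abstract void doStackOp(Deque<Integer> s) throws CDivideByZeroException;
}

abstract class BinaryOp extends Op {
    void doStackOp(Deque<Integer> s) throws CDivideByZeroException {
        int first = s.pop();    // top of stack: the right-hand operand
        int second = s.pop();   // next item down: the left-hand operand
        s.push(doOperation(first, second));
    }
    abstract int doOperation(int first, int second) throws CDivideByZeroException;
}

class SubOp extends BinaryOp {
    int doOperation(int first, int second) throws CDivideByZeroException {
        return second - first;
    }
}

class DivOp extends BinaryOp {
    int doOperation(int first, int second) throws CDivideByZeroException {
        if (first == 0) throw new CDivideByZeroException();
        return second / first;
    }
}

abstract class UnaryOp extends Op {
    void doStackOp(Deque<Integer> s) throws CDivideByZeroException {
        s.push(doOperation(s.pop()));
    }
    abstract int doOperation(int arg);
}

class NegateOp extends UnaryOp {
    int doOperation(int arg) {
        return 0 - arg;
    }
}

class OpHierarchyDemo {
    // Pushes the operands in order, applies each Op in turn, and returns
    // whatever is left on top of the stack.
    static int apply(int[] operands, Op... ops) throws CDivideByZeroException {
        Deque<Integer> s = new ArrayDeque<>();
        for (int x : operands) s.push(x);
        for (Op op : ops) op.doStackOp(s);
        return s.pop();
    }
}
```

For example, OpHierarchyDemo.apply(new int[]{7, 2}, new SubOp()) models the
postfix input "7 2 -" and returns 5, because the 2 on top of the stack is
treated as the right-hand operand.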

Efficiency-wise, it's wasteful to keep making new Op objects when we
really only need one of each kind.  So here's the next version:

postfix:
Op addop = new AddOp();
Op subop = new SubOp();
Op multop = new MultOp();
Op divop = new DivOp();
Op negop = new NegateOp();

while (not end of file) {
    Op op = null;
    if (plus)      op = addop;
    else if (sub)  op = subop;
    else if (mult) op = multop;
    else if (div)  op = divop;
    else if (neg)  op = negop;

    if (op == null)
        try { s.push(Integer.parseInt ...) }
        catch {...}
    else
        op.doStackOp(s);

    read next token
}
return s.pop();

Now we see that the cascading if else if ... is really just taking the
String (we've been sloppy and written plus when we really had
token.equals("+")) and converting to an Op (or a number).  This is
taking time proportional to the number of operators, which might get
large.  But we know how to use one piece of data (the String) and use
it as a key to get another piece of data (the Op) -- we can use a
Dictionary, which we might assume we already have lying around.
Let's also assume the lookup method returns null if the key is not present.

postfix:
Dictionary dict = initPostfixDict();
while (not end of file) {
    Op op = dict.lookup(token);
    if (op == null)
        try { s.push(Integer.parseInt ...) }
        catch {...}
    else
        op.doStackOp(s);

    read next token
}
return s.pop();

All we have left out is the initialization of the Dictionary.  We can
put some arrays at the top of the file to define the String to Op
relationship.  Then when we define new operations, the only change to
this file will be adding to these arrays:

String[] opStrings = {"+", "-", "*", "/", "~"};
Op[] ops = {new AddOp(),
            new SubOp(),
            new MultOp(),
            new DivOp(),
            new NegateOp()};
Dictionary initPostfixDict() {
    Dictionary dict = new Dictionary();
    for (int i = 0; i < ops.length; i++) {
        dict.insert(opStrings[i], ops[i]);
    }
    return dict;
}
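
Putting the pieces together, here is one way the table-driven evaluator
might look as compilable Java.  It is a sketch under assumptions:
java.util.HashMap (with get and put) stands in for the Dictionary class
(with lookup and insert), ArrayDeque stands in for Stack, input arrives as
one whitespace-separated string instead of a file, and stack-underflow
checking is left out.  The Op classes are restated only so the block
stands on its own.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

class CDivideByZeroException extends Exception {}
class CalculatorIllegalInputException extends Exception {}

abstract class Op {
    abstract void doStackOp(Deque<Integer> s) throws CDivideByZeroException;
}
abstract class BinaryOp extends Op {
    void doStackOp(Deque<Integer> s) throws CDivideByZeroException {
        int first = s.pop();    // right-hand operand (top of stack)
        int second = s.pop();   // left-hand operand
        s.push(doOperation(first, second));
    }
    abstract int doOperation(int first, int second) throws CDivideByZeroException;
}
class AddOp extends BinaryOp {
    int doOperation(int first, int second) throws CDivideByZeroException { return second + first; }
}
class SubOp extends BinaryOp {
    int doOperation(int first, int second) throws CDivideByZeroException { return second - first; }
}
class MultOp extends BinaryOp {
    int doOperation(int first, int second) throws CDivideByZeroException { return second * first; }
}
class DivOp extends BinaryOp {
    int doOperation(int first, int second) throws CDivideByZeroException {
        if (first == 0) throw new CDivideByZeroException();
        return second / first;
    }
}
abstract class UnaryOp extends Op {
    void doStackOp(Deque<Integer> s) throws CDivideByZeroException {
        s.push(doOperation(s.pop()));
    }
    abstract int doOperation(int arg);
}
class NegateOp extends UnaryOp {
    int doOperation(int arg) { return 0 - arg; }
}

class PostfixSketch {
    // Adding an operator means adding one entry to each of these arrays.
    static final String[] OP_STRINGS = {"+", "-", "*", "/", "~"};
    static final Op[] OPS = {new AddOp(), new SubOp(), new MultOp(),
                             new DivOp(), new NegateOp()};

    static Map<String, Op> initPostfixDict() {
        Map<String, Op> dict = new HashMap<>();
        for (int i = 0; i < OPS.length; i++) {
            dict.put(OP_STRINGS[i], OPS[i]);
        }
        return dict;
    }

    // Evaluates whitespace-separated postfix tokens such as "3 4 + 2 *".
    static int eval(String input)
            throws CDivideByZeroException, CalculatorIllegalInputException {
        Map<String, Op> dict = initPostfixDict();
        Deque<Integer> s = new ArrayDeque<>();
        for (String token : input.trim().split("\\s+")) {
            Op op = dict.get(token);   // null when token is not an operator
            if (op == null) {
                try {
                    s.push(Integer.parseInt(token));
                } catch (NumberFormatException e) {
                    throw new CalculatorIllegalInputException();
                }
            } else {
                op.doStackOp(s);
            }
        }
        return s.pop();
    }
}
```

The evaluation loop never names a specific operator, so a new Op class
only requires a new row in the two arrays.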

This is all very slick.  The drawback is creating lots of files and
verbosity.  This is an artifact of Java and not of object-oriented
programming.  What we've done is made it very easy to add new
operations without changing our postfixEvaluator.  We could do the
same for other operations of course.  This is great when someone who
didn't write the original code is asked to add a new Op or two or
twelve.  The drawback is that our simple eval function now has its
computation split across many files.

I like to think of it as a 2D table with the types of objects on one axis and
the methods on the other:
		Eval	Print	BuyPizza    ...
Add
Sub
Mult
Div
Negate
...

Object-oriented programming encourages putting entire rows together
logically, whereas functional programming encourages putting entire
columns together logically.  While the styles can appear quite
different, you should be fluent in programs using either style and
realize that underneath it all the fundamental computation is the
same.
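
For contrast with the class-per-operation code earlier, here is a small
sketch of the column-oriented style, still written in Java.  The enum Kind
and the method names are hypothetical, and ArithmeticException stands in
for the calculator's own exception to keep the example short.

```java
class ColumnStyleSketch {
    // One constant per row of the table.
    enum Kind { ADD, SUB, MULT, DIV, NEGATE }

    // The whole "Eval" column in one place.  left is ignored for NEGATE.
    static int eval(Kind kind, int left, int right) {
        switch (kind) {
            case ADD:    return left + right;
            case SUB:    return left - right;
            case MULT:   return left * right;
            case DIV:
                if (right == 0) throw new ArithmeticException("divide by zero");
                return left / right;
            case NEGATE: return 0 - right;
            default:     throw new IllegalArgumentException("unknown kind: " + kind);
        }
    }

    // The whole "Print" column in one place.
    static String print(Kind kind) {
        switch (kind) {
            case ADD:    return "+";
            case SUB:    return "-";
            case MULT:   return "*";
            case DIV:    return "/";
            case NEGATE: return "~";
            default:     throw new IllegalArgumentException("unknown kind: " + kind);
        }
    }
}
```

Here a new column (say BuyPizza) is one new method, while a new row
means editing every existing method.  The class hierarchy has the
opposite trade-off.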