CS410, Summer 1998
Dan Grossman
Lecture 9 Outline

Goals:
  * B-trees
  * Variations on Homework 2

* There's a generalization of 2-3 trees called B-trees. It turns out everything still works just fine if we replace 2 with k and 3 with 2k (or 2k-1 if you prefer). Now the height is between log_(2k) n and log_k n.

  Other minor changes:
    * Use an array of child pointers (for convenience, or maybe for binary search).
    * Move minimum values to the parent, to avoid following child pointers. (We could have done this for 2-3 trees too, but in B-tree applications it's more important. We'll see why in a minute.)

  Now search is (log_2 k) * (log_k n), and insert and delete are basically k log_k n.

  What is k? It's chosen so that one node fits exactly on one page of disk. (This is also why we moved the minimum values to the parents.) Computation is so much cheaper than bringing in pages from disk that we should minimize the number of pages we touch. (So things like binary search on the array are completely irrelevant.) This is great for B-trees: we have the space, we just don't want to touch it. Red-black trees don't generalize to k this way, and they touch more nearby nodes when rebalancing.

  As a result, B-trees are very common in databases. Well, actually B+ trees, which are like B-trees but some data is stored at internal nodes. This makes insertion and deletion a bit more complicated, and we will not discuss it further. (Warning: what CLR calls B-trees are really B+ trees.)

* Consider the postfix evaluator you wrote for your homework. It probably looked something like this (some error checking left out):
    postfix:
      while (not end of file) {
        if plus
          s.push(s.pop() + s.pop())
        else if minus
          right = s.pop()              // the top of the stack is the right operand
          s.push(s.pop() - right)
        else if mult
          s.push(s.pop() * s.pop())
        else if divide
          denom = s.pop()
          if (denom == 0)
            throw new CalculatorDivideByZeroException();
          s.push(s.pop() / denom)
        else if negation
          s.push(0 - s.pop());
        else
          try {
            s.push(Integer.parseInt(token))
          } catch (NumberFormatException e) {
            throw new CalculatorIllegalInputException(); // maybe pass token
          }
        read next token
      }
      return s.pop();

  Comments:
    * It's better style to try the integer case last: that way we only throw an exception in the exceptional case.
    * All of our binary operations look the same. What if each really took 30 lines of code? How could we abstract them? Although it may seem like overkill here, you should recognize the pattern: all that changes is the operation. Here's what I want:

        doBinaryOp(op) {
          s.push(s.pop() op s.pop())   // a = first pop (old top), b = second pop
        }

        if plus
          doBinaryOp(+);
        if minus
          doBinaryOp(function (int a, int b) { return b - a; });
        if mult
          doBinaryOp(*);
        if divide
          doBinaryOp(function (int a, int b) {
                       if (a == 0) throw new CalculatorDivideByZeroException();
                       return b / a;
                     });
        ...

      Do the same for unary operations (you might have more unary operations later). Unfortunately, Java won't let me do this, although many languages will.
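For comparison, Java 8 (released long after these notes) added lambdas, so the doBinaryOp(op) idea can now be written almost directly. The sketch below passes the operation as a java.util.function.IntBinaryOperator. Assumptions: LambdaPostfix is a hypothetical wrapper class, java.util.ArrayDeque stands in for the homework Stack, and CalculatorDivideByZeroException is made unchecked (extends RuntimeException) because IntBinaryOperator.applyAsInt declares no checked exceptions. Input validation is left out, as in the notes.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.IntBinaryOperator;
import java.util.function.IntUnaryOperator;

public class LambdaPostfix {
    // Hypothetical stand-in for the homework's exception; unchecked so lambdas may throw it.
    public static class CalculatorDivideByZeroException extends RuntimeException {}

    // One routine for every binary operator: the operator itself is a parameter.
    // a is the first pop (old top of stack), b the second, as in doBinaryOp above.
    static void doBinaryOp(Deque<Integer> s, IntBinaryOperator op) {
        int a = s.pop();
        int b = s.pop();
        s.push(op.applyAsInt(a, b));
    }

    // Same idea for unary operators.
    static void doUnaryOp(Deque<Integer> s, IntUnaryOperator op) {
        s.push(op.applyAsInt(s.pop()));
    }

    public static int eval(String[] tokens) {
        Deque<Integer> s = new ArrayDeque<>();
        for (String token : tokens) {
            switch (token) {
                case "+": doBinaryOp(s, (a, b) -> b + a); break;
                case "-": doBinaryOp(s, (a, b) -> b - a); break;
                case "*": doBinaryOp(s, (a, b) -> b * a); break;
                case "/": doBinaryOp(s, (a, b) -> {
                              if (a == 0) throw new CalculatorDivideByZeroException();
                              return b / a;
                          });
                          break;
                case "~": doUnaryOp(s, x -> 0 - x); break;
                default:  s.push(Integer.parseInt(token)); // integer case tried last
            }
        }
        return s.pop();
    }

    public static void main(String[] args) {
        System.out.println(eval("3 4 -".split(" ")));          // prints -1
        System.out.println(eval("20 4 / 2 ~ *".split(" ")));   // prints -10
    }
}
```

Each operator is now a one-line argument to a shared routine, which is exactly the abstraction the notes ask for; the object-oriented version is still worth knowing, since lambdas are themselves objects implementing a one-method interface.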
  But we can fake it in Java with an object-oriented approach (the throws clauses are explained below):

    abstract class BinaryOp {
      // Java evaluates arguments left to right, so first is the old top of the stack.
      void doStackOp(Stack s) throws CalculatorDivideByZeroException {
        s.push(doOperation(s.pop(), s.pop()));
      }
      abstract int doOperation(int first, int second)
          throws CalculatorDivideByZeroException;
    }

    class AddOp extends BinaryOp {
      int doOperation(int first, int second) {
        return first + second;
      }
    }

    class SubOp extends BinaryOp {
      int doOperation(int first, int second) {
        return second - first;   // first was on top, so it is the right operand
      }
    }

    class MultOp extends BinaryOp {
      int doOperation(int first, int second) {
        return first * second;
      }
    }

    class DivOp extends BinaryOp {
      int doOperation(int first, int second)
          throws CalculatorDivideByZeroException {
        if (first == 0)
          throw new CalculatorDivideByZeroException();
        return second / first;
      }
    }

  "throws CalculatorDivideByZeroException" must be added to the abstract doOperation because one of its realizations throws it: since it is a checked exception, an overriding method may not throw it unless the method it overrides declares it. AddOp, SubOp, and MultOp, on the other hand, may leave the clause off, because an override is allowed to declare fewer exceptions. What they can't escape is that any caller going through a BinaryOp reference (like doStackOp) must still handle or declare the exception, even when the object is really an AddOp. That's just the rules of Java.

  We should do the same thing with unary operations. After all, all unary operations have the same form, and others might exist later:

    abstract class UnaryOp {
      void doStackOp(Stack s) throws CalculatorDivideByZeroException {
        s.push(doOperation(s.pop()));
      }
      abstract int doOperation(int arg);
    }

    class NegateOp extends UnaryOp {
      int doOperation(int arg) {
        return 0 - arg;
      }
    }

  The throws clause on UnaryOp.doStackOp isn't strictly necessary: negation can't divide by zero. But it does no harm, and it matches the common supertype we now want, since UnaryOp objects and BinaryOp objects have something in common: each has an operation that manipulates a Stack. So let's have them extend a more abstract class, whose doStackOp must declare the exception because BinaryOp's version can throw it:

    abstract class Op {
      abstract void doStackOp(Stack s) throws CalculatorDivideByZeroException;
    }

  The definitions of BinaryOp and UnaryOp should be changed so they extend Op.
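A single-file, compilable sketch of the finished hierarchy may help. It assumes a tiny hypothetical IntStack class in place of the homework's Stack, makes CalculatorDivideByZeroException a checked exception (extends Exception), and uses a hypothetical wrapper class OpHierarchy so everything fits in one file. It also relies on a Java rule worth knowing: an overriding method may declare fewer checked exceptions than the method it overrides, so the operations that cannot fail simply leave off the throws clause.

```java
import java.util.ArrayDeque;

public class OpHierarchy {
    // Hypothetical stand-in for the homework's exception (checked).
    static class CalculatorDivideByZeroException extends Exception {}

    // Hypothetical stand-in for the homework's Stack of ints.
    static class IntStack {
        private final ArrayDeque<Integer> items = new ArrayDeque<>();
        void push(int x) { items.push(x); }
        int pop() { return items.pop(); }
    }

    // Root of the hierarchy: something that manipulates the stack.
    static abstract class Op {
        abstract void doStackOp(IntStack s) throws CalculatorDivideByZeroException;
    }

    static abstract class UnaryOp extends Op {
        // No throws clause needed: an override may declare fewer checked exceptions.
        void doStackOp(IntStack s) {
            s.push(doOperation(s.pop()));
        }
        abstract int doOperation(int arg);
    }

    static abstract class BinaryOp extends Op {
        void doStackOp(IntStack s) throws CalculatorDivideByZeroException {
            // Arguments are evaluated left to right: first is the old top.
            s.push(doOperation(s.pop(), s.pop()));
        }
        abstract int doOperation(int first, int second)
                throws CalculatorDivideByZeroException;
    }

    static class NegateOp extends UnaryOp {
        int doOperation(int arg) { return 0 - arg; }
    }

    static class AddOp extends BinaryOp {
        int doOperation(int first, int second) { return second + first; }
    }

    static class SubOp extends BinaryOp {
        int doOperation(int first, int second) { return second - first; }
    }

    static class MultOp extends BinaryOp {
        int doOperation(int first, int second) { return second * first; }
    }

    static class DivOp extends BinaryOp {
        int doOperation(int first, int second) throws CalculatorDivideByZeroException {
            if (first == 0) throw new CalculatorDivideByZeroException();
            return second / first;
        }
    }

    public static void main(String[] args) throws CalculatorDivideByZeroException {
        IntStack s = new IntStack();
        s.push(10);
        s.push(3);
        new SubOp().doStackOp(s);     // pops 3 then 10, pushes 10 - 3
        new NegateOp().doStackOp(s);  // pops 7, pushes -7
        System.out.println(s.pop());  // prints -7
    }
}
```

Only the abstract declarations and DivOp mention the exception; the compiler still forces any caller that goes through an Op or BinaryOp reference to handle it.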
  Now we have an inheritance hierarchy of:

                    Op
                  /    \
            UnaryOp    BinaryOp
               |      /   |    |     \
           NegateOp AddOp SubOp MultOp DivOp

  This hierarchy nicely captures our thoughts: an Op is something that somehow manipulates the Stack. A UnaryOp is something that computes based on the top of the Stack and pushes the result. A BinaryOp is something that computes based on the top two items of the Stack and pushes the result. The concrete classes do the computations we expect.

  Now let's see how this cleans up our evaluator:

    postfix:
      while (not end of file) {
        Op op = null;
        if plus
          op = new AddOp();
        else if minus
          op = new SubOp();
        else if mult
          op = new MultOp();
        else if divide
          op = new DivOp();
        else if negation
          op = new NegateOp();

        if (op == null)
          try { s.push(Integer.parseInt ...) } catch {...}
        else
          op.doStackOp(s);
        read next token
      }
      return s.pop();

  Notice how every Op has a doStackOp method which does just what we want.

  Efficiency-wise, it's wasteful to keep making new Op objects when we really only need one of each kind. So here's the next version:

    postfix:
      Op addop = new AddOp();
      Op subop = new SubOp();
      Op multop = new MultOp();
      Op divop = new DivOp();
      Op negop = new NegateOp();
      while (not end of file) {
        Op op = null;
        if (plus) op = addop;
        else if (minus) op = subop;
        else if (mult) op = multop;
        else if (divide) op = divop;
        else if (negation) op = negop;

        if (op == null)
          try { s.push(Integer.parseInt ...) } catch {...}
        else
          op.doStackOp(s);
        read next token
      }
      return s.pop();

  Now we see that the cascading if ... else if ... is really just taking the String (we've been sloppy and written plus when we really mean token.equals("+")) and converting it to an Op (or a number). This takes time proportional to the number of operators, which might get large. But we know how to use one piece of data (the String) as a key to get another piece of data (the Op): we can use a Dictionary, which we might assume we already have lying around. Let's also assume the lookup method returns null if the key is not present.
    postfix:
      Dictionary dict = initPostfixDict();
      while (not end of file) {
        Op op = dict.lookup(token);
        if (op == null)
          try { s.push(Integer.parseInt ...) } catch {...}
        else
          op.doStackOp(s);
        read next token
      }
      return s.pop();

  All we have left out is the initialization of the Dictionary. We can put some arrays at the top of the file to define the String-to-Op relationship. Then when we define new operations, the only change to this file will be adding to these arrays:

    String[] opStrings = {"+", "-", "*", "/", "~"};
    Op[] ops = {new AddOp(), new SubOp(), new MultOp(), new DivOp(), new NegateOp()};

    Dictionary initPostfixDict() {
      Dictionary dict = new Dictionary();
      for (int i = 0; i < ops.length; i++) {
        dict.insert(opStrings[i], ops[i]);
      }
      return dict;
    }

  This is all very slick. The drawback is verbosity and lots of files (Java wants each public class in its own file). This is an artifact of Java, not of object-oriented programming.

  What we've done is make it very easy to add new operations without changing our postfix evaluator. We could do the same for other operations, of course. This is great when someone who didn't write the original code is asked to add a new Op or two or twelve.

  The drawback is that our simple eval function now has its computation split across many files. I like to think of it as a 2D table with the types of objects on one axis and the methods on the other:

                 Eval    Print    BuyPizza    ...
      Add
      Sub
      Mult
      Div
      Negate
      ...

  Object-oriented programming encourages putting entire rows together logically (one class holds every method for one kind of object), whereas functional programming encourages putting entire columns together logically (one function handles every kind of object). While the styles can appear quite different, you should be fluent in programs using either style and realize that underneath it all the fundamental computation is the same.
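Putting the pieces together, here is a single-file sketch of the table-driven evaluator. Assumptions: java.util.HashMap stands in for the notes' Dictionary (its get returns null for a missing key, matching the lookup contract assumed in the notes), java.util.ArrayDeque<Integer> stands in for the homework Stack, and TablePostfix and the exception class are hypothetical names. Handling of malformed input is left out, as in the notes.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

public class TablePostfix {
    // Hypothetical stand-in for the homework's exception (checked).
    public static class CalculatorDivideByZeroException extends Exception {}

    static abstract class Op {
        abstract void doStackOp(Deque<Integer> s) throws CalculatorDivideByZeroException;
    }
    static abstract class UnaryOp extends Op {
        void doStackOp(Deque<Integer> s) { s.push(doOperation(s.pop())); }
        abstract int doOperation(int arg);
    }
    static abstract class BinaryOp extends Op {
        void doStackOp(Deque<Integer> s) throws CalculatorDivideByZeroException {
            s.push(doOperation(s.pop(), s.pop())); // first = old top of stack
        }
        abstract int doOperation(int first, int second) throws CalculatorDivideByZeroException;
    }
    static class AddOp extends BinaryOp {
        int doOperation(int first, int second) { return second + first; }
    }
    static class SubOp extends BinaryOp {
        int doOperation(int first, int second) { return second - first; }
    }
    static class MultOp extends BinaryOp {
        int doOperation(int first, int second) { return second * first; }
    }
    static class DivOp extends BinaryOp {
        int doOperation(int first, int second) throws CalculatorDivideByZeroException {
            if (first == 0) throw new CalculatorDivideByZeroException();
            return second / first;
        }
    }
    static class NegateOp extends UnaryOp {
        int doOperation(int arg) { return 0 - arg; }
    }

    // Adding an operator means adding one class and extending these two arrays.
    static final String[] OP_STRINGS = {"+", "-", "*", "/", "~"};
    static final Op[] OPS = {new AddOp(), new SubOp(), new MultOp(), new DivOp(), new NegateOp()};

    // Built once; declared after the arrays so they are initialized first.
    static final Map<String, Op> DICT = initPostfixDict();

    static Map<String, Op> initPostfixDict() {
        Map<String, Op> dict = new HashMap<>();
        for (int i = 0; i < OPS.length; i++) {
            dict.put(OP_STRINGS[i], OPS[i]);
        }
        return dict;
    }

    public static int eval(String input) throws CalculatorDivideByZeroException {
        Deque<Integer> s = new ArrayDeque<>();
        for (String token : input.trim().split("\\s+")) {
            Op op = DICT.get(token);          // null when token is not an operator
            if (op == null) {
                s.push(Integer.parseInt(token));
            } else {
                op.doStackOp(s);
            }
        }
        return s.pop();
    }

    public static void main(String[] args) throws CalculatorDivideByZeroException {
        System.out.println(eval("3 4 + 2 *"));  // prints 14
        System.out.println(eval("7 2 - ~"));    // prints -5
    }
}
```

Adding, say, a hypothetical modulus operator here means writing one new BinaryOp subclass and extending the two arrays; eval itself does not change. That is the row-at-a-time organization described in the notes: each class bundles everything about one kind of operation, while adding a new method (a new column, such as Print) would touch every class.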