Today:
 * Compilation
 * Source-to-source transformation, preserving the semantics of the language
 * Reasoning about programs

------------------------------------------------------------------------------------

Compilers versus Evaluators

The evaluator takes a program as input and runs it, returning its value.

Evaluator contract:  Program ---> [ Evaluator ] ---> Value

Compiler contract:   Program ---> [ Compiler ] ---> Program' ---> [ Evaluator ] ---> Value

where we preserve the semantics of the language:

    (eval P env) = (eval (compile P) env)

Typically the output of the compiler is a different language (such as PPC assembler,
which is interpreted by the PPC chip).  In CS212, the output of the compiler will be
a subset of Dylan.  Our compiler thus performs a source-to-source transformation:
the input is a Dylan program (represented as a list), and the output is a Dylan
program.

In many respects the compiler and evaluator are similar -- both are programs that
"walk over" source code.  An evaluator computes a value on each recursive call,
while a compiler computes code which will eventually compute a value.

Why bother?  Program' is just like Program, only *faster*.  To achieve this, the
compiler *reasons* about Dylan programs (although the reasoning is quite simple).

------------------------------------------------------------------------------------

To see why this might be useful, consider defining

    (define (useless <method>)
      (method (x) (+ (* 3 5) x)))

    (map useless '(1 2 3 ... 100))

How does this work in the evaluator?

    ... extend global env by [x: 1] ... evaluate (+ (* 3 5) x) ... evaluate (* 3 5) ...
    ... extend global env by [x: 2] ... evaluate (+ (* 3 5) x) ... evaluate (* 3 5) ...
    .
    .
    ... extend global env by [x: 100] ... evaluate (+ (* 3 5) x) ... evaluate (* 3 5) ...

That's a lot of evaluations of (* 3 5)!

Note: you might not write code like this, but a macro could.  Or in-lined functions
could (suppose you call someone else's code).
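The compiler contract above can be made concrete with a small sketch.  The
following is a minimal, illustrative Python model (not the course's Dylan code):
expressions are nested lists of symbols and numbers, `evaluate` plays the role of
the evaluator, and `compile_expr` is a toy compiler that folds constant
subexpressions.  The function names and the tiny primitive table are assumptions
made for this example.

```python
import operator

# The only primitives this toy model knows about.
PRIMS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def evaluate(expr, env):
    """A tiny evaluator: numbers are self-evaluating, symbols are
    looked up in the environment, combinations apply a primitive."""
    if isinstance(expr, (int, float)):
        return expr
    if isinstance(expr, str):
        return env[expr]
    op, *args = expr
    return PRIMS[op](*(evaluate(a, env) for a in args))

def compile_expr(expr):
    """A toy compiler: fold any subexpression whose arguments are all
    known numbers.  The output is again an expression (Program')."""
    if not isinstance(expr, list):
        return expr
    expr = [expr[0]] + [compile_expr(a) for a in expr[1:]]
    if all(isinstance(a, (int, float)) for a in expr[1:]):
        return PRIMS[expr[0]](*expr[1:])
    return expr

# The contract: (eval P env) = (eval (compile P) env)
prog = ["+", ["*", 3, 5], "x"]          # (+ (* 3 5) x)
env = {"x": 4}
assert evaluate(prog, env) == evaluate(compile_expr(prog), env) == 19
```

Note that `compile_expr` turns (+ (* 3 5) x) into (+ 15 x): the multiplication is
done once, at compile time, instead of on every call.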
Usually there is a Program' that is a *lot* faster (typically 100-1000 times).
Compilers involve getting from Program to Program'.

In this example, we want to get from

    (method (x) (+ (* 3 5) x))

to

    (method (x) (+ 15 x))

------------------------------------------------------------------------------------

To make life easier, we will consider compiling only a subset of Dylan programs:

 * No side effects.  As you'll see, side effects are hard to reason about.
 * Few primitives (arithmetic and boolean only)
 * Special forms limited to: IF, METHOD, BIND
 * BIND does one binding only

Even this language subset includes very complicated expressions.  Our strategy is
to produce an *intermediate form* from an expression and then optimize that
intermediate form.

To see why this is necessary, consider the expression

    (f (g x) (g x))

We want to turn this into something like

    (bind ((temp (g x)))
      (f temp temp))

To evaluate the original expression, we evaluate (g x), then we evaluate (g x)
again, then we invoke f on the first result and the second result.  But in Dylan
these intermediate results are implicit.  We need to make them explicit, through a
process we call LINEARIZATION.  It's a little hair-raising in places (we'll
provide the code for those who want to look at it).  You should know what a
linearized expression is, but not necessarily how to write code to linearize one.

Linearization will produce an intermediate form like:

    (bind ((val1 (g x)))
      (bind ((val2 (g x)))
        (bind ((val3 (f val1 val2)))
          val3)))

We will then optimize this intermediate form to produce

    (bind ((val1 (g x)))
      (bind ((val2 val1))
        (bind ((val3 (f val1 val2)))
          val3)))

which optimizes out one call to g.  [NOTE: side effects are harmful!]
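To give a feel for what the linearizer does (you are not expected to write one),
here is a rough Python sketch.  It assumes expressions are nested lists of
symbols and numbers, and it handles only plain combinations -- no IF, METHOD, or
BIND -- so it is far from the real thing; `fresh`, `flatten`, and the valN naming
scheme are inventions for this illustration.

```python
import itertools

_counter = itertools.count(1)

def fresh():
    """Generate a new intermediate-result name: val1, val2, ..."""
    return "val%d" % next(_counter)

def atomic(expr):
    """Symbols (strings) and numbers are atomic."""
    return not isinstance(expr, list)

def flatten(expr):
    """Return (bindings, atom): bindings is a list of (name, simple
    combination) pairs in evaluation order."""
    if atomic(expr):
        return [], expr
    bindings, parts = [], []
    for sub in expr:                  # operator first, then operands
        sub_bindings, atom = flatten(sub)
        bindings += sub_bindings
        parts.append(atom)
    name = fresh()
    bindings.append((name, parts))    # parts is now a SIMPLE combination
    return bindings, name

def linearize(expr):
    """Wrap the bindings into a chain of one-variable BINDs."""
    bindings, atom = flatten(expr)
    result = atom
    for name, simple in reversed(bindings):
        result = ["bind", [[name, simple]], result]
    return result
```

On the example above, linearize(["f", ["g", "x"], ["g", "x"]]) produces exactly
the bind chain shown: val1 and val2 for the two calls to g, then val3 for the
call to f.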
------------------------------------------------------------------------------------

In the linearized form two things are made explicit:
 * Intermediate results, and
 * The order of evaluation

There are thus 2 parts to the compilation process:
 * Linearization, which makes various things explicit, but doesn't result in a
   faster program, and
 * Optimization, which makes a program faster

    Program ---> [Linearizer] ---> Linearized Program ---> [Optimizer] ---> Program'

As before, *everything* will be a Dylan subset.

------------------------------------------------------------------------------------

The output of the linearizer, which will also be the output of the optimizer (and
hence of the compiler), will be a *very* restricted subset of Dylan, called
Linear Dylan (Linear-D for short).

    Dylan subset ---> [Linearizer] ---> Linear-D ---> [Optimizer] ---> Linear-D

The key property of Linear-D is that all combinations are SIMPLE.  A combination
is SIMPLE if the operator and the operands are all atomic (i.e., symbols or
numbers).  For example, (f a 23) is simple, while ((f) (g)) is not.

In addition, in conditionals the test is required to be atomic.  So (if x 1 2) is
simple, while (if (not x) 1 2) is not.  In fact, the latter expression would be
linearized to

    (bind ((val1 (not x)))
      (bind ((val2 (if val1 1 2)))
        val2))

A Linear-D expression is essentially a giant series of BINDs which eventually
returns a value in the body.  Every bind involves a single simple computation
(no nesting).

An important part of linearization is called ALPHA-RENAMING.  Basically, whenever
we see a METHOD we need to give its parameters unique names, or we will get
confused.  For example,

    ((method (f) (f x)) (method (f) f))

(which applies the identity function to x) will be alpha-renamed to

    ((method (f1) (f1 x)) (method (f2) f2))

After alpha-renaming, we can be sure that any two variables with the same name
*are* the same variable.
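Alpha-renaming itself fits in a few lines.  Below is a hedged Python sketch, not
the course code: expressions are nested lists, a METHOD is represented as
["method", [params...], body] with a single body expression, and the counter and
function name are inventions for the example.

```python
import itertools

_ids = itertools.count(1)

def alpha_rename(expr, env=None):
    """Give every METHOD parameter a globally unique name.
    env maps each in-scope parameter to its new name."""
    env = env or {}
    if isinstance(expr, str):
        return env.get(expr, expr)     # free variables are left alone
    if not isinstance(expr, list):
        return expr                    # numbers
    if expr and expr[0] == "method":
        _, params, body = expr         # assumes a single body expression
        new = {p: p + str(next(_ids)) for p in params}
        return ["method", [new[p] for p in params],
                alpha_rename(body, {**env, **new})]
    return [alpha_rename(sub, env) for sub in expr]
```

On the example from the text, alpha_rename renames the two distinct f's to f1
and f2, so variables that share a name afterwards really are the same variable.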
------------------------------------------------------------------------------------

OK, we now have linearized code.  How do we optimize it?  The optimizations we
will consider are all fairly simple, although they can improve your code a lot.

To understand optimization, we need to go back to our first example and think
about the relationship between compilation and evaluation:

    (define (useless <method>)
      (method (x) (+ (* 3 5) x)))

We can try to turn this into a better piece of code, but we have to bear in mind
that we have *no* idea what the value of x is.  In fact, we won't know until we
actually apply this procedure to something (i.e., at run time).  On the other
hand, we know what the value of (* 3 5) is, irrespective of the value of x
(i.e., at compile time).

Lesson:
 * The values of method parameters are only known at run time
 * Everything else is known at compile time

Obvious consequence: if you want to optimize a program that doesn't contain any
procedures, you can simply compute the value.  If the compiler is given some
complex arithmetic expression, it should simply return the value.

------------------------------------------------------------------------------------

Part of what a compiler does can be described as PARTIAL EVALUATION.  We take
code like

    (method (x) (+ (* 3 5) x))

and return code somewhat like

    (method (x) (+ 15 x))

In essence, anything that can be computed at compile time should be computed.

The simplest such optimization is called CONSTANT FOLDING, which replaces
operations by constants where possible.  Given a Linear-D expression like

    (bind ((val1 (* 3 5)))
      (bind ((val2 (+ val1 x)))
        val2))

constant folding will produce a Linear-D expression like

    (bind ((val1 15))
      (bind ((val2 (+ val1 x)))
        val2))

Basic idea:
 * Recursive tree-walk of a Linear-D expression
 * If a variable's value is known at compile time, SUBSTITUTE!
 * Compile-time known values are numbers (could be primitives)
 * When you encounter a new binding, if the value of the binding is known,
   substitute it in the body and continue
 * If you're looking at an expression (F G H), and F, G, and H are all known at
   compile time, and F is a primitive, evaluate it!  [map known? exps]

    (bind ((val1 (* 3 5))) BODY)
    --->  a new BODY with val1 replaced by 15 wherever it occurs (and no bind)

Note: what if F is known to be a method at compile time and G, H are constants?
We would need something like an evaluator in our compiler!  A real partial
evaluator requires a lot of work...

Sample:

    (bind ((val1 14))
      (bind ((val2 (method (val3) (* val1 3))))
        (bind ((val3 (val2 run-time-variable)))
          val3)))
    ==> 42

(The method ignores its parameter, so the whole expression is (* 14 3) = 42.)

------------------------------------------------------------------------------------

There are other related optimizations which aren't quite partial evaluation, but
which are similar in flavor.  Example: algebraic simplification.  The simplest
cases can be handled by pattern matching -- look for (bind ((x (* y 0))) ...) and
the like.  We've done something like this already, just not as part of a compiler.
More complex examples involve quite non-trivial computation: do two arbitrary
expressions compute the same value?  For arithmetic expressions there is actually
a pretty simple algorithm, based on the fact that the zeros of polynomials are
sparse.

------------------------------------------------------------------------------------

An interesting example is COMMON SUBEXPRESSION ELIMINATION.  Compute something
once -- why compute it again?  Consider an expression like

    (f (+ a b) (+ a b))

The linearizer produces

    (bind ((val1 (+ a b)))
      (bind ((val2 (+ a b)))
        (bind ((val3 (f val1 val2)))
          val3)))

We'd like to avoid computing (+ a b) twice.  This is actually pretty similar to
constant folding -- if we know something at compile time, we don't need to
recompute it.
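The constant-folding walk described in the bullets above (evaluate a primitive
applied to known values, then substitute in the body and drop the bind) can be
sketched in Python.  This is an illustration, not the assignment code: Linear-D
terms are nested lists, and the primitive table and helper names are invented
for the example.

```python
import operator

PRIMS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def substitute(expr, name, value):
    """Replace every occurrence of the variable name by value."""
    if expr == name:
        return value
    if isinstance(expr, list):
        return [substitute(sub, name, value) for sub in expr]
    return expr

def known(expr):
    """Compile-time known values are numbers."""
    return isinstance(expr, (int, float))

def fold(expr):
    """Constant-fold a Linear-D chain of (bind ((var simple)) body)."""
    if not (isinstance(expr, list) and expr[0] == "bind"):
        return expr
    [[var, simple]], body = expr[1], expr[2]
    # (F G H) with F a primitive and G, H known: evaluate it!
    if (isinstance(simple, list) and simple[0] in PRIMS
            and all(known(arg) for arg in simple[1:])):
        simple = PRIMS[simple[0]](*simple[1:])
    if known(simple):
        # substitute in the body and drop the bind
        return fold(substitute(body, var, simple))
    return ["bind", [[var, simple]], fold(body)]
```

On the (bind ((val1 (* 3 5))) ...) example, fold replaces val1 by 15 everywhere
and returns (bind ((val2 (+ 15 x))) val2) -- the val1 bind is gone entirely.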
However, while in constant folding we replace variables with values, in common
subexpression elimination we replace expressions with variables.  The expression
above needs to be converted to

    (bind ((val1 (+ a b)))
      (bind ((val2 (f val1 val1)))
        val2))

The key is to see that val1 and val2 are bound to the same expression, and to
replace the two computations by one.

------------------------------------------------------------------------------------

Another example: dead code elimination.  When an IF's test value is known at
compile time, we can eliminate the consequent or the alternate.  Implementing
this optimization has been a problem set or final exam question in previous
years.

A similar procedure can get rid of useless bindings like

    (bind ((var1 var2)) ...)

This, too, has been a problem set or final exam question in previous years.

------------------------------------------------------------------------------------

Procedure inlining

There is overhead involved in a procedure call.  If we have

    (define foo (method () (* a c)))

then

    (+ (foo) (foo))

is slower than

    (+ (* a c) (* a c))

[not much, but it can matter inside a loop!]

One solution is to write your code as macros.  Disadvantages?
 * Kind of painful (macros are hard to debug)
 * Space versus time

Alternative: inlining.  Note that in Linear-D we leave calls to methods alone.
Thus (+ (foo) (foo)) is linearized into

    (bind ((val1 (foo)))
      (bind ((val2 (foo)))
        (bind ((val3 (+ val1 val2)))
          val3)))

which after eliminating common subexpressions becomes

    (bind ((val1 (foo)))
      (bind ((val2 val1))
        (bind ((val3 (+ val1 val2)))
          val3)))

but we might be better off inlining the call to foo...  This depends, of course,
on what foo does.  Can you tell what a function does without running it?  NO!
(See the last lecture of CS212.)  Summary: reasoning about programs has some
inherent limits.

For some simple functions, we can "inline" (or "open-code") them.  When is this
a fatal error?  Recursion!  Still, this can be useful.  ANSI C supports an
"inline" declaration.
Use inlining at your own risk (the time-space tradeoffs are not always obvious!).
Limited inlining of recursive code can be very useful -- it's called loop
unrolling.

------------------------------------------------------------------------------------

Final note: many of these optimizations enable each other.  Doing inlining can
enable common subexpression elimination, for example.  This combination can be
pretty similar to simply *memoizing* (which we will talk about in streams, soon).

    (+ (foo) (bar))
    ==>  (+ (* a c) (* c a))
    ==>  (bind ((val1 (* a c)))
           (bind ((val2 (+ val1 val1)))
             val2))

(Note that recognizing (* a c) and (* c a) as the same computation takes a bit
of algebraic simplification -- multiplication is commutative.)

A typical compiler makes several passes over the code, doing a bunch of different
optimizations.  Some passes need to be done more than once.  How this is done is
beyond the scope of this course.

Here is a table of some optimizations:

    ------------------------------------------------------------------
    | Constant Folding          | Replace Variables with Values      |
    |                           |                                    |
    | Common Subexpression      | Replace Expressions with Variables |
    |   Elimination             |                                    |
    |                           |                                    |
    | Inlining                  | Replace Procedure Calls with       |
    |                           |   Expressions                      |
    ------------------------------------------------------------------
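As a small illustration of passes feeding each other, here is a hedged Python
sketch (Linear-D terms as nested lists, function names invented for the example):
a common-subexpression pass followed by the useless-binding pass mentioned
earlier, run one after the other on the linearized (f (+ a b) (+ a b)).

```python
def substitute(expr, name, value):
    """Replace every occurrence of the variable name by value."""
    if expr == name:
        return value
    if isinstance(expr, list):
        return [substitute(sub, name, value) for sub in expr]
    return expr

def cse(expr, seen=None):
    """Pass 1: common subexpression elimination on a Linear-D bind chain.
    seen maps each simple combination already computed to the variable
    holding its result -- safe only because there are no side effects."""
    seen = seen if seen is not None else {}
    if not (isinstance(expr, list) and expr[0] == "bind"):
        return expr
    [[var, simple]], body = expr[1], expr[2]
    if isinstance(simple, list):
        key = tuple(simple)           # all elements are atomic in Linear-D
        if key in seen:
            simple = seen[key]        # reuse the earlier result
        else:
            seen[key] = var
    return ["bind", [[var, simple]], cse(body, seen)]

def drop_useless(expr):
    """Pass 2: remove the copy bindings (bind ((var1 var2)) body) that
    the first pass leaves behind."""
    if not (isinstance(expr, list) and expr[0] == "bind"):
        return expr
    [[var, simple]], body = expr[1], expr[2]
    if isinstance(simple, str):       # bound to a bare variable: useless
        return drop_useless(substitute(body, var, simple))
    return ["bind", [[var, simple]], drop_useless(body)]
```

Starting from the linearized form, cse introduces (bind ((val2 val1)) ...), and
drop_useless then removes it, leaving a chain that computes (+ a b) only once.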