Lecture 26: Refactoring

Administrivia

Today is RDZ’s last lecture, Thursday is a guest lecture by Professor Myers.

PS5 is officially due Friday 11:59PM. However, if we have it before we start grading, there will be no late penalty assessed. We will grade it no earlier than Tuesday (a week from today). More details will be available shortly.

There will be a final exam review session, details on Thursday.

Final topic: software engineering (building large programs)

Building a large program is not at all like building a small program. And as an undergrad, there is no way to build a large program (at least, not in class!) For instance, building the spec itself takes a lot of work and negotiation. Testing is at least as important as development (Microsoft has more testers than developers, probably should have even more!)

Refactoring

Basic idea: change the internal structure of a module, while leaving the external interface the same. The goal is to avoid changing both how it works (internally) and what features it supports (externally) at the same time, which has a high probability of resulting in a system which works neither internally nor externally!

To discuss this issue in more depth, we need a short discussion of the testing process. In general, as you build a module you want to design tests. If you can't write test cases, the specification is undoubtedly not complete, and you have no business writing code.

Regression tests

You can write test cases for the entire module, and also for various parts of the module. A test case basically says "this input should result in this output". Once you have a correct implementation, even if slow, you can rapidly create a large number of test cases. This is (one more) argument for rapid prototyping. Also, the test cases should grow monotonically, to make sure you didn't re-introduce old bugs. Finally, as a rule, be sure to test extreme cases. This is a standard evil-professor/interviewer trick; your function should work with n=0 and n=1, or it isn't correct.

Test cases should generate deterministic output (no user interaction), and should result in easily comparable outputs, such as integers. (Note that today many programs have a GUI, and trying to compare the old output of a GUI with the new output is a major research question, one that RDZ has actually worked on for Microsoft). The old implementation can be used as a benchmark. This is yet another justification, BTW, for rapid prototyping, since the old implementation doesn’t need to be fast.

Since we are trying to avoid making the situation worse (than the old implementation), this is sometimes called "regression testing".

Refactoring PS3

As an example, we will consider the environment implementations with lists and red-black trees that we saw in PS3. The contract and code for environments is very simple (this is a slight modification of the PS3 code):

structure Environment :> sig

  datatype env

  makeEnv: string * value list -> env

  val lookupBinding : env -> string -> value option

  val insertBinding : env -> string -> value -> env

end

= struct

datatype env = Env of (string * value) list

fun makeEnv(en: string * value list):env = Env(en)

fun lookupBinding (Env(e)) (id:string): value option =

  case List.find (fn (s, _) => id = s) e of

    NONE       => NONE

  | SOME(_, v) => SOME v

fun insertBinding (Env(e)) (id:string) (v: value): env =

  Env((id, v) :: e)

end

(* evaluator code excerpts *)

fun evaluate (exp : exp, env : env) =

  case exp of

    Int_c i => Int_v i

    | Id_e i => (case (lookupBinding env i) of

                   SOME v => v

                 | NONE =>

              Error.runtime ("binding " ^ i ^ " not found"))

  and evaluateDeclare (decl: decl, env: env): env =

    case decl of

      Val_d(name, t, exp) => insertBinding env name (evaluate (exp,env))

fun mainloop(env:env)

…

eval(exp,env)

fun run(): unit = let

    val () = mainLoop (Environment.makeEnv(…))

It is trivial to design lots of test cases for the environment model.

Let's pretend that our initial implementation was with lists. Furthermore (and this is realistic) let's have lots of bindings in the original global environment, the one returned by makeEnv.

Now suppose that we want to REFACTOR our original implementation, so that is uses RB trees in a somewhat clever way. What we can do is to implement the global environment as an RB tree, and to implement all other environments as lists. This is actually a reasonable tradeoff, since we expect the other environments to be pretty small and shallow.

In order to do this, we will keep the exact same interface and the same regression tests! Insertbinding will be the same; only the implementation of lookup, makeEnv and the env type will have to change. This is a very nice example of the power of abstraction.

datatype env = Env of (string * value) list * rbTree

fun makeEnv(en: string * value list):env = Env([],makeRBTree(en))

fun lookupBinding (Env(lst,rbTree)) (id:string): value option =

  case List.find (fn (s, _) => id = s) lst of

    NONE       => rbLookup(id,rbTree)

  | SOME(_, v) => SOME v

fun insertBinding (Env(lst,rb)) (id:string) (v: value): env =

  Env((id, v) :: lst, rb)

Design patterns

Refactoring is a nice example of something that is sometimes called a “design pattern”. Design patterns capture common patterns of software structure, and common changes made to them. A classical example of a design pattern is to abstract out some piece of functionality that you’ve written multiple times. Amusingly enough, these things are often called “code smells”.

For example, the “OnceAndOnlyOnce” mantra justifies much of what I said at the beginning of the course. SML is a good language in terms of this. Another mantra, regarding refactoring, is ThreeStrikesAndYouRefactor

Patterns are actually fun to learn about, though any programmer with experience probably effectively knows about them already.