CS312: Some Thoughts on Debugging

Some Thoughts on Debugging

You wrote a program, you run it, and it does not work. Now what? This handout is intended to provide you with some ideas on how to handle debugging.

Before You Debug the Program, Debug Your Thinking

Do you understand the problem you are trying to solve? Read the problem specification carefully, and make sure it is exhaustive and is not contradictory. If you discover contradictions, eliminate them by revising the problem specification.

If the specification appears not to be exhaustive, identify the cases that have been missed. Say, the problem has been defined to work on lists (or non-negative numbers), but nothing is explicitly said about the empty list (or negative numbers). Ask yourself questions like the following:

Does the problem make sense for the cases that are not explicitly mentioned? If not, what should the program do if they occur anyway - raise an exception? Have these cases somehow implicitly been excluded by the specification?

If the problem makes sense for the cases that are not explicitly mentioned, will a careful re-reading of the specification reveal that there is a natural extension that covers the cases at hand? For example, if you have to print a list by enumerating its elements between brackets (e.g. "[ element1, element2, ...]"), a natural way to print an empty list could be "[ ]". If there is no obvious "natural" extension of the specification, revise the specification to exclude these cases, or specify explicitly how they should be handled.

Test Simple Cases First

Identify the simplest combination of inputs that make sense for your problem. If you work with lists, maybe the empty list is the easiest to handle. If you test a function that takes an integer parameter, maybe the value of the function is simplest to compute for input 0 or 1.

It is often useful to test the code on cases that are somehow "degenerate". Will your algorithm that sorts a list of numbers work if all the numbers are identical? Will your division algorithm work when you divide a number by itself, or by 0? Will your function crash if its string argument is the empty string?

Cover All Cases When Testing

Make sure that you examine the behavior of your solution for all relevant cases. What is "relevant" depends both on how the problem was specified, and on the solution you chose. For example, you might have decided to handle lists of even and odd lengths differently in your code, even though the specification does explicitly refer to the length of the lists.

Always ask yourself whether your tests have covered both the cases that the specification explicitly mentions, and the cases that your solution implicitly defines.

Go Bottom-Up in the Hierarchy

Often, your solution will consist of a set of functions that call one another in a hierarchical fashion (as opposed to mutually recursive functions). Then it is a good idea to test the functions at the bottom of the function hierarchy first, so that you can rely on their correctness when you debug top-level functions.

Subordinated functions might use data that has been partially processed by caller functions. If you test these functions in isolation, you will have to build and supply their input "by hand", which is sometimes non-trivial. The effort is well spent, however, as it will save a lot of time when debugging at the higher levels of the hierarchy.

In SML we often embed functions within other functions by using let statements. These helper functions are also good candidates for separate testing and debugging, which involves riping them out and defining them at the top level, or wrapping them temporarily inside a simple function whose only purpose is to provide test arguments.

Capture and Visualize the Computation's History

The principles delineated above provide general guidance on how to proceed with testing. They provide help in detecting errors in a systematic manner, but they do not always help in identifying the specific piece of code that is responsible for the error. Some programming environments provide sophisticated debuggers that one can use to examine the status of a computation at any stage; unfortunately this is not the case with our version of SML. We will have to rely on workarounds.

One way to capture the dynamic behavior of our algorithms is to use the print function.

Print takes a string argument and sends it to the standard output; it does not return any value (thus its type is unit). SML allows for more sophisticated ways of generating output - we refer the interested reader to the IO section of the SML Basis Library. Note that print is an example of a function interesting only for its side effects, as it computes no values.

Take a look at the example below:

fun map (f: int -> 'b) (lst: int list): 'b list = 
  case lst of 
    [] => (print("empty\n"); [])
  | hd::tl => (print(Int.toString(hd) ^ "\n"); [f(hd)] @ (map f tl))

This is a possible implementation for the Lists.map function described in the Basis Library, except that is particularized for integer lists. We use integer lists to simplify printing.

This version of map always prints the head of its second argument, or prints empty if the list is empty. As this is a very simple function, printing the head element is enough to monitor the algorithm's progress.

Here is an example run:

- map (fn x => x * x) [1, 2, 3, 4];
1
2
3
4
empty
val it = [1,4,9,16] : int list

The implementation above is a good start, but maybe too restrictive: it only works for integers, and it only displays limited information. In this simple case we might be able to work out the kinks in its implementation even in this restricted (i.e. particularized) setting, but this approach will not always work. What can we do?

Well, we can generalize by providing a print function as the third argument of map. Note how we use the option type to wrap elements of the underlying type 'a.

fun map (f: 'a -> 'b) (lst: 'a list) (prt: 'a option -> unit): 'b list = 
  case lst of 
    [] => (prt NONE; [])
  | hd::tl => (prt (SOME(hd)); [f(hd)] @ (map f tl prt))


fun prtInt (x: int option): unit =
  case x of
    NONE => print("empty\n")
  | SOME(x) => print(Int.toString(x) ^ "\n")


- map (fn x => x * x) [1, 2, 3, 4] prtInt;
1
2
3
4
empty
val it = [1,4,9,16] : int list


fun prtStr (x: string option): unit =
  case x of
    NONE => print("empty\n")
  | SOME(x) => print(x ^ ", ")


- map (fn x => "> " ^ x ^ " <") ["a", "b", "c", "d"] prtStr;
a, b, c, d, empty
val it = ["> a <","> b <","> c <","> d <"] : string list

Now we can test map for different types, while also experimenting with various output formats for each type.

We can take this idea further: by suitably adjusting the type of prt we can expose as much information as needed (and available) from inside map. Take a look at this:

fun map (f:'a -> 'b) (lst:'a list) (prt:'a option * 'b option -> unit):'b list = 
  case lst of 
    []     => (prt(NONE, NONE); [])
  | hd::tl => (prt(SOME(hd), SOME(f(hd))); [f(hd)] @ (map f tl prt))


fun prtInt (x: int option, y: int option): unit =
  case (x, y) of
    (SOME(i1), SOME(i2)) 
           => print("(" ^ Int.toString(i1) ^ ", " ^ Int.toString(i2) ^ ") " )
  | (_, _) => print("(empty, no value)\n")


- map (fn x => x * x) [1, 2, 3, 4] prtInt;
(1, 1) (2, 4) (3, 9) (4, 16) (empty, no value)
val it = [1,4,9,16] : int list

An alternative method to developing customized printing functions is to collect all the relevant values during the function execution, and then return the result together with the function's original result. Here is a way to do it:

fun map (f: 'a -> 'b) (lst: 'a list) (trace: ('a list * 'b option) list) 
                                  : ('b list) * (('a list * 'b option) list) = 
  case lst of 
    []     => ([], [([], NONE)])
  | hd::tl => let
                val (fs, trc) = map f tl trace
              in
                ([f(hd)] @ fs, [(lst, SOME(f(hd)))] @ trc)
              end


-map (fn x => x * x) [1, 2, 3, 4] []
val it =
  ([1,4,9,16],
   [([1,2,3,4],SOME 1),([2,3,4],SOME 4),([3,4],SOME 9),([4],SOME 16),
    ([],NONE)]) : int list * (int list * int option) list

The function now returns both the actual result and a trace of its execution. The trace is a list, each element of which contains the entire second argument, and also the value computed in the respective recursive call. No customized print function was needed.

The function had to be rewritten in order to allow for the separation of returned values from recursive calls, and to later combine these with the values generated in or captured from the current call. You should be careful not to introduce (further) errors when making these changes. This approach is overkill for a simple function - for complicated algorithms, however, it can be a life saver.