Recitation: Testing

## Dictionaries revisited This recitation relies on the attached [code](broken_code.zip), which implements a dictionary interface using a binary search tree, a list, and a sorted list. This code has some bugs. In this recitation, we will use testing and debugging tools to find the bugs. You aren't required to fix them, but if you want extra debugging practice, fixing them would be good exercise! **Exercise**: unpack the code into your working directory for the recitation, and create your usual recitation working file (the one that you'll `#use` as you go). Read the `.mli` files to get a high-level overview of what's in each file. **Exercise**: use `cs3110 compile` to compile the three dictionary implementations. Add the appropriate `#directory` and `#load` commands to your working file for the recitation. To make sure that you've set everything up, add the following lines to your working file and `#use` it: ``` let x = BstDict.(insert empty 0 'x') let y = ListDict.(insert empty 1 'y') let z = SortedListDict.(insert empty 2 'z') ``` If this does not work, you may wish to revisit [lab 3](../03-var/rec.html) or ask for help. ### Unit testing Unit tests are tests of the smallest parts of the codebase. They will typically test a single function on a variety of inputs to ensure that the function gives the expected output. When developing unit tests, it is important to test examples that exercise different parts of the code, and also to try to design tests that stress unusual corner cases. The **coverage** of a set of tests is the set of lines of code that are executed while running the tests; ideally, your tests would have 100% coverage. In fact, for many languages, there are tools called "coverage checkers" that will run your tests and tell you how much of your code was actually tested. We will not ask you to use coverage checkers, but if you are curious, you can look into [the bisect tool](http://bisect.x9c.fr/). **Exercise**: On a piece of paper, draw a collection of binary search trees and elements to insert and remove that would give you good test coverage of `BstDict.insert` and `BstDict.remove`. For each, draw the expected output tree. **Exercise**: In your working file for the recitation, write down expressions that test the inputs and outputs you drew in the previous exercise. For example: ``` insert (Node (Leaf, 0, 'x', Leaf)) 1 'y' = Node (Leaf, 0, 'x', Node (Leaf, 1, 'y', Leaf));; ``` If you `#use` the file, you will find that it doesn't compile, because the `Node` and `Leaf` constructors are not public; they are not exposed by `bstDict.mli`. This leads to a problem: you can't put your unit tests anywhere but `bstDict.ml` because the necessary details are hidden. But if you put them in `bstDict.ml`, then they will be run anytime anyone uses your `bstDict` module in their program. To resolve this dilemma, the `cs3110` tool provides a small syntax extension to OCaml to let you write and run unit tests. You can write a test as follows: ``` TEST = ... ``` where "..." is an expression that evaluates to either true or false. You can also give your test a name, for example: ``` TEST "my_test" = insert (Node (Leaf, 0, 'x', Leaf)) 1 'y' = Node (Leaf, 0, 'x', Node (Leaf, 1, 'y', Leaf)) ``` The code in the `TEST`s will not normally be executed; to actually run the tests, you can run `cs3110 test <program>`. This will cause all of the `TEST`s defined in your program to be run; any that evaluate to false will be printed out on the console. **Warning**: before running `cs3110 test <program>` you must first `cs3110 compile <program>`. If you change the code or the tests, you will need to recompile before running `cs3110 test` again! **Exercise**: to see how failing tests look, add ``` TEST "designed_to_fail" = 1 < 0 ``` to `bstDict.ml`, and run `cs3110 test` on it. **Exercise**: move your bst unit tests to the `bstDict.ml` file, using the `TEST` keyword. Use `cs3110 test` to run them and see if any fail. ### Testing for exceptions Sometimes you just want to test that a function doesn't raise an exception. You can use `TEST_UNIT` to do that. `TEST_UNIT` is like `TEST` except that the expression should evaluate to `unit` (instead of `true` or `false`); the test fails if it raises an exception. **Exercise**: `SortedListDict.first` uses `List.hd` and so it might throw an exception. Write some tests using `TEST_UNIT` to check that it doesn't. ### Tests that share set-up code Sometimes you want to write multiple tests that use some shared data structure or code, or require some complicated procedure to set the tests up before you run them. The `TEST_MODULE` keyword allows you to write a large block of code that gets executed during testing. In particular, it lets you define a module that is only created during testing, and it runs any `TEST`s inside that module. The syntax is as follows: ``` TEST_MODULE = ... ``` where ... is a module. As with `TEST` and `TEST_UNIT`, you can add a name if you like. For example: ``` TEST_MODULE "tree_tests" = struct let example = Node (Leaf, 0, 'x', Leaf) TEST "insert_in_tree" = insert example 1 'y' = Node (Leaf, 0, 'x', Node(Leaf, 1, 'y', Leaf)) TEST "remove_1_from_tree" = remove example 1 = example TEST "remove_0_from_tree" = remove example 0 = empty end ``` In general, creating example might be complicated; but the code in the module defined by the `struct ... end` will only be executed during testing. **Exercise**: group some of the tests you defined in `bstDict.ml` into a module and use `TEST_MODULE` to run them. ### Regression testing When developing your code, you will often come across bugs. A very good habit to develop is to create a unit test as soon as you discover a bug. This helps for three reasons: - Narrowing down the bug to a "minimal test case" can help you figure out where the bug actually is. - By developing the test case first, you can easily check a fix to see whether it worked. - It turns out that bugs often reappear in the future as people continue to hack on the code. Documenting the bug in your test suite will notify someone immediately if they accidentally reintroduce the bug, which will likely make it easier for them to re-repair it. Tests that are based on pre-existing bugs are called **regression tests**. **Exercise** (\*): You probably remember at least one bug you had in A1. Write a regression test to detect that bug. **Exercise** (\*): As you implement future assignments, be sure to create regression tests! ### Test-driven development There is a popular approach to programming called "test-driven development" (TDD). The idea behind TDD is that you start by writing your interface (your `.mli` file), but leave your implementation completely stubbed out (for example by defining every function as just `failwith "not implemented"`. Then you write your tests (which won't pass, of course). Only *after* you have a test suite that you're happy with do you start writing the actual code. Whether to write your tests before, during, or after development of your code is a matter of taste to some extent. I encourage you to experiment with TDD during the course of the semester so that you can find the style that you are comfortable with. However, regardless of *when* you write your tests, you should spend time writing and thinking about tests. Don't put testing off until the last minute. In many software development teams, code without adequate tests will be rejected immediately. ### Interface testing As you consider writing tests for the other `Dictionary` implementations (`ListDict` and `SortedListDict`), you will find that there are many tests that work just as well for all three implementations. For example, the `Dictionary` specification says that for any `x`, `remove empty x = empty`; this test could be written for any of the three dictionary modules, or for any other dictionary module. **Exercise:** By examining the comments in `dictionary.mli`, describe at least one general test for each function in the `Dictionary.S` module signature. Tests that rely only on an interface are called **interface tests** or **black-box tests** (because they don't depend on the implementation details of the module being tested). Tests that do depend on the implementation are sometimes called **glass-box tests**. A good way to implement interface tests is by putting them in a functor in the same file in which a module type is defined. The functor takes in an implementation module, and contains a collection of interface tests. For example, you might include the following in `dictionary.ml`: ``` module Tests (D : Dictionary) = struct open D TEST = remove 0 empty = empty TEST = remove 1 (insert 1 empty) = empty end ``` If you define your tests this way, then new implementations of the interface get all of those tests "for free". After implementing a module `D`, you can simply write ``` TEST_MODULE "specification tests for D" = Dictionary.Tests(D) ``` to run all of the specification tests. **Exercise**: write a `Dictionary.Tests` functor as described above. Include all of the general tests you listed in the previous exercise. Create a file called `tests.ml` containing the following lines: ``` TEST_MODULE "spec tests for BstDict" = Dictionary.Tests(BstDict) TEST_MODULE "spec tests for ListDict" = Dictionary.Tests(ListDict) TEST_MODULE "spec tests for SortedDict" = Dictionary.Tests(SortedDict) ``` Run the tests, and see if you discover any new bugs!