Optional Lab: Randomized Testing

# Randomized Testing This lab is an optional extension to the main lab on testing. Although you don't have to do this one, it is highly recommended for your own understanding of a industrial testing practice! To get started, download [the provided code for this lab](qchecks.ml). ## QCheck pseudorandom value generation One of the key pieces of functionality provided by QCheck is the ability to generate pseudorandom values of various types. Here is some of the signature of the module that does that: ``` module QCheck : sig ... module Gen : sig type 'a t = Random.State.t -> 'a val int : int t val generate : ?rand:Random.State.t -> n:int -> 'a t -> 'a list val generate1 : ?rand:Random.State.t -> 'a t -> 'a ... end ... end ``` An `'a QCheck.Gen.t` is a function that takes in a PRNG state and uses it to produce a pseudorandom value of type `'a`. So `QCheck.Gen.int` produces pseudorandom integers. The function `generate1` actually does the generation of one pseudorandom value. It takes an optional argument that is a PRNG state; if that argument is not supplied, it uses the default PRNG state. The function `generate` produces a list of `n` pseduorandom values. ##### Exercise: generate [✭, optional] Start a new session of utop. Enter the following: ``` # #require "qcheck";; # QCheck.Gen.(generate 3 int);; ``` Now exit utop, and repeat the above. What do you notice about the sequence of values that is produced? What do you infer about how QCheck seeds the PRNG if you don't explicitly provide one when calling `generate`? &square; QCheck implements many producers of pseudorandom values. Here are a few more of them: ``` module QCheck : sig ... module Gen : sig val int : int t val small_int : int t val int_range : int -> int -> int t val list : 'a t -> 'a list t val list_size : int t -> 'a t -> 'a list t val string : ?gen:char t -> string t val small_string : ?gen:char t -> string t ... end ... end ``` You can [read the documentation][qcheck.gen] of those and many others. [qcheck.gen]: http://cedeela.fr/~simon/software/qcheck/QCheck.Gen.html ##### Exercise: generate list [✭✭, optional] Use `QCheck.Gen.generate1` to generate a list whose length is between 5 and 10, and whose elements are integers between 0 and 100. Then use `QCheck.Gen.generate` to generate a 3-element list, each element of which is a list of the kind you just created with `generate1`. &square; ## Properties and arbitraries There are two more ideas we need to understand before we can get to testing with QCheck: properties and arbitraries. **Properties.** It's tempting to think that QCheck would enable us to test a function by generating many pseudorandom inputs to the function, running the function on them, then checking that the outputs are correct. But there's immediately a problem: how can QCheck know what the correct output is for each of those inputs? Since they're randomly generated, the test engineer can't hardcode the right outputs, as we've usually been doing with OUnit test suites this semester. So instead, QCheck allows us to check whether a *property* of each output holds. A property is a function of type `t -> bool`, for some type `t`, that tells use whether the value of type `t` exhibits some desired characteristic. Here, for example, here are two properties; one that determines whether an integer is even, and another that determines whether a list is sorted in non-decreasing order according to the built-in `<=` operator: ``` let is_even n = n mod 2 = 0 let rec is_sorted = function | [] -> true | [h] -> true | h1::(h2::t as t') -> h1 <= h2 && is_sorted t' ``` **Arbitraries.** The way we present to QCheck the outputs to be checked is with a value of type `'a QCheck.Arbitrary`. This type represents an "arbitrary" value of type `'a`—that is, it has been pseudorandomly chosen as a value that we want to check, and more specifically, to check whether it satisfies a property. We can create *arbitraries* (as we'll call them) out of generators using the function `QCheck.make : 'a QCheck.Gen.t -> 'a QCheck.arbitrary`. (Actually that function takes some optional arguments that we elide here.) This isn't actually the normal way to create arbitraries, but it's a simple way that will help us understand them; we'll get to the normal way in a little while. For example, the following expression represents an arbitrary integer: ``` QCheck.make QCheck.Gen.int ``` You can [read the documentation][qcheck] of `QCheck` and its arbitraries. [qcheck]: http://cedeela.fr/~simon/software/qcheck/QCheck.html ##### Exercise: arbitrary list [✭✭, optional] Use `QCheck.make` and part of your solution to **generate list** from above to create an arbitrary that represents a list whose length is between 5 and 10, and whose elements are integers between 0 and 100. The type of your arbitrary should be `int list QCheck.arbitrary`. &square; ## Testing properties with QCheck To construct a QCheck test, we create an arbitrary and a property, and pass them to `QCheck.Test.make : 'a QCheck.arbitrary -> ('a -> bool) -> QCheck.Test.t`. (That function also takes some optional arguments that we elide here.) The test will generate some number of arbitraries (by default, 100) and check whether the property holds of each of them. For example, the following code creates a QCheck test that checks whether an arbitrary integer is even; the probability that this test will pass is only \$2^{-100}\$: ``` let t = QCheck.Test.make (QCheck.make QCheck.Gen.int) is_even ``` If we want to change the number of arbitraries that are checked, we can pass an optional integer argument `~count` to `QCheck.Test.make`. We can run that test with `QCheck_runner.run_tests : QCheck.Test.t list -> int`. (Once more, that function takes some optional arguments that we elide here.) The integer it returns is 0 if all the tests in the list pass, and 1 otherwise. For the test above, running it will output 1 with high probability, because it will generate at least one odd integer. The output would look like the following: ``` # QCheck_runner.run_tests [t];; test `<test>` failed on >= 1 cases: <instance> failure (1 tests failed, ran 1 tests) - : int = 1 ``` Unfortunately, that output isn't very informative; it doesn't tell us what particular values failed to satisfy the property! We'll fix that problem in a little while. ##### Exercise: even arbitrary list [✭✭, optional] Use your solution to **arbitrary list** from above to create and run a QCheck test that checks whether at least one element of an arbitrary list (of 5 to 10 elements, each between 0 and 100) is even. You'll need to "upgrade" the `is_even` property to work on a list of integers rather than a single integer. Each time you run the test, recall that it will generate 100 lists and check the property of them. If you run the test many times, you'll likely see some successes and some failures. &square; If you want to make an OCaml program that runs QCheck tests and prints the results, there is a function `QCheck_runner.run_tests_main` that works much like `OUnit2.run_test_tt_main`: just invoke it as the final expression in a test file. For example: ``` let tests = (* code that constructs a [QCheck.Test.t list] *) let _ = QCheck_runner.run_tests_main tests ``` If that code is in a file `test_x.ml`, you can compile and run it as follows: ``` $ ocamlbuild -pkg qcheck test_x.byte $ ./test_x.byte ``` ##### Exercise: even arbitrary list QCheck test driver [✭✭, optional] Transform your solution to **even arbitrary list** from above to a file `test_list.ml` that, when compiled an executed from the command line, runs that test and prints the result. &square; QCheck tests can be converted to OUnit tests and included in the usual kind of OUnit test suite we've been writing all along. The function that does this is `QCheck_runner.to_ounit2_test : QCheck.Test.t -> OUnit2.test`. ##### Exercise: even arbitrary list OUnit test driver [✭✭, optional] Convert your test driver `test_list.ml` from using the QCheck runner to using the OUnit test runner (that is, the final line of the file should invoke `OUnit2.run_test_tt_main`). &square; ## Making QCheck output more informative We noted above that the output of QCheck so far has told us only *whether* some arbitraries satisfied a property, but not *which* arbitraries failed to satisfy it. Let's fix that problem. The issue is with how we constructed an arbitrary directly out of a generator. An arbitrary is properly more than just a generator. The QCheck library needs to know how to print values of the generator, and a few other things as well. You can see that in the definition of `'a QCheck.arbitrary`: ``` module QCheck : ... sig type 'a arbitrary = { gen : 'a QCheck.Gen.t; print : ('a -> string) option; small : ('a -> int) option; shrink : 'a QCheck.Shrink.t option; collect : ('a -> string) option; } ... end ``` In addition to the generator field `gen`, there is a field containing an optional function to print values from the generator, and a few other optional fields as well. Luckily, we don't usually have to find a way to complete those fields ourselves; the `QCheck` module provides many arbitraries that correspond to the generators found in `QCheck.Gen`: ``` module QCheck : sig ... val int : int arbitrary val small_int : int arbitrary val int_range : int -> int -> int arbitrary val list : 'a arbitrary -> 'a list arbitrary val list_of_size : int Gen.t -> 'a arbitrary -> 'a list arbitrary val string : string arbitrary val small_string : string arbitrary ... end ``` Using those arbitraries, we can get improved error messages: ``` # let t = QCheck.Test.make QCheck.int is_even;; # QCheck_runner.run_tests [t];; test `<test>` failed on >= 1 cases: 2227842673298200061 failure (1 tests failed, ran 1 tests) failure (1 tests failed, ran 1 tests) - : int = 1 ``` The final piece of less-than-informative output in that message, `<test>`, is there because we haven't given the test case a name. We can do that by passing the optional argument `~name` to `QCheck.Test.make`. ##### Exercise: even arbitrary list improved [✭✭, optional] Upgrade your solution to **even arbitrary list** to use `QCheck.list_of_size` and its friends instead of `QCheck.Gen.list_size`. When finished, you'll be able to see lists that violate the property. &square; You'll likely notice after finishing that exercise that there's only one list that is ever reported as violating the property, which is the empty list. That's because when QCheck finds a value that violates the property, QCheck attempts to *shrink* that value down to the smallest input it can find that also violates the property. The `shrink` field of `arbitrary` is part of that functionality. Shrinking an int involves making it closer to 0; shrinking a list involves shrinking its elements individually as well as omitting elements from the list; and so forth. ## Testing functions with QCheck So far we've used QCheck only to test whether a randomly generated value satisfies some property. We haven't tried to use that value as input to a function of interest—the function we really want to test—and see whether the function's output satisfies a property. Let's do that now. Here is a QCheck test to see whether the output of `double` is correct: ``` let double x = 2 * x let t = QCheck.Test.make QCheck.int (fun x -> let y = double x in y = 2*x) ``` Note how the property we pass to `QCheck.Test.make` takes in a value (which will be value generated by `QCheck.int`), computes a function of that value (`double x`), then checks whether that computed output satisfies a property of interest (equals `2*x`). ##### Exercise: odd_divisor [✭✭, optional] Find the file `qchecks.ml` in the source code for this lab. In it there is a function `odd_divisor`. Write a QCheck test to determine whether the output of that function (on a positive integer, per its precondition; *hint: there is an arbitrary that generates positive integers*) is both odd and is a divisor of the input. You will discover that there is a bug in the function. What is the smallest integer that triggers that bug? &square; ## Additional QCheck exercises ##### Exercise: qcheck max [✭✭✭, optional] The file `qchecks.ml` contains a function `max` that is buggy. Write a QCheck test that detects the bug. You will have to figure out how to make an arbitrary that can generate two integers as inputs. *Hint: `QCheck.pair` has type* `'a arbitrary -> 'b arbitrary -> ('a * 'b) arbitrary`. You will also have to devise an appropriate property to check. *Hint: the maximum of two numbers must be at least as big as each of them, and must be equal to one of them.* &square; ##### Exercise: qcheck avg [✭✭✭✭, optional] The file `qchecks.ml` contains a function `avg` that is buggy. Write a QCheck test that detects the bug. For the property that you check, construct your own *reference implementation* of average, such as the following: ``` let ref_avg lst = (float_of_int (List.fold_left (+) 0 lst)) /. (float_of_int (List.length lst)) ``` Compare the output of `avg` to the output of `ref_avg` to determine correctness. *Hint: this bug is harder to find and might require coming up with good inputs to check based on glass-box inspection of the source code.* &square;