# Testing
Today we will test with representation invariants, write some black box tests
against an interface, and use a library called QCheck that does randomized testing.
The latter will require us to learn a little bit about random number generation.
To get started, download [the provided code for this lab](code.zip).
## Representation invariants
Look at `queues.ml`, which efficiently implements queues with two lists.
It's mostly the same implementation that we provided in the code
accompanying the lecture on modules, but we've deliberately added
a couple bugs to it.
##### Exercise: AF and RI [✭]
The `TwoListQueue` module documents an abstraction function
and a representation invariant, but they are not clearly identified
as such. Modify the comments to explicitly identify the abstraction
function and representation invariant.
□
##### Exercise: rep ok [✭✭]
Write a `rep_ok` function for `TwoListQueue`. Its type should be `'a t
-> 'a t`. It should raise an exception whenever the representation
invariant does not hold. Modify the other functions exposed by the
`Queue` signature to (i) check that `rep_ok` holds for any queues passed
in, and (ii) check that `rep_ok` also holds for any queues passed out.
*Hint: you will need to add nine applications of `rep_ok`.*
□
##### Exercise: test with rep ok [✭✭✭]
There are two bugs we deliberately injected into `TwoListQueue`. Both
are places where we failed to apply `norm` to ensure that a queue is in
normal form. Figure out where those are by testing each operation of
`TwoListQueue` in the toplevel to see where your `rep_ok` raises an
exception. Fix each bug by adding an application of `norm`.
*Hint: to find one of the bugs, you will need to build a queue of
length at least 2.*
□
## Black-box testing
Find `sets.ml` in the code for this lab. It implements sets with a list.
Read the `Set` signature at the top of it; **do not** read down
to the end of the file where that signature is implemented
as `ListSet`:
```
module ListSet : Set = struct ... end
```
##### Exercise: black box test [✭✭]
Based on the specification comments of `Set`, write an OUnit test suite
for `ListSet` that does black-box testing of `size` and `insert`. We've
already got you started in the provided file `test_sets.ml`. Write enough
tests to detect at least one bug in both functions without ever reading
their implementations. *Hint: `empty` and `to_list` are both correctly
implemented, so you can rely on those.*
□
## Randomized testing
In lecture, we mentioned randomized testing as a means of finding bugs
without having to invent a lot of test cases yourself. Here we'll use
an OCaml library for randomized testing called QCheck.
*Note: the rest of this lab is relatively long and involved; you might
not get all the way through it in 50 minutes.*
**Installing QCheck.**
To get started, we need to do some OPAM package management. Run these commands:
```
opam uninstall qcheck
opam install qtest
```
(The reason we need to do that is a little bit confusing. QCheck used
to be its own OPAM package, and an old version of it still exists
in OPAM. The OCaml install instructions we released at the beginning of the
semester inadvertently had you install that old version, but we really
want the new version. The new version has been folded into a larger
package called QTest. So first we have to uninstall the old package,
then we have to install the new one.)
Now launch utop and make sure the following directives succeed:
```
# #require "qcheck";;
# #show QCheck.Gen;;
module Gen :
sig
...
end
```
If the `#show` directive instead responds with `Unknown element` then
you have not successfully upgraded to the newer version of QCheck.
Before continuing with QCheck, we need to take a brief detour into
the issue of random number generation...
## Random number generation
Most languages provide the facility to generate random numbers. In
truth, these generators are usually not truly random (in the sense that
they are completely unpredictable) but in fact are
[*pseudorandom*][prng]: the sequence of numbers they generate pass good
statistical tests to ensure there is no discernible pattern in them, but the
sequence itself is a deterministic function of an initial *seed* value.
(Recall that the prefix *pseudo* is from the Greek *pseudēs* meaning "false".)
[Java][java-random] and [Python][python-random] both provide
pseudorandom number generators (PRNGs). So does OCaml in the standard library's
[`Random` module][random].
[prng]: https://en.wikipedia.org/wiki/Pseudorandom_number_generator
[java-random]: https://docs.oracle.com/javase/8/docs/api/java/util/Random.html
[python-random]: https://docs.python.org/3/library/random.html
[random]: http://caml.inria.fr/pub/docs/manual-ocaml/libref/Random.html
##### Exercise: pseudorandom [✭]
Start a new session of utop. Enter the following:
```
# Random.int 100;;
# Random.int 100;;
# Random.int 100;;
```
Write down the responses that you get. Each is a pseudorandom integer \\(i\\) such that
\\(0 \leq i \lt 100\\).
Now quit utop and start another new session. Enter the same phrases as above again.
You will get the same responses as last time. Unless your OCaml installation is
different from the VM's, they will be: 44, 85, 82. Chances are that everyone in
your recitation will get those same numbers. Not exactly unpredictable, eh?
□
Although for purposes of security and cryptography a PRNG leads to terrible vulnerabilities,
for other purposes—including testing and simulation—PRNGs are just fine.
In fact their predictability can even be useful: given the same initial seed, a PRNG
will always produce the same sequence of pseudorandom numbers, leading to the ability
to repeat a particular sequence of tests or a particular simulation.
The way a PRNG works in general is that it initializes a *state* that it keeps internally
from the initial seed. From then on, each time the PRNG generates a new value, it
imperatively updates that state. The `Random` module in fact makes it possible to
manipulate that state in limited ways.
##### Exercise: PRNG state [✭]
Start a new session of utop. Enter the following:
```
# let s1 = Random.get_state ();;
val s1 : Random.State.t =
# let s2 = Random.State.copy s1;;
val s2 : Random.State.t =
```
Now let's use a different function, `Random.State.int`. It is like
`Random.int`, except that it explicitly accepts a state argument instead
of using the default state encapsulated in the `Random` structure:
```
# Random.State.int s1 100;;
# Random.State.int s1 100;;
# Random.State.int s1 100;;
# Random.State.int s2 100;;
# Random.State.int s2 100;;
# Random.State.int s2 100;;
```
What do you notice about the sequence of values that the six expressions above produces?
□
If you want a less predictable sequence of values, then you can change
the seed. The functions `Random.self_init` and
`Random.State.make_self_init` will choose a "random" seed to initialize
the state. They do so by sampling from a special Unix file named
[`/dev/urandom`][urandom], which is meant to provide as close to true
randomness as a computer can.
[urandom]: https://en.wikipedia.org/wiki//dev/random
##### Exercise: random seed [✭]
Start a new session of utop. Enter the following:
```
# Random.self_init ();;
# Random.int 100;;
# Random.int 100;;
# Random.int 100;;
```
Now repeat that a second time (it doesn't matter whether you exit utop
or not in between). What do you notice about the sequence of values
that is produced?
□
## QCheck pseudorandom value generation
One of the key pieces of functionality provided by QCheck is the ability
to generate pseudorandom values of various types. Here is some of the
signature of the module that does that:
```
module QCheck : sig
...
module Gen :
sig
type 'a t = Random.State.t -> 'a
val int : int t
val generate : ?rand:Random.State.t -> n:int -> 'a t -> 'a list
val generate1 : ?rand:Random.State.t -> 'a t -> 'a
...
end
...
end
```
An `'a QCheck.Gen.t` is a function that takes in a PRNG state and uses
it to produce a pseudorandom value of type `'a`. So `QCheck.Gen.int`
produces pseudorandom integers. The function `generate1`
actually does the generation of one pseudorandom value. It takes an
optional argument that is a PRNG state; if that argument is not
supplied, it uses the default PRNG state. The function `generate`
produces a list of `n` pseduorandom values.
##### Exercise: generate [✭]
Start a new session of utop. Enter the following:
```
# #require "qcheck";;
# QCheck.Gen.(generate 3 int);;
```
Now exit utop, and repeat the above. What do you notice about the sequence of values
that is produced? What do you infer about how QCheck seeds the PRNG if you don't
explicitly provide one when calling `generate`?
□
QCheck implements many producers of pseudorandom values. Here are a few more of them:
```
module QCheck : sig
...
module Gen :
sig
val int : int t
val small_int : int t
val int_range : int -> int -> int t
val list : 'a t -> 'a list t
val list_size : int t -> 'a t -> 'a list t
val string : ?gen:char t -> string t
val small_string : ?gen:char t -> string t
...
end
...
end
```
You can [read the documentation][qcheck.gen] of those and many others.
[qcheck.gen]: http://cedeela.fr/~simon/software/qcheck/QCheck.Gen.html
##### Exercise: generate list [✭✭]
Use `QCheck.Gen.generate1` to generate a list whose length is between 5 and 10, and whose
elements are integers between 0 and 100. Then use `QCheck.Gen.generate` to generate
a 3-element list, each element of which is a list of the kind you just created with
`generate1`.
□
## Properties and arbitraries
There are two more ideas we need to understand before we can get to testing with
QCheck: properties and arbitraries.
**Properties.**
It's tempting to think that QCheck would enable us to test a function by
generating many pseudorandom inputs to the function, running the
function on them, then checking that the outputs are correct. But
there's immediately a problem: how can QCheck know what the correct
output is for each of those inputs? Since they're randomly generated,
the test engineer can't hardcode the right outputs, as we've usually
been doing with OUnit test suites this semester.
So instead, QCheck allows us to check whether a *property* of each
output holds. A property is a function of type `t -> bool`, for some
type `t`, that tells use whether the value of type `t` exhibits some
desired characteristic. Here, for example, here are two properties; one
that determines whether an integer is even, and another that determines
whether a list is sorted in non-decreasing order according to the
built-in `<=` operator:
```
let is_even n =
n mod 2 = 0
let rec is_sorted = function
| [] -> true
| [h] -> true
| h1::(h2::t as t') -> h1 <= h2 && is_sorted t'
```
**Arbitraries.**
The way we present to QCheck the outputs to be checked is with a value
of type `'a QCheck.Arbitrary`. This type represents an "arbitrary"
value of type `'a`—that is, it has been pseudorandomly chosen as a
value that we want to check, and more specifically, to check whether it
satisfies a property.
We can create *arbitraries* (as we'll call them) out of generators using
the function `QCheck.make : 'a QCheck.Gen.t -> 'a QCheck.arbitrary`.
(Actually that function takes some optional arguments that we elide
here.) This isn't actually the normal way to create arbitraries, but
it's a simple way that will help us understand them; we'll get to the
normal way in a little while. For example, the following expression
represents an arbitrary integer:
```
QCheck.make QCheck.Gen.int
```
You can [read the documentation][qcheck] of `QCheck` and its arbitraries.
[qcheck]: http://cedeela.fr/~simon/software/qcheck/QCheck.html
##### Exercise: arbitrary list [✭✭]
Use `QCheck.make` and part of your solution to **generate list** from
above to create an arbitrary that represents a list whose length is
between 5 and 10, and whose elements are integers between 0 and 100.
The type of your arbitrary should be `int list QCheck.arbitrary`.
□
## Testing properties with QCheck
To construct a QCheck test, we create an arbitrary and a property, and
pass them to `QCheck.Test.make : 'a QCheck.arbitrary -> ('a -> bool) ->
QCheck.Test.t`. (That function also takes some optional arguments that
we elide here.) The test will generate some number of arbitraries (by
default, 100) and check whether the property holds of each of them. For
example, the following code creates a QCheck test that checks whether an
arbitrary integer is even; the probability that this test will pass
is only \\(2^{-100}\\):
```
let t = QCheck.Test.make (QCheck.make QCheck.Gen.int) is_even
```
If we want to change the number of arbitraries that are checked, we can
pass an optional integer argument `~count` to `QCheck.Test.make`.
We can run that test with `QCheck_runner.run_tests : QCheck.Test.t list -> int`.
(Once more, that function takes some optional arguments that we elide here.)
The integer it returns is 0 if all the tests in the list pass, and 1 otherwise.
For the test above, running it will output 1 with high probability,
because it will generate at least one odd integer. The output would look
like the following:
```
# QCheck_runner.run_tests [t];;
test `` failed on >= 1 cases: failure (1 tests failed, ran 1 tests)
- : int = 1
```
Unfortunately, that output isn't very informative; it doesn't tell us what particular
values failed to satisfy the property! We'll fix that problem in a little while.
##### Exercise: even arbitrary list [✭✭]
Use your solution to **arbitrary list** from above to create and run a
QCheck test that checks whether at least one element of an arbitrary
list (of 5 to 10 elements, each between 0 and 100) is even. You'll need
to "upgrade" the `is_even` property to work on a list of integers rather
than a single integer.
Each time you run the test, recall that it will generate 100 lists and
check the property of them. If you run the test many times, you'll
likely see some successes and some failures.
□
If you want to make an OCaml program that runs QCheck tests and prints
the results, there is a function `QCheck_runner.run_tests_main` that works
much like `OUnit2.run_test_tt_main`: just invoke it as the final
expression in a test file. For example:
```
let tests = (* code that constructs a [QCheck.Test.t list] *)
let _ = QCheck_runner.run_tests_main tests
```
If that code is in a file `test_x.ml`, you can compile and run it as follows:
```
$ ocamlbuild -pkg qcheck test_x.byte
$ ./test_x.byte
```
##### Exercise: even arbitrary list QCheck test driver [✭✭]
Transform your solution to **even arbitrary list** from above to a file
`test_list.ml` that, when compiled an executed from the command line,
runs that test and prints the result.
□
QCheck tests can be converted to OUnit tests and included in the usual kind
of OUnit test suite we've been writing all along. The function
that does this is `QCheck_runner.to_ounit2_test : QCheck.Test.t -> OUnit2.test`.
##### Exercise: even arbitrary list OUnit test driver [✭✭]
Convert your test driver `test_list.ml` from using the QCheck runner to using
the OUnit test runner (that is, the final line of the file should
invoke `OUnit2.run_test_tt_main`).
□
## Making QCheck output more informative
We noted above that the output of QCheck so far has told us only *whether*
some arbitraries satisfied a property, but not *which* arbitraries failed
to satisfy it. Let's fix that problem.
The issue is with how we constructed an arbitrary directly out of a generator.
An arbitrary is properly more than just a generator. The QCheck library needs
to know how to print values of the generator, and a few other things as well.
You can see that in the definition of `'a QCheck.arbitrary`:
```
module QCheck :
...
sig
type 'a arbitrary = {
gen : 'a QCheck.Gen.t;
print : ('a -> string) option;
small : ('a -> int) option;
shrink : 'a QCheck.Shrink.t option;
collect : ('a -> string) option;
}
...
end
```
In addition to the generator field `gen`, there is a field containing an optional
function to print values from the generator, and a few other optional fields as well.
Luckily, we don't usually have to find a way to complete those fields ourselves;
the `QCheck` module provides many arbitraries that correspond to the generators
found in `QCheck.Gen`:
```
module QCheck :
sig
...
val int : int arbitrary
val small_int : int arbitrary
val int_range : int -> int -> int arbitrary
val list : 'a arbitrary -> 'a list arbitrary
val list_of_size : int Gen.t -> 'a arbitrary -> 'a list arbitrary
val string : string arbitrary
val small_string : string arbitrary
...
end
```
Using those arbitraries, we can get improved error messages:
```
# let t = QCheck.Test.make QCheck.int is_even;;
# QCheck_runner.run_tests [t];;
test `` failed on >= 1 cases: 2227842673298200061
failure (1 tests failed, ran 1 tests) failure (1 tests failed, ran 1 tests)
- : int = 1
```
The final piece of less-than-informative output in that message, ``,
is there because we haven't given the test case a name. We can do that
by passing the optional argument `~name` to `QCheck.Test.make`.
##### Exercise: even arbitrary list improved [✭✭]
Upgrade your solution to **even arbitrary list** to use `QCheck.list_of_size`
and its friends instead of `QCheck.Gen.list_size`. When finished, you'll be
able to see lists that violate the property.
□
You'll likely notice after finishing that exercise that there's only one
list that is ever reported as violating the property, which is the empty
list. That's because when QCheck finds a value that violates the
property, QCheck attempts to *shrink* that value down to the smallest
input it can find that also violates the property. The `shrink` field of
`arbitrary` is part of that functionality. Shrinking an int involves
making it closer to 0; shrinking a list involves shrinking its elements
individually as well as omitting elements from the list; and so forth.
## Testing functions with QCheck
So far we've used QCheck only to test whether a randomly generated value
satisfies some property. We haven't tried to use that value as input
to a function of interest—the function we really want to test—and
see whether the function's output satisfies a property. Let's do that now.
Here is a QCheck test to see whether the output of `double` is correct:
```
let double x = 2 * x
let t = QCheck.Test.make QCheck.int
(fun x ->
let y = double x
in y = 2*x)
```
Note how the property we pass to `QCheck.Test.make` takes in a value (which
will be value generated by `QCheck.int`), computes a function of that value
(`double x`), then checks whether that computed output satisfies a property
of interest (equals `2*x`).
##### Exercise: odd_divisor [✭✭]
Find the file `qchecks.ml` in the source code for this lab. In it there
is a function `odd_divisor`. Write a QCheck test to determine whether
the output of that function (on a positive integer, per its
precondition; *hint: there is an arbitrary that generates positive
integers*) is both odd and is a divisor of the input. You will discover
that there is a bug in the function. What is the smallest integer that
triggers that bug?
□
## Additional exercises
##### Exercise: black box test rest [✭✭✭]
Finish writing a black-box OUnit test suite for `ListSet`. Find at
least one bug in every function except `empty` and `to_list`.
□
##### Exercise: fix ListSet [✭✭✭]
After your test suite is finished, go read the implementation of
`ListSet`. Fix all the bugs in it and make your test suite pass.
□
##### Exercise: qcheck max [✭✭✭]
The file `qchecks.ml` contains a function `max` that is buggy. Write a
QCheck test that detects the bug. You will have to figure out how to
make an arbitrary that can generate two integers as inputs. *Hint:
`QCheck.pair` has type*
`'a arbitrary -> 'b arbitrary -> ('a * 'b) arbitrary`.
You will also have to devise an appropriate property to check. *Hint:
the maximum of two numbers must be at least as big as each of them, and
must be equal to one of them.*
□
##### Exercise: qcheck avg [✭✭✭]
The file `qchecks.ml` contains a function `avg` that is buggy. Write a
QCheck test that detects the bug. For the property that you check,
construct your own *reference implementation* of average, such as the
following:
```
let ref_avg lst =
(float_of_int (List.fold_left (+) 0 lst))
/. (float_of_int (List.length lst))
```
Compare the output of `avg` to the output of `ref_avg` to determine correctness.
*Hint: this bug is harder to find and might require coming up with good
inputs to check based on glass-box inspection of the source code.*
□