(**
* Verification in Coq
-----
Topics:
- verification of functions
- extraction of OCaml code
- verification of data structures
- verification of compilers
-----
*)
Require Import List Arith Bool.
Import ListNotations.
(**
(**********************************************************************)
** Verification of Functions
A function is correct if it satisfies its specification. So to verify
a function in Coq, we need to
- code the function,
- state a theorem that says that function satisfies its specification, and
- prove the theorem.
*** Verifying Factorial
Let's try that with the factorial function. Here's an implementation
of it in Coq:
*)
Fixpoint fact (n : nat) :=
match n with
| 0 => 1
| S k => n * (fact k)
end.
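(**
Before specifying [fact], a quick sanity check on a couple of inputs.
These examples are an illustrative addition, and their names are our own;
each is proved by computation alone.
*)
Example fact_0_is_1 : fact 0 = 1.
Proof. reflexivity. Qed.
Example fact_5_is_120 : fact 5 = 120.
Proof. reflexivity. Qed.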
(**
As we learned before, the function has to pattern match against
[n] and recursively call itself on [k] to demonstrate to Coq that
the recursive call will eventually terminate.
What would be a reasonable specification for [fact]? If we were
just going to document it in a comment, we might write
something like this:
<<
(** [fact n] is [n] factorial, i.e., [n!].
Requires: [n >= 0]. *)
>>
In OCaml, that precondition would be necessary. In Coq, since we
are computing on natural numbers, it would be redundant.
But how can we formally state in Coq that [fact n] is [n] factorial?
There is no factorial operator in most programming languages, including
Coq. So we can't just write something like the following:
<<
Theorem fact_correct : forall (n : nat), fact n = n!.
>>
Instead, we need another way to express [n!].
Whenever we want to define the meaning of an operator for use in a
logic, we need to write down _axioms_ and _inference rules_ for it.
We've already seen that in two ways:
- With logical connectives, like [/\], we saw that axioms and inference
rules could define how to introduce and eliminate connectives. For
example, from a proof of [A /\ B], we could conclude [A]. Hence
[A /\ B -> A].
- With rings and fields, we saw how axioms (we didn't need inference
rules) could define equalities involving operators. For example,
[0 * x = 0] allowed us to replace any multiplication by [0] simply
with [0] itself.
So, let's define the factorial operator in a similar way:
- [0! = 1].
- If [a! = b] then [(a+1)! = (a+1)*b].
The first line, which is an axiom, defines how the factorial
operator behaves when applied to zero. The second line, which
is an inference rule, defines how the operator behaves when
applied to a successor of a natural number.
Another way to think about that definition is that it defines
a relation. Call it the "factorial of" relation:
- The factorial of [0] is [1].
- If the factorial of [a] is [b], then the factorial of [a + 1] is
[a + 1] times [b].
Together, the axiom and inference rule give us a way to "grow"
the relation. We start from a "seed", which is the axiom:
we know that the factorial of [0] is [1]. From there we can
apply the inference rule, and conclude that the factorial of
[0+1] is [0 + 1] times [1], i.e., that the factorial of [1] is
[1]. We can keep doing that with the inference rule to determine
the factorial of any number.
Let's code up that relation in Coq. We're going
to define a proposition [factorial_of] that is parameterized on two
natural numbers, [a] and [b]. We want [factorial_of a b] to be a provable
proposition whenever [a! = b].
*)
Inductive factorial_of : nat -> nat -> Prop :=
| factorial_of_zero : factorial_of 0 1
| factorial_of_succ : forall (a b : nat),
factorial_of a b -> factorial_of (S a) ((S a) * b).
(**
This definition resembles the definition of an inductive type, which
we've done before. But here we are inductively defining a proposition.
That proposition, [factorial_of], is parameterized on two natural
numbers. There are two ways to construct an instance of this
parameterized proposition. The first is to use the [factorial_of_zero]
constructor, which corresponds to the axiom we talked about above.
The second is the [factorial_of_succ] constructor, which corresponds to the
inference rule.
Another way to think about this definition is in terms of evidence. The
[factorial_of_zero] constructor provides (by definition) the evidence
that the factorial of [0] is [1]. The [factorial_of_succ] constructor
provides (again by definition) a way of transforming evidence that
the factorial of [a] is [b] into evidence that the factorial of [S a] is
[(S a) * b].
Now that we have a formalization of the factorial operation, we can
state a theorem that says [fact] satisfies its specification:
*)
Theorem fact_correct : forall (n : nat),
factorial_of n (fact n).
(**
In other words, the factorial of [n] is the same value that [fact n] computes.
So [fact] is computing the correct function. Note that we don't have
to mention the precondition because of the type of [n].
To prove the theorem, we'll need induction.
*)
Proof.
intros n.
induction n as [ | k IH].
- simpl. apply factorial_of_zero.
- simpl. apply factorial_of_succ. assumption.
Qed.
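(**
To see what the theorem buys us, here is a small illustrative comparison
(the example names are our own). Evidence that the factorial of [3] is [6]
can be grown by hand from the axiom, one application of the inference rule
at a time, or it can be obtained in a single step by instantiating
[fact_correct] and letting Coq compute [fact 3].
*)
Example factorial_of_3_by_hand : factorial_of 3 6.
Proof.
  (* Each step peels off one use of the inference rule. *)
  apply (factorial_of_succ 2 2).
  apply (factorial_of_succ 1 1).
  apply (factorial_of_succ 0 1).
  (* What remains is exactly the axiom. *)
  apply factorial_of_zero.
Qed.
Example factorial_of_3_by_theorem : factorial_of 3 6.
Proof. exact (fact_correct 3). Qed.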
(**
That concludes our verification of [fact]: we coded it in Coq, wrote
a specification for it in Coq, and proved that it satisfies its specification.
*** A Reflection on Formalization
If you stop to reflect on what we just did, it has the potential to seem
unsatisfying. The skeptic might exclaim, "All you did was say the same thing
twice! You coded up [fact] once as a Coq program, a second time as a Coq
proposition, and proved that the two are the same. Isn't that rather
trivial and obvious?"
As a response, first, note that we did this verification for a very simple
function. It shouldn't be surprising that the formalization of a simple
function ends up looking relatively redundant with respect to the program
that computes the function.
Second, note that technically the skeptic is wrong: we didn't
say the _same_ thing twice. We expressed the idea of the factorial operation
in two subtly different ways. The first way, [fact], specifies a computation
that takes a (potentially large) natural number and continues to recurse on
smaller and smaller numbers until it reaches a base case. The second way,
[factorial_of], specifies a mathematical relation that starts with the
base case of [0] and can build up from there to reach larger numbers.
A lot of formal verification has that flavor: express a computation, express
a mathematical formalization of the computation, then prove that the two
are the same. Or, prove that the two are _similar enough_: often, the
exact details of the computation are irrelevant to the mathematical
formalization. It doesn't typically matter, for example, which order
the sides of a binary operator are evaluated in, so even though the computation
might be explicit, the mathematical formalization need not be. (Side effects
would, of course, complicate that analysis.)
Testing and verification are alike in that sense of potential redundancy.
With testing, you write down information---inputs and outputs---that you
hope is redundant, because the program already encodes the algorithm required
to transform those inputs into those outputs. It's only when you are
surprised, i.e., the test case fails to agree with the program, that you
appreciate the value of saying things twice. By saying the same thing twice,
but differently, you make it more likely to expose any errors because you
_detect the inconsistency_.
*** Verifying Tail-Recursive Factorial
Next, let's verify a different implementation of the factorial operation.
This is the tail-recursive implementation. As we learned much earlier,
this implementation is more space efficient than the naive recursive
implementation.
*)
Fixpoint fact_tr_acc (n : nat) (acc : nat) :=
match n with
| 0 => acc
| S k => fact_tr_acc k (n * acc)
end.
Definition fact_tr (n : nat) :=
fact_tr_acc n 1.
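(**
As an illustrative check (the example names are our own), [fact_tr] agrees
with ordinary factorial on a small input, and a different starting
accumulator scales the result: [fact_tr_acc 3 2] computes [3! * 2].
*)
Example fact_tr_5_is_120 : fact_tr 5 = 120.
Proof. reflexivity. Qed.
Example fact_tr_acc_3_2_is_12 : fact_tr_acc 3 2 = 12.
Proof. reflexivity. Qed.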
(**
To verify the correctness of [fact_tr], we'll prove the same kind of
theorem as we did for [fact]. For the most part, the proof proceeds
easily:
*)
Theorem fact_tr_correct : forall (n : nat),
factorial_of n (fact_tr n).
Proof.
intros n. unfold fact_tr.
induction n as [ | k IH].
- simpl. apply factorial_of_zero.
- simpl.
(**
At this point, we have a [k * 1] that we'd like to simplify to
just [k]. There's already a library theorem that can do the job
for us:
*)
Check mult_1_r. (* forall n : nat, n * 1 = n *)
(** We continue the proof using it: *)
rewrite mult_1_r.
destruct k as [ | m].
-- simpl. rewrite <- mult_1_r.
apply factorial_of_succ. apply factorial_of_zero.
--
(**
At this point we'd like to apply [factorial_of_succ], but
we're stuck: the goal doesn't have the right shape,
because the second argument to [fact_tr_acc] is not [1],
and there is no multiplication. We'd like to replace
[fact_tr_acc (S m) (S (S m))] with [(S (S m)) * fact_tr_acc (S m) 1].
Let's abort the current proof, and factor out a helper lemma for that purpose.
*)
Abort.
(**
Nothing about the lemma we just realized we needed is actually specific
to [S (S m)]: that expression might as well be any natural number, because
[fact_tr_acc] just uses it as the base value of the accumulator. So we
can state and prove a slightly more general lemma:
*)
Lemma fact_tr_acc_mult : forall (n m : nat),
fact_tr_acc n m = m * fact_tr_acc n 1.
(**
The proof starts off relatively easy. Just before we get to the point
of using the inductive hypothesis, we'll use a new tactic, [replace],
which replaces one expression with another, and generates a new subgoal
requiring us to prove that the two expressions are in fact equal.
*)
Proof.
intros n m. induction n as [ | k IH].
- simpl. ring.
- replace (fact_tr_acc (S k) m) with (fact_tr_acc k ((S k) * m)).
--
(**
Unfortunately we're now stuck and unable to use the inductive hypothesis.
The problem is that hypothesis is:
<<
IH: fact_tr_acc k m = m * fact_tr_acc k 1
>>
but the goal has the expression:
<<
fact_tr_acc k (S k * m)
>>
The left-hand side of the inductive hypothesis doesn't match that goal,
because [IH] has just [m], whereas the goal has [S k * m].
But, looking at [IH], there does seem to be hope. There's no reason
[IH] needs to be "hard-coded" for a specific [m]. It really would
hold for any [m]. The root of the problem is that _we really want
[m] to be universally quantified in [IH]_, but we already used [intros]
to get rid of that quantification. So, let's start over, and not be
so eager to introduce [m].
*)
Abort.
Lemma fact_tr_acc_mult : forall (n m : nat),
fact_tr_acc n m = m * fact_tr_acc n 1.
Proof.
intros n.
induction n as [ | k IH].
- intros p. simpl. ring.
- intros p.
replace (fact_tr_acc (S k) p) with (fact_tr_acc k ((S k) * p)).
--
(**
This time when we get here in the proof, the inductive hypothesis
is more general than last time:
<<
IH: forall m : nat, fact_tr_acc k m = m * fact_tr_acc k 1
>>
And that means it's applicable, letting [m] be [S k * p].
*)
rewrite IH. simpl. rewrite mult_1_r.
(**
Now we'd again like to use [IH], this time on the right-hand side,
but [rewrite IH] just causes the left-hand side to change. We can
help Coq figure out where we want to use [IH] by telling it what
we want the universally quantified [m] to be; in this case, [S k].
The syntax for that is as follows:
*)
rewrite IH with (m := S k).
(** After that, the proof is quickly finished. *)
ring.
-- simpl. trivial.
Qed.
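(**
Starting over is not the only remedy. As a sketch of an alternative (the
lemma name below is our own), we can introduce [m] as in the first attempt
and then use the [revert] tactic to move it back into the goal before
doing induction. The inductive hypothesis is then universally quantified
over [m] again, and the rest of the script is the same as above.
*)
Lemma fact_tr_acc_mult_revert : forall (n m : nat),
  fact_tr_acc n m = m * fact_tr_acc n 1.
Proof.
  intros n m.
  (* Put [m] back under a [forall] so that [IH] generalizes over it. *)
  revert m.
  induction n as [ | k IH].
  - intros p. simpl. ring.
  - intros p.
    replace (fact_tr_acc (S k) p) with (fact_tr_acc k ((S k) * p)).
    -- rewrite IH. simpl. rewrite mult_1_r.
       rewrite IH with (m := S k).
       ring.
    -- simpl. trivial.
Qed.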
(** Using that lemma, we can successfully verify [fact_tr]: *)
Theorem fact_tr_correct : forall (n : nat),
factorial_of n (fact_tr n).
Proof.
intros n. unfold fact_tr.
induction n as [ | k IH].
- simpl. apply factorial_of_zero.
- simpl. rewrite mult_1_r.
destruct k as [ | m].
-- simpl. rewrite <- mult_1_r.
apply factorial_of_succ. apply factorial_of_zero.
-- rewrite fact_tr_acc_mult.
apply factorial_of_succ. assumption.
Qed.
(**
Our hypothetical skeptic from before is not likely to be so skeptical
of what we did here. After all, it's not so obvious that [fact_tr]
is correct, or that it computes the [factorial_of] relation. Nonetheless,
we have successfully proved its correctness.
*)
(**
*** Another Way to Verify Tail-Recursive Factorial
Our previous two verifications of factorial have both proved that
an implementation of the factorial operation is correct. Our technique
was to state a mathematical relation describing factorial, then
prove that the implementation computed that relation.
Let's explore another technique now; a technique that can be easier to
use. Instead of using the mathematical relation, let's just prove
that the two implementations are equivalent. That is, [fact] and
[fact_tr] compute the same function.
Before launching into that proof, let's pause to ask: what would
it accomplish? The answer is that we'd be showing that a
complicated and not-obviously-correct implementation, [fact_tr],
is equivalent to a simple and more-obviously-correct implementation,
[fact]. So if we believe that [fact] is correct, we could then also
believe that [fact_tr] is correct.
This technique of proving correctness with respect to a _reference
implementation_ is quite useful. (In fact, the verification of the seL4
microkernel used it to great effect.)
Without further ado, here is the theorem and its proof. It uses
a helper lemma that we'll just go ahead and state first. You'll
notice how much easier these are to prove than our previous
verification of [fact_tr]!
*)
Lemma fact_helper : forall (n acc : nat),
fact_tr_acc n acc = (fact n) * acc.
Proof.
intros n.
induction n as [ | k IH]; intros acc.
- simpl. ring.
- simpl. rewrite IH. ring.
Qed.
Theorem fact_tr_is_fact: forall n:nat,
fact_tr n = fact n.
Proof.
intros n. unfold fact_tr. rewrite fact_helper. ring.
Qed.
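(**
To make the payoff concrete, here is a sketch (the corollary name is our
own) of how equivalence with the reference implementation transfers
correctness: since [fact] satisfies [factorial_of], so does [fact_tr],
without appealing to [fact_tr_acc_mult].
*)
Corollary fact_tr_correct_via_fact : forall (n : nat),
  factorial_of n (fact_tr n).
Proof.
  (* Swap [fact_tr n] for [fact n], then reuse the earlier theorem. *)
  intros n. rewrite fact_tr_is_fact. apply fact_correct.
Qed.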
(** That concludes our verification of the factorial operation. *)
(**********************************************************************)
(**
** Extraction
Coq makes it possible to _extract_ OCaml code (or Haskell or Scheme) from
Coq code. That makes it possible for us to
- write Coq code,
- prove the Coq code is correct, and
- extract OCaml code that can be compiled and run more efficiently
than the original Coq code.
Let's extract [fact_tr] as an example.
*)
Require Import Extraction.
Extraction Language OCaml.
Extraction "fact.ml" fact_tr.
(**
That produces the following file:
<<
type nat =
| O
| S of nat
(** val add : nat -> nat -> nat **)
let rec add n m =
match n with
| O -> m
| S p -> S (add p m)
(** val mul : nat -> nat -> nat **)
let rec mul n m =
match n with
| O -> O
| S p -> add m (mul p m)
(** val fact_tr_acc : nat -> nat -> nat **)
let rec fact_tr_acc n acc =
match n with
| O -> acc
| S k -> fact_tr_acc k (mul n acc)
(** val fact_tr : nat -> nat **)
let fact_tr n =
fact_tr_acc n (S O)
>>
As you can see, Coq has preserved the [nat] type in this extracted
code. Unfortunately, computation on these unary natural numbers is not efficient.
(Addition requires linear time; multiplication, quadratic!)
We can direct Coq to extract its own [nat] type to OCaml's [int]
type as follows:
*)
Extract Inductive nat =>
int [ "0" "succ" ] "(fun fO fS n -> if n=0 then fO () else fS (n-1))".
Extract Inlined Constant Init.Nat.mul => "( * )".
(**
The first command says to
- use [int] instead of [nat] in the extracted code,
- use [0] instead of [O] and [succ] instead of [S]
(the [succ] function is in [Stdlib], formerly [Pervasives], and is [fun x -> x + 1]), and
- use the provided function to emulate pattern matching over the type.
The second command says to use OCaml's integer [( * )] operator instead of
Coq's natural-number multiplication operator.
After issuing those commands, the extraction looks cleaner:
*)
Extraction "fact.ml" fact_tr.
(**
<<
(** val fact_tr_acc : int -> int -> int **)
let rec fact_tr_acc n acc =
(fun fO fS n -> if n=0 then fO () else fS (n-1))
(fun _ -> acc)
(fun k -> fact_tr_acc k (( * ) n acc))
n
(** val fact_tr : int -> int **)
let fact_tr n =
fact_tr_acc n (succ 0)
>>
There is, however, a tradeoff. The original version we extracted worked
(albeit inefficiently) for arbitrarily large numbers without any error.
But the second version is subject to integer overflow errors. So the
proofs of correctness that we did for [fact_tr] are no longer completely
applicable: they hold only up to the limits of the types we substituted
during extraction.
Do we truly care about the limits of machine arithmetic? Maybe, maybe not.
For sake of this little example, we might not. If we were verifying
software to control the flight dynamics of a space shuttle, maybe we
would. The Coq standard library does contain a module for 31-bit
integers and operators on them, which we could use if we wanted to
precisely model what would happen on a particular architecture.
*)
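(**
When experimenting with extraction directives like the ones above, it can
be convenient to inspect the result without writing a file. As a small
illustration, the [Recursive Extraction] command prints the extracted code
for a definition and its dependencies in Coq's output. Here we apply it to
the reference implementation [fact], under the [int] mapping declared
above.
*)
Recursive Extraction fact.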
(**********************************************************************)
(**
** Verification of Data Structures
We've now seen how to verify individual functions. But what about
a collection of related functions, e.g., a data structure? Now we
must be concerned with not just the individual functions, but also
how they interact. For example, we expect [push] and [peek] to
interact in certain ways with a stack, or [hd] and [cons] with
a list:
- [peek (push x s) = x]
- [hd (h :: t) = h]
We can specify the behavior of a data structure by writing down
equations like those. This style of specification is called
_algebraic specification_.
When we discussed testing earlier in the semester, we categorized
the operations of a data structure whose representation type is [t] into
- creators, which create values of type [t] from scratch,
- producers, which take values of type [t] as input and return values of
type [t] as output, and
- observers, which take values of type [t] as input and return values
of some other type as output.
With algebraic specification, we want to write down equations that
characterize all the possible interactions between creators,
producers, and observers.
*** Algebraic Specification of Lists
As an example, let's write an algebraic specification of lists,
then verify the correctness of Coq's list implementation with
respect to that specification.
Our only creator will be [nil]. The producers will be [::], [++], and [tl].
The observers will be [hd] and [length]. The [hd]
operation will take an extra argument compared to OCaml's [hd] operation,
which will be a "default" value to return if the list is empty.
We could, of course, include other producers and observers in our
specification, such as [map] or [mem], but the ones we have chosen
are enough for this example.
These are the equations we expect to hold:
<<
hd x nil = x
hd _ (x::_) = x
tl nil = nil
tl (_::xs) = xs
nil ++ xs = xs
xs ++ nil = xs
(x :: xs) ++ ys = x :: (xs ++ ys)
lst1 ++ (lst2 ++ lst3) = (lst1 ++ lst2) ++ lst3
length nil = 0
length (_ :: xs) = 1 + length xs
length (xs ++ ys) = length xs + length ys
>>
Below, we state each of those equations as a theorem, and
prove the theorem. The proofs themselves do not contain
any new concepts about Coq, so we pass over them without
much comment.
*)
(** [hd x nil = x] *)
Theorem hd_nil : forall (A:Type) (x:A),
hd x nil = x.
Proof. trivial. Qed.
(** [hd _ (h :: _) = h] *)
Theorem hd_cons : forall (A:Type) (x h : A) (t : list A),
hd x (h::t) = h.
Proof. trivial. Qed.
(** [tl nil = nil] *)
Theorem tl_nil : forall (A:Type),
@tl A nil = nil.
Proof. trivial. Qed.
(** [tl (_ :: xs) = xs ] *)
Theorem tl_cons : forall (A:Type) (x : A) (xs : list A),
tl (x::xs) = xs.
Proof. trivial. Qed.
(** [nil ++ xs = xs] *)
Theorem nil_app : forall (A:Type) (xs : list A),
nil ++ xs = xs.
Proof. trivial. Qed.
(** [xs ++ nil = xs] *)
Theorem app_nil : forall (A:Type) (xs : list A),
xs ++ nil = xs.
Proof.
intros A xs.
induction xs as [ | h t IH]; simpl.
- trivial.
- rewrite IH. trivial.
Qed.
(** [(x :: xs) ++ ys = x :: (xs ++ ys)] *)
Theorem cons_app : forall (A:Type) (x : A) (xs ys : list A),
x::xs ++ ys = x :: (xs ++ ys).
Proof. trivial. Qed.
(** [lst1 ++ (lst2 ++ lst3) = (lst1 ++ lst2) ++ lst3] *)
Theorem app_assoc : forall (A:Type) (lst1 lst2 lst3 : list A),
lst1 ++ (lst2 ++ lst3) = (lst1 ++ lst2) ++ lst3.
Proof.
intros A lst1 lst2 lst3.
induction lst1 as [ | h t IH]; simpl.
- trivial.
- rewrite IH. trivial.
Qed.
(** [length nil = 0] *)
Theorem length_nil : forall (A:Type),
@length A nil = 0.
Proof. trivial. Qed.
(** [length (_ :: xs) = 1 + length xs] *)
Theorem length_cons : forall (A:Type) (x:A) (xs : list A),
length (x::xs) = 1 + length xs.
Proof. trivial. Qed.
(** [length (xs ++ ys) = length xs + length ys] *)
Theorem length_app : forall (A:Type) (xs ys : list A),
length (xs ++ ys) = length xs + length ys.
Proof.
intros A xs ys.
induction xs as [ | h t IH]; simpl.
- trivial.
- rewrite IH. trivial.
Qed.
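(**
Once proved, the equations can be chained to reason about lists without
unfolding any definitions. A small illustrative example (its name is our
own) uses [app_nil] and then [length_cons]:
*)
Example length_cons_app_nil : forall (A:Type) (x : A) (xs : list A),
  length ((x :: xs) ++ nil) = 1 + length xs.
Proof.
  intros A x xs.
  (* First drop the trailing [nil], then take the length of the cons. *)
  rewrite app_nil. rewrite length_cons. trivial.
Qed.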
(**********************************************************************)
(**
*** Algebraic Specification of Stacks
As a second example, let's specify, implement, and verify stacks.
The creator is [empty], the producers are [push] and [pop], and the
observers are [is_empty], [peek], and [size]. (You might quibble with
whether [pop] is a producer or observer; it's not really important, though.)
<<
is_empty empty = true
is_empty (push _ _) = false
peek empty = None
peek (push x _) = Some x
pop empty = None
pop (push _ s) = Some s
size empty = 0
size (push _ s) = 1 + size s
>>
*)
Module MyStack.
(** AF: We will represent a stack as a list. The head of the list
is the top of the stack. *)
Definition stack (A:Type) := list A.
Definition empty {A:Type} : stack A := nil.
Definition is_empty {A:Type} (s : stack A) : bool :=
match s with
| nil => true
| _::_ => false
end.
Definition push {A:Type} (x : A) (s : stack A) : stack A :=
x::s.
Definition peek {A:Type} (s : stack A) : option A :=
match s with
| nil => None
| x::_ => Some x
end.
Definition pop {A:Type} (s : stack A) : option (stack A) :=
match s with
| nil => None
| _::xs => Some xs
end.
Definition size {A:Type} (s : stack A) : nat :=
length s.
(**
Now that we've implemented all the stack operations,
we'll verify their correctness. All the proofs are
trivial, because the implementation is so simple.
*)
(** [is_empty empty = true] *)
Theorem empty_is_empty : forall (A:Type),
@is_empty A empty = true.
Proof. trivial. Qed.
(** [is_empty (push _ _) = false] *)
Theorem push_not_empty : forall (A:Type) (x:A) (s : stack A),
is_empty (push x s) = false.
Proof. trivial. Qed.
(** [peek empty = None] *)
Theorem peek_empty : forall (A:Type),
@peek A empty = None.
Proof. trivial. Qed.
(** [peek (push x _) = Some x] *)
Theorem peek_push : forall (A:Type) (x:A) (s : stack A),
peek (push x s) = Some x.
Proof. trivial. Qed.
(** [pop empty = None] *)
Theorem pop_empty : forall (A:Type),
@pop A empty = None.
Proof. trivial. Qed.
(** [pop (push _ s) = Some s] *)
Theorem pop_push : forall (A:Type) (x:A) (s : stack A),
pop (push x s) = Some s.
Proof. trivial. Qed.
(** [size empty = 0] *)
Theorem size_empty : forall (A:Type),
@size A empty = 0.
Proof. trivial. Qed.
(** [size (push x s) = 1 + size s] *)
Theorem size_push : forall (A:Type) (x:A) (s : stack A),
size(push x s) = 1 + size s.
Proof. trivial. Qed.
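(**
As an illustrative check (the example names are our own), the operations
compose as the specification predicts on a small concrete stack, and the
equations can also be chained on an arbitrary stack:
*)
Example peek_after_two_pushes : peek (push 1 (push 2 empty)) = Some 1.
Proof. reflexivity. Qed.
Example size_two_pushes : forall (A:Type) (x y : A) (s : stack A),
  size (push x (push y s)) = 2 + size s.
Proof.
  (* Apply the [size_push] equation once for each push. *)
  intros A x y s. rewrite size_push. rewrite size_push. trivial.
Qed.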
End MyStack.
(**
To extract our stack implementation to OCaml, it will help
to additionally declare to Coq that we want to extract its
booleans, options, and lists to OCaml's own built-in types
for those.
*)
Extract Inductive bool => "bool" [ "true" "false" ].
Extract Inductive option => "option" [ "Some" "None" ].
Extract Inductive list => "list" [ "[]" "(::)" ].
Extract Inlined Constant length => "List.length".
Extraction "mystack.ml" MyStack.
(**********************************************************************)
(**
** Verification of a Compiler
One of the big success stories of Coq verification is the CompCert C
compiler. Its source language is a large subset of ISO C99. It is an
optimizing compiler that targets
PowerPC, ARM, RISC-V, and x86 processors. The correctness proofs
establish that the executable code it produces will behave exactly
as it should according to the semantics of the C source code.
Let's get a sense of what would be required to verify a compiler.
We'll take a tiny source language, compile it into a tiny bytecode
language, and verify the correctness of that compilation. We'll
only worry here about the backend of the compiler, not about
the frontend (including parsing). CompCert originally had only
a verified backend, too, but since then its front end
has been verified as well.
*)
Module Compiler.
(**
As the source language, we'll use arithmetic expressions that
have only integer constants and addition:
<<
e ::= i | e + e
>>
In OCaml, we could represent that with this AST type:
<<
type expr =
| Const of int
| Plus of expr * expr
>>
In Coq, the type is very similar, though we'll use [nat] instead of [int]:
*)
Inductive expr : Type :=
| Const : nat -> expr
| Plus : expr -> expr -> expr.
(**
The _dynamic semantics_ of expressions is straightforward:
<<
i ==> i
e1 + e2 ==> i
if e1 ==> i1
and e2 ==> i2
and i = i1 + i2
>>
And it's easily implementable. Here's a big-step interpreter:
*)
Fixpoint eval_expr (e : expr) : nat :=
match e with
| Const i => i
| Plus e1 e2 => (eval_expr e1) + (eval_expr e2)
end.
(** Here are a couple test cases for our interpreter: *)
Example source_test_1 : eval_expr (Const 42) = 42.
Proof. trivial. Qed.
Example source_test_2 : eval_expr (Plus (Const 2) (Const 2)) = 4.
Proof. trivial. Qed.
(**
As a _target language_, let's use something similar to
what Java and OCaml use for bytecode. They are based on
a _stack machine_ model, in which bytecode instructions
manipulate a stack. Our tiny little bytecode language
will have the following instruction set:
<<
instr ::= PUSH i | ADD
>>
A program is just a sequence of instructions.
For example, the following program pushes [2] on the stack,
pushes [2] again, then adds the two values on the stack.
Adding causes two values to be popped, and the sum pushed
back onto the stack.
<<
PUSH 2
PUSH 2
ADD
>>
We'll implement this stack language in Coq as follows.
An [instr] is a machine instruction. A program [prog]
is a list of instructions. *)
Inductive instr : Type :=
| PUSH : nat -> instr
| ADD : instr.
Definition prog := list instr.
(**
Now we can write an interpreter for the target language.
Evaluation of a program takes in an initial stack,
and returns the final stack. But since evaluation
could fail (if we try to ADD when there aren't at
least two values on the stack), we wrap the return
in an option, and return [None] if an error occurs.
*)
Definition stack := list nat.
Fixpoint eval_prog (p : prog) (s : stack) : option stack :=
match p,s with
| PUSH n :: p', s => eval_prog p' (n :: s)
| ADD :: p', x :: y :: s' => eval_prog p' (x + y :: s')
| nil, s => Some s
| _, _ => None
end.
(** Here are a couple unit tests for the target language interpreter. *)
Example target_test_1 : eval_prog [PUSH 42] [] = Some [42].
Proof. trivial. Qed.
Example target_test_2 : eval_prog [PUSH 2; PUSH 2; ADD] [] = Some [4].
Proof. trivial. Qed.
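(**
The failure case deserves tests as well. In these illustrative examples
(the names are our own), [ADD] finds fewer than two values on the stack,
so evaluation returns [None]:
*)
Example target_test_add_empty : eval_prog [ADD] [] = None.
Proof. reflexivity. Qed.
Example target_test_add_one : eval_prog [PUSH 1; ADD] [] = None.
Proof. reflexivity. Qed.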
(**
Now we're ready to translate from the source language to the
target language.
- To translate a constant [c], we just push [c] onto the stack.
- To translate an addition [e1 + e2], we translate [e2], translate [e1],
then append the instructions together, followed by an [ADD]
instruction.
The function below, [compile e], produces a program [p], such that
evaluation of [p] leaves a single new value at the top
of the stack, and that value would be the result of
evaluating [e].
*)
Fixpoint compile (e : expr) : prog :=
match e with
| Const n => [PUSH n]
| Plus e1 e2 => compile e2 ++ compile e1 ++ [ADD]
end.
(** Here are a couple unit tests for our compiler: *)
Example compile_test_1 : compile (Const 42) = [PUSH 42].
Proof. trivial. Qed.
Example compile_test_2 : compile (Plus (Const 2) (Const 3))
= [PUSH 3; PUSH 2; ADD].
Proof. trivial. Qed.
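(**
A nested expression shows the instruction order more clearly. In this
illustrative test (its name is our own), the right operand [2 + 3] is
compiled first, then the left operand [1], then the final [ADD]:
*)
Example compile_test_nested :
  compile (Plus (Const 1) (Plus (Const 2) (Const 3)))
  = [PUSH 3; PUSH 2; ADD; PUSH 1; ADD].
Proof. reflexivity. Qed.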
(**
Those tests demonstrate that the compiler produces
some programs that do seem to correspond to the
input expression. But we haven't really tested
the postcondition of [compile]: we want to know
whether the expression and the program on the two sides of the [=] in
the test cases above evaluate to the same value. So let's check that, too.
*)
Example post_test_1 :
eval_prog (compile (Const 42)) [] = Some [eval_expr (Const 42)].
Proof. trivial. Qed.
Example post_test_2 :
eval_prog (compile (Plus (Const 2) (Const 3))) []
= Some [eval_expr (Plus (Const 2) (Const 3))].
Proof. trivial. Qed.
(**
So far, so good.
But as we know from Dijkstra, "program testing can be used to show the
presence of bugs, but never to show their absence." How could we show
that the compiler is correct for every input expression?
WE PROVE IT!
The following theorem is a _specification_ that says
what it means for [compile] to be correct. In particular,
it says these two computations produce the same result:
- Compiling [e] then evaluating the resulting program
according to the semantics of the target language, starting
with the empty stack.
- Evaluating [e] according to the semantics of the source language,
then pushing the result on the empty stack and wrapping it
with [Some].
*)
Theorem compile_correct : forall (e:expr),
eval_prog (compile e) [] = Some [eval_expr e].
Abort.
(**
Proving the theorem will require a helper lemma about the associativity
of list append.
*)
Lemma app_assoc_4 : forall (A:Type) (l1 l2 l3 l4 : list A),
l1 ++ (l2 ++ l3 ++ l4) = (l1 ++ l2 ++ l3) ++ l4.
Proof.
intros A l1 l2 l3 l4.
replace (l2 ++ l3 ++ l4) with ((l2 ++ l3) ++ l4);
rewrite app_assoc; trivial.
Qed.
(**
We'll also need a helper lemma that generalizes the main theorem.
Specifically, it says that there could be additional instructions [p]
in the program, and additional values [s] on the stack, but those
won't keep the expression [e] from being compiled and executed
correctly. The proof uses the same technique of a generalized
inductive hypothesis, where we won't introduce all the variables
right away, as we used when verifying [fact_tr] above.
*)
Lemma compile_helper : forall (e:expr) (s:stack) (p:prog),
eval_prog (compile e ++ p) s = eval_prog p (eval_expr e :: s).
Proof.
intros e.
induction e as [n | e1 IH1 e2 IH2]; simpl.
- trivial.
- intros s p. rewrite <- app_assoc_4.
rewrite IH2. rewrite IH1. simpl. trivial.
Qed.
Theorem compile_correct : forall (e:expr),
eval_prog (compile e) [] = Some [eval_expr e].
Proof.
intros e.
induction e as [n | e1 IH1 e2 IH2]; simpl.
- trivial.
- repeat rewrite compile_helper. simpl. trivial.
Qed.
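(**
One consequence is worth spelling out, sketched here as an illustrative
corollary (its name is our own): a compiled program never hits the error
case of [eval_prog] when started on the empty stack.
*)
Corollary compile_never_fails : forall (e : expr),
  eval_prog (compile e) [] <> None.
Proof.
  (* After rewriting, the goal is [Some _ <> None]. *)
  intros e. rewrite compile_correct. discriminate.
Qed.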
End Compiler.
(**
Now we have a verified compiler, and we can extract it (and the two
tiny interpreters we also wrote) to OCaml!
*)
Extract Inlined Constant Init.Nat.add => "( + )".
Extract Inlined Constant app => "( @ )".
Extraction "compiler.ml" Compiler.eval_expr Compiler.eval_prog Compiler.compile.
(**
** Summary
We've come to the fulfillment of our purpose in learning Coq:
verification. To reach this point, we had to learn
how to program in Coq's functional programming language and
how to prove in Coq's logic. Along the way we also learned
about the correspondence between proofs and programs.
Forty years ago, verification techniques worked only
for really short programs written in toy languages, and
it was all done with pen and paper. Today, research projects
are able to verify compilers and operating systems, and the
computer can check the proofs. In another forty years, who knows?
Perhaps, at the end of this unit on formal methods, you find yourself
wondering why we spent so much time on it. One reason is that
it's an important (if niche) area in programming languages and
software engineering. To be well educated in this field requires
that you know something about it. Even if you never touch formal
methods again, you now can talk from firsthand experience
with other people in industry. Another reason is that the future
of functional programming (hence programming in general) is headed
toward languages with ever richer type systems, like Coq's. Coq's
type system is sophisticated enough to express not just programs
but theorems. Before the end of your career, it's a good bet
there will be some mainstream language that has rich enough
types to express correctness properties that are beyond
today's type systems, even if not rich enough to state
arbitrary propositions.
But even more importantly, the final reason we covered formal
methods was to spend some time thinking about _what it means
for a program to be correct._ One perspective on that issue,
a perspective well covered by other introductory programming
courses as well as this course, is testing. Unit testing
is a cost-effective way to ascertain whether a program
has faults. Now, you've experienced another perspective: proof.
Proving the correctness of a program is expensive, yet it
offers guarantees beyond unit tests.
That's not at all ---no, not at all!--- to claim that formal methods
are perfect. You might end up proving the wrong theorems.
You might make assumptions that turn out to be invalid.
There might even be faults in the programs you use to check
your proofs. All of those could make your formal efforts
futile.
But at the end of the day, we programmers (we happy programmers,
we band pursuing the craft of code) are creating artifacts that
are part proof and part art and, if we do our jobs right,
altogether beautiful. _Beauty is our business_. Never
lose sight of that.
** Terms and concepts
- algebraic specification
- axiom
- extraction
- generalized inductive hypothesis
- inference rule
- redundancy
- reference implementation
- relation
- specification
- testing
- verification
** Tactics
- [replace]
** Further reading
- _Software Foundations, Volume 1: Logical Foundations_.
Chapters 12 through 15: Imp, ImpParser, ImpCEvalFun, Extraction.
- _Interactive Theorem Proving and Program Development_.
Chapters 9 through 11. Available online from the Cornell library.
- Notes by Robert McCloskey on algebraic specification.
- The verified compiler section of the notes above is inspired by Adam
Chlipala's book _Certified Programming with Dependent Types_.
*)