Logic for Program Verification


Propositional Logic

Introduction

Can we prove that a program works for all possible inputs? In principle, yes. In practice, this approach is too time-consuming to be applied to large programs. However, it is useful to look at how proofs of correctness can be constructed.

What is a proof? A completely convincing argument that something is true. For an argument to be completely convincing, it should be made up of small steps, each of which is obviously true. In fact, each step should be so simple and obvious that we could build a computer program to check the proof. Two ingredients are required:

  1. A language for clearly expressing what we want to prove.
  2. Rules for building up an argument in steps that are obviously correct.

A logic accomplishes these two goals.

The strategy for proving programs correct will be to convert programs and their specifications into a purely logical statement that is either true or false. If the statement is true, then the program is correct. But for our proofs to be truly convincing, we need a clear understanding of what a proof is.

Curiously, mathematicians did not really study the proofs that they were constructing until the 20th century. Once they did, they discovered that logic itself was a deep topic with many implications for the rest of mathematics.

Propositions

We start with propositional logic, which is a logic built up from simple symbols representing propositions about some world. We will use the letters A, B, C, ... as propositional symbols. For example, A might stand for "I got a 90% on the final", B for "I attended class", and C for "I will get an A in the class"; other symbols, say D and E, might stand for statements about the state of a program, such as "x < y".

It is not the job of propositional logic to assign meanings to these symbols. However, we will use statements like D and E, whose meanings involve program state, to talk about the correctness of programs.

Syntax of Propositions

We define a grammar for propositions built up from these symbols. We use the letters P, Q, R to represent propositions (or formulas):

P,Q,R ::= ⊤                    (* true *)
        | ⊥                    (* false *)
        | A, B, C              (* propositional symbols *)
        | ¬P                   (* sugar for P⇒⊥ *)
        | P∧Q                  (* "P and Q" (conjunction) *)
        | P∨Q                  (* "P or Q" (disjunction) *)
        | P⇒Q                  (* "P implies Q" (implication) *)


The precedence of these forms decreases as we go down the list, so P∧Q⇒R is the same as (P∧Q)⇒R. One thing to watch out for is that ⇒ is right-associative (like →), so P⇒Q⇒R is the same as P⇒(Q⇒R). We will introduce parentheses as needed for clarity. We will use the notation ¬P for logical negation, but it is really just syntactic sugar for the implication P⇒⊥. We also write P⇔Q as syntactic sugar for (P⇒Q)∧(Q⇒P), meaning that P and Q are logically equivalent.

This grammar defines the language of propositions. With suitable propositional symbols, we can express various interesting statements, for example:

A∧B⇒C
"If I got a 90% on the final and I attended class, I will get an A"
¬C⇒(¬A∨¬B)
"If I didn't get an A in the class, then either I didn't get a 90% on the final or I didn't attend class"
C∨¬A∨¬B
"Either I got an A in the class, or I didn't get a 90% on the final or I didn't attend class"

In fact, all three of these propositions are logically equivalent, which we can determine without knowing anything about what finals and attendance mean.

Semantics of propositions

In order to say whether a proposition is true or not, we need to understand what it means. The truth of a proposition sometimes depends on the state of the "world". For example, proposition D above is true in a world where x=0 and y=10, but not in a world in which x=y=0. To understand the meaning of a proposition P, we need to know, for each possible world, whether it is true. For this, we only need to know whether P is true for each possible combination of truth or falsity of the propositional symbols A,B,C,... within it. For example, consider the proposition A∧B. This is true when both A and B are true, but otherwise false. We can draw a truth table that describes all four possible worlds compactly:

A∧B       A=false   A=true
B=false   false     false
B=true    false     true

This kind of table can also be used to describe the action of an operator like ∧ for a conjunction over general propositions P∧Q rather than over simple propositional symbols A and B. Here is a truth table for disjunction. Notice that in the case where both P and Q are true, we consider P∨Q to be true. The connective ∨ is inclusive rather than exclusive.

P∨Q       P=false   P=true
Q=false   false     true
Q=true    true      true

We can also create a truth table for negation ¬P:

P       ¬P
false   true
true    false

Implication P⇒Q is tricky. The implication seems true if P is true and Q is true, and if P is false and Q is false. And the implication is clearly false if P is true and Q is false:

P⇒Q       P=false   P=true
Q=false   true      false
Q=true    ?         true

What about the case in which P is false and Q is true? In a sense we have no evidence about the implication as long as P is false. Logicians consider that in this case the assertion P⇒Q is true. Indeed, the proposition P⇒Q is considered vacuously true in the case where P is false, yielding this truth table:
P⇒Q       P=false   P=true
Q=false   true      false
Q=true    true      true

We can use truth tables like these to evaluate the truth of any propositions we want. For example, the proposition (A⇒B)∧(B⇒A) is true exactly where both implications are:

(A⇒B)∧(B⇒A)   A=false   A=true
B=false        true      false
B=true         false     true

In fact, this means that A and B are logically equivalent, which we write as A iff B or A⇔B. If P⇔Q, then we can replace P with Q wherever it appears in a proposition, and vice versa, without changing the meaning of the proposition. This is very handy.

Another interesting case is the proposition (A∧B)⇒B. The truth table looks like this:

(A∧B)⇒B   A=false   A=true
B=false    true      true
B=true     true      true

In other words, the proposition (A∧B)⇒B is true regardless of what A and B stand for. In fact, it will be true if A and B are replaced with any propositions P and Q. We call such a proposition a tautology.
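
To make "true in every possible world" concrete, here is a small OCaml sketch (an illustration of ours, not part of the formal development; the names prop, eval, and is_tautology are made up) that represents propositions as a datatype and checks for tautologies by enumerating every assignment of truth values to the symbols:

(* Propositions built from symbols with the connectives of the grammar above. *)
type prop =
  | True
  | False
  | Sym of string                  (* propositional symbol, e.g. "A" *)
  | Not of prop
  | And of prop * prop
  | Or of prop * prop
  | Imp of prop * prop

(* A "world" assigns a truth value to each symbol. *)
let rec eval (world : (string * bool) list) (p : prop) : bool =
  match p with
  | True -> true
  | False -> false
  | Sym a -> List.assoc a world
  | Not q -> not (eval world q)
  | And (q, r) -> eval world q && eval world r
  | Or (q, r) -> eval world q || eval world r
  | Imp (q, r) -> not (eval world q) || eval world r

(* The symbols mentioned in a formula (possibly with repetitions). *)
let rec syms (p : prop) : string list =
  match p with
  | True | False -> []
  | Sym a -> [a]
  | Not q -> syms q
  | And (q, r) | Or (q, r) | Imp (q, r) -> syms q @ syms r

(* A tautology is true in every world. *)
let is_tautology (p : prop) : bool =
  let rec worlds = function
    | [] -> [ [] ]
    | a :: rest ->
      List.concat_map (fun w -> [ (a, true) :: w; (a, false) :: w ]) (worlds rest)
  in
  List.for_all (fun w -> eval w p) (worlds (List.sort_uniq compare (syms p)))

let _ = is_tautology (Imp (And (Sym "A", Sym "B"), Sym "B"))    (* true *)

This is exactly the truth-table method; it examines 2ⁿ worlds for n distinct symbols, which is one motivation for the inference rules introduced below.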

Tautologies

There are a number of useful tautologies, including the following:
Associativity (P∧Q)∧R ⇔ P∧(Q∧R) (P∨Q)∨R ⇔ P∨(Q∨R)
Symmetry (P∧Q) ⇔ (Q∧P) (P∨Q) ⇔ (Q∨P)
Distributivity P∧(Q∨R) ⇔ (P∧Q)∨(P∧R) P∨(Q∧R) ⇔ (P∨Q)∧(P∨R)
Idempotency P∧P ⇔ P P∨P ⇔ P
DeMorgan's laws ¬(P∧Q) ⇔ ¬P∨¬Q ¬(P∨Q) ⇔ ¬P∧¬Q
Negation P ⇔ ¬¬P P⇒⊥ ⇔ ¬P P⇒Q ⇔ ¬P∨Q

These can all be derived from the rules we will see shortly, but they are useful to know.

Notice that we can use DeMorgan's laws to turn ∧ into ∨, and use the equivalence P⇒Q ⇔ ¬P∨Q to turn ∨ into ⇒, and the equivalence P⇒⊥ ⇔ ¬P to get rid of negation. So we can express any proposition using just implication ⇒ and the false symbol ⊥!
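
Continuing the OCaml sketch above, here is one way (there are several; this one uses the equivalences ⊤ ⇔ ⊥⇒⊥, ¬P ⇔ P⇒⊥, P∨Q ⇔ ¬P⇒Q, and P∧Q ⇔ ¬(P⇒¬Q)) to carry out this translation:

(* Rewrite a formula so that it mentions only Imp, False, and symbols. *)
let rec desugar (p : prop) : prop =
  match p with
  | True -> Imp (False, False)                             (* ⊤ ⇔ ⊥⇒⊥ *)
  | False | Sym _ -> p
  | Not q -> Imp (desugar q, False)                        (* ¬P ⇔ P⇒⊥ *)
  | Or (q, r) -> Imp (Imp (desugar q, False), desugar r)   (* P∨Q ⇔ ¬P⇒Q *)
  | And (q, r) ->                                          (* P∧Q ⇔ ¬(P⇒¬Q) *)
    Imp (Imp (desugar q, Imp (desugar r, False)), False)
  | Imp (q, r) -> Imp (desugar q, desugar r)

(* Sanity check using is_tautology: a formula and its desugaring are equivalent. *)
let equivalent p q = is_tautology (And (Imp (p, q), Imp (q, p)))
let _ = equivalent (Or (Sym "A", Not (Sym "B")))
                   (desugar (Or (Sym "A", Not (Sym "B"))))   (* true *)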

Inference rules

To prove whether propositions are true without testing every possible world, we need a system for deduction. We can construct proofs of a proposition by using inference rules to derive the desired proposition starting from axioms.

The most famous inference rule is known as modus ponens. Intuitively, this says that if we know P is true, and we know P⇒Q, then we can conclude Q:

P P⇒Q
Q
(modus ponens)

The propositions above the line are the premises; the proposition below the line is the conclusion. Both the premises and the conclusion may contain metavariables (in this case, P and Q), representing arbitrary propositions. When an inference rule is used as part of a proof, the metavariables are replaced in a consistent way with the appropriate kind of object (in this case, propositions).

Most rules come in one of two flavors: introduction or elimination rules. Introduction rules introduce the use of a logical operator, and elimination rules eliminate it. Modus ponens is an elimination rule for ⇒. On the right-hand side of a rule, we often write the name of the rule. This is helpful when reading proofs. In this case, we have written “modus ponens”, but it would be shorter to write (⇒E), meaning that this is the elimination rule for ⇒.

Conjunction (∧) has an introduction rule and two elimination rules:

P Q
P∧Q
(∧I)
P∧Q
P
(∧E1)
P∧Q
Q
(∧E2)

The simplest introduction rule is the one for ⊤. Because it has no premises, this rule is an axiom: something that can start a proof:

 
⊤
(true)

Natural deduction

Together, a set of inference rules make up a proof system that determines what can be proved. There is still something important missing from our proof system: how can we prove an implication P⇒Q? Intuitively, the way this is proved is by assuming that P is true, and with that assumption, showing that Q is true too. In a proof, we are always allowed to introduce a new assumption P, which we do with the following rule. The name x is the name of the assumption. Each distinct assumption should have a different name.
 
[x : P]
(A)

Because it has no premises, this rule is an axiom: something that can start a proof. It can be used as if it proved the proposition P. It also gives a name to the assumption, which is important for making sure that the things proved end up being conditioned on the assumption.

We can introduce an implication P⇒Q by discharging a prior assumption [x: P]. We write x in the rule name to show which assumption is discharged:

[x: P]
⋮
Q
P⇒Q
(⇒I/x)

P    P⇒Q
Q
(⇒E, modus ponens)

A proof is valid only if every assumption is discharged somewhere below all places the assumption appears. The same assumption can appear more than once.

The introduction and elimination rules for disjunction are as follows:

P
P∨Q
(∨I1)
Q
P∨Q
(∨I2)
P∨Q P⇒R Q⇒R
R
(∨E)

Finally, there are rules relating to negation:

[x: P]
⋮
⊥
¬P
(special case of ⇒I, discharging x)

[x: ¬P]
⋮
⊥
P
(reductio ad absurdum, RAA, discharging x)

⊥
P
(false proves everything)

Reductio ad absurdum is an interesting rule. It says that if the negation of a proposition can be used to prove falsity, the proposition must be true. This rule is present in classical logic but not in constructive logics, in which things cannot be proved true simply by showing the falsity of their negations. In constructive logics, the law of the excluded middle, P∨¬P, does not hold. However, we will use this rule.
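
As a worked example of RAA, here is a linear derivation of the law of the excluded middle, P∨¬P (recall that ¬(P∨¬P) is sugar for (P∨¬P)⇒⊥):

1. [x: ¬(P∨¬P)]                              (A)
2. [y: P]                                    (A)
3. P∨¬P              from 2                  (∨I1)
4. ⊥                 from 3, 1               (⇒E)
5. ¬P                from 4, discharging y   (⇒I/y)
6. P∨¬P              from 5                  (∨I2)
7. ⊥                 from 6, 1               (⇒E)
8. P∨¬P              from 7, discharging x   (RAA)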

Proofs

A proof of proposition P in natural deduction starts from axioms and derives the judgement ⊢P; that is, it proves P under no assumptions. Every step in the proof is an instance of an inference rule with metavariables substituted consistently with expressions of the appropriate syntactic class.

Example 1

For example, here is a proof of the proposition (A⇒B⇒C) ⇒ (A∧B⇒C). It uses two named assumptions, [x: (A⇒B⇒C)] and [y: A∧B].
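
Written out linearly, with each line justified by the rule named at the right, the proof is:

1. [x: A⇒B⇒C]                                   (A)
2. [y: A∧B]                                     (A)
3. A                    from 2                  (∧E1)
4. B                    from 2                  (∧E2)
5. B⇒C                  from 3, 1               (⇒E)
6. C                    from 4, 5               (⇒E)
7. A∧B⇒C                from 6, discharging y   (⇒I/y)
8. (A⇒B⇒C)⇒(A∧B⇒C)      from 7, discharging x   (⇒I/x)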

The final step in the proof is to derive (A⇒B⇒C) ⇒ (A∧B⇒C) from (A∧B⇒C), which is done using the rule (⇒I), discharging the assumption [x:(A⇒B⇒C)]. To see how this rule generates the proof step, substitute for the metavariables P, Q, x as follows: P = (A⇒B⇒C), Q = (A∧B⇒C), x = x. The immediately previous step uses the same rule, but with a different substitution: P = A∧B, Q = C, x = y.

This shows that a proof of proposition P is a tree generated by inference rules: each proposition in the proof is a node, the leaves are axioms (including assumptions), and the root is P.

Writing out proofs at this level of detail can be a bit tedious. But notice that checking the proof is completely mechanical, requiring no intelligence or insight whatever. Therefore it is a very strong argument that the thing proved is in fact true.

We can also make writing proofs like this less tedious by adding more rules that provide reasoning shortcuts. Such added rules are sound (they only prove true things) as long as any proof using them can be converted into a proof that uses only the original rules. Rules with this property are called admissible.


Predicate Logic

Syntax

In propositional logic, the statements we are proving are completely abstract. To be able to prove programs correct, we need a logic that can talk about the things that programs compute on: integers, strings, tuples, datatype constructors, and functions. We'll enrich propositional logic with the ability to talk about these things, obtaining a version of predicate logic.

The syntax extends propositional logic with a few new expressions (the last four forms of P, and the grammar of expressions e, are new):

P,Q,R ::= ⊤             (* true *)
        | ⊥             (* false *)
        | A, B, C       (* propositional symbols *)
        | ¬P            (* sugar for P⇒⊥ *)
        | P∧Q           (* "P and Q" (conjunction) *)
        | P∨Q           (* "P or Q" (disjunction) *)
        | P⇒Q           (* "P implies Q" (implication) *)
        | ∀x.P          (* P is true for all x. P can mention x*)
        | ∃x.P          (* There exists some x such that P is true *)
        | e1 = e2       (* e1 is equal to e2 *)
        | p(e)          (* The predicate named p is true for e *)

e ::=     v             (* integers, tuples, other SML values *)
        | x             (* The name of a value, may be bound 
                            by ∀ or ∃ *)
        | f(e)          (* Result of applying function
                           named f to e *)

The formula ∀x.P means that the formula P is true for any choice of x. This is called universal quantification, and ∀ is the universal quantifier. The formula ∃x.P denotes existential quantification. It means that the formula P is true for some choice of x, though there may be more than one such x. Existential and universal quantifiers can be turned into each other using negation. The formula ∃x.P is equivalent to ¬∀x.¬P, because if there exists some x that makes P true, then clearly ¬P is not true for all x. Similarly, the formula ∀x.P is equivalent to ¬∃x.¬P. These equivalences are generalizations of DeMorgan's to existential and universal quantifiers.

We will assume that we can tell from the name of x what type of thing it is. What if we want to restrict attention to some subset? For universal quantifiers, the trick is to use an implication. For example, if we want to say that all numbers greater than one are also greater than zero, we write ∀x.x>1⇒x>0. This works because the quantified formula is vacuously true for the numbers not greater than 1. If we want to limit the range of an existential, we use conjunction. For example, to say that there exists a number greater than zero that is greater than one, we write ∃x.x>0∧x>1.

Using quantifiers, we can express some interesting statements. For example, we can express the idea that a number n is prime (note m and k are integers) in various logically equivalent ways:

prime(n) ⇔ ∀m.1<m∧m<n⇒¬∃k.k*m = n
         ⇔ ¬∃m.1<m∧m<n∧∃k.k*m = n    (DeMorgan's laws)
         ⇔ ¬∃m.∃k.1<m∧m<n∧k*m = n

This example shows one fine point of syntax: when reading quantifiers ∀x.P, the formula P extends as far to the right as possible. So (∀m.1<m∧m<n⇒¬∃k.k*m = n) is read as (∀m.1<m∧m<n⇒(¬∃k.k*m = n)) rather than as (∀m.1<m∧m<n)⇒(¬∃k.k*m = n). This is the same as for other perhaps more familiar binding constructs, such as summation ∑ and integrals ∫.

Rules for quantifiers

Introduction and elimination rules can be defined for universal and existential quantifiers. The rules for universals are as follows:

P x does not appear free in any axiom or undischarged assumption
∀x.P
(∀I)
∀x.P
P{e/x}
(∀E)

The requirement in (∀I) that x does not appear in undischarged assumptions prevents us from doing unsound generalizations like the following:

[x>1]
∀x. x>1
x>1 ⇒ ∀x. x>1

It is, however, fine to have x appear in an assumption that is discharged above the point where the (∀I) rule is used, e.g.:

[x>1]
x>0
x>1 ⇒ x>0
∀x. x>1 ⇒ x>0

The rule (∀E) specializes the formula P to a particular value of x. We require implicitly that e be of the right type to be substituted for x. Since P holds for all x, it should hold for any given choice of x, that is, e.

The rules for existentials are as follows:

P{e/x}
∃x.P
(∃I)
∃x.P [P]

Q
x does not appear free in Q or in the assumptions or axioms of the proof of Q (other than in P).
Q
(∃E)

The rule (∃I) derives ∃x.P because a witness e to the existential has been produced. The idea behind rule (∃E) is that if something (Q) can be shown without using any information about the witness x, other than what is known from P, then Q is true without existential quantification: ∃x.Q is the same as Q if Q doesn't mention x.

Reasoning with equality

Predicate logic allows the use of arbitrary predicates p. Equality is a predicate that applies to two arguments; we can read e1=e2 alternatively as a predicate =(e1,e2). We support reasoning about particular predicates by adding rules, especially axioms. Equality is special because when two things are equal, one can be substituted for the other in any context.

The following three rules capture that equality is an equivalence relation: it is reflexive, symmetric, and transitive.

 
e=e
(refl)
e1=e2
e2=e1
(symm)
e1=e2 e2=e3
e1=e3
(trans)

Beyond being an equivalence relation, equality preserves meaning under substitution: if two expressions are equal, substituting one for the other inside any term yields equal terms. This is known as Leibniz's rule:

e1 = e2
e{e1/x} = e{e2/x}

Leibniz's rule can also be applied to show propositions are logically equivalent:
e = e'
P{e/z} ⇔ P{e'/z}
For example, suppose we know y=x+1 and x(x+1)+(x+1) = (x+1)². Then we can use this rule to prove xy+(x+1) = y², by applying this rule with e=x+1, e'=y, and P = (xz+(x+1) = z²).

The same idea can be applied completely at the propositional level as well. If we can prove that two formulas are equivalent, they can be substituted for one another within any other formula.

Q ⇔ R
P{Q/A} ⇔ P{R/A}

This admissible rule can be very convenient for writing proofs, though anything we can prove with it can be proved using just the basic rules. It can be very handy when there is a large “library” of logical equivalences to draw upon, because it allows rewriting of deeply nested subformulas.

Reasoning on integers and other sets

For reasoning about specific kinds of values, we need axioms that describe how those values behave. For example, the following axioms partly describe the integers and can be used to prove many facts about integers. In fact, they define a more general structure, a commutative ring, so anything proved with them holds for any commutative ring.
∀x.∀y.x+y=y+x (commutativity of +)
∀x.∀y.∀z.(x+y)+z = x+(y+z) (associativity of +)
∀x.∀y.∀z.(x*y)*z = x*(y*z) (associativity of *)
∀x.∀y.∀z.x*(y+z) = x*y+x*z (distributivity of * over +)
∀x.x + 0 = x (additive identity)
∀x.x + (-x) = 0 (additive inverse)
∀x.x*1=x ∧ 1*x=x (multiplicative identity)
¬(0=1) (0 and 1 are distinct)
∀x.∀y.x*y=y*x (commutativity of *)

These rules use a number of functions: +, *, -, 0, and 1 (we can think of 0 and 1 as functions that take zero arguments). These symbols are represented by the metavariable f in the grammar earlier.

Proving facts about arithmetic can be tedious. For our purposes, we will write proofs that do reasonable algebraic manipulations as a single step, e.g.:
(x+2)² = 2*x
x² = −2*x−4
(algebra)

This proof step can be done explicitly using the rules and axioms above, but it takes several steps.


Hoare logic

Partial correctness assertions

So far we have gained the ability to state interesting logical propositions, but we lack the ability to say anything directly about code. We introduce partial correctness assertions (PCA's) for this purpose. A partial correctness assertion has the form

{P} e ⇓ x {Q}

with the following interpretation: if e is evaluated in an environment (i.e., variable bindings) such that P holds, and it evaluates to a value v, then when x=v, Q holds. In other words, we use x as a name for the value that is produced by evaluating e.

In an assertion {P}e⇓x{Q}, the proposition P is the precondition and Q is the postcondition. Therefore we can use PCA's to express the correctness of a function. Suppose we have a function declaration with a spec:

(* Requires: P
   Returns: x where Q
 *)
let f(y) = e

Then the correctness of the function implementation is expressed simply as {P}e⇓x{Q}, assuming that P uses the name y to talk about the argument of f.

Examples

Here are two examples of (true) PCA's:

{y≥0} y+1 ⇓ r {r≥1}
{T} if x < y then y else x ⇓ r {r≥x ∧ r≥y}

Inference rules for Hoare logic

We have started using some program terms in the logic. But we have to be careful not to use terms that involve real computation. We will use the metavariable a to refer to terms that can appear in the logic: variables, values, uses of constructors and certain simple functions (e.g., +).

Suppose we evaluate a term a to a value named r. When can we prove a PCA of the form {P} a⇓r {Q} ? We know that r is equal to the value of a. Therefore, before a is evaluated, Q must be true when the occurrences of r are replaced with a. This gives us the following rule:

P ⇒ Q{a/r}
{P} a ⇓ r {Q}
(subst)

Notice that the top of this rule is now a formula in predicate logic rather than a PCA. To prove correctness of the program, we'd need to prove this formula, too, using the inference rules for predicate logic.

In general we may want to prove correctness of a program in which the precondition is unnecessarily strong or the postcondition is unnecessarily weak. We can use the rule of consequence to make the precondition and postcondition easier to work with, by weakening the precondition or strengthening the postcondition.

P ⇒ P'    {P'} e ⇓ r {Q'}    Q' ⇒ Q
{P} e ⇓ r {Q}
(consequence)

Example

Using the rules we've seen so far, we can prove the first of the above examples correct. Notice that almost all of the proof is predicate logic, not Hoare logic. Only the last step is Hoare logic.

 
[x: y≥0]                    (A)
y+1 ≥ 1                     (algebra)
y≥0 ⇒ y+1 ≥ 1               (⇒I/x)
{y≥0} y+1 ⇓ r {r≥1}         (subst)
 

Now let's consider some more interesting OCaml terms. Suppose we have a general let term: let x = e1 in e2. This satisfies a postcondition Q if the evaluation of e2 satisfies the postcondition Q. But what about occurrences of x in e2? These don't make sense in Q since x is not in scope outside the let. And we can't substitute e1 for x, because in general e1 is not a term that can appear in the logic. Therefore we introduce another formula R that provides enough information about x to allow Q to be proved:

{P} e1 ⇓ x {R} {R} e2 ⇓ r {Q} x not free in P
{P} let x = e1 in e2 ⇓ r {Q}
(let)

This may prompt us to wonder where to get R from. Typically the way to produce R is to work backward from Q, picking the weakest precondition R that allows us to prove {R} e2 ⇓ r {Q}.
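
For example (a small instance of our own), to prove {y≥0} let x = y+1 in x*2 ⇓ r {r≥2}, the intermediate formula R = (x≥1) works, and x is not free in the precondition y≥0, as the rule requires:

y≥0 ⇒ y+1 ≥ 1
{y≥0} y+1 ⇓ x {x≥1}
(subst)
x≥1 ⇒ x*2 ≥ 2
{x≥1} x*2 ⇓ r {r≥2}
(subst)
{y≥0} let x = y+1 in x*2 ⇓ r {r≥2}
(let)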

Now let's consider if a then e1 else e2. Both e1 and e2 must satisfy the postcondition. However, there is additional information available for each branch: the outcome of the expression a:

{P∧a} e1 ⇓ r {Q} {P∧¬a} e2 ⇓ r {Q}
{P} if a then e1 else e2 ⇓ r {Q}
(if)
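
For example, the second PCA given earlier, {T} if x < y then y else x ⇓ r {r≥x ∧ r≥y}, can be proved with (if) and (subst); each branch's premise is a fact of arithmetic:

T∧x<y ⇒ y≥x ∧ y≥y
{T∧x<y} y ⇓ r {r≥x ∧ r≥y}
(subst)
T∧¬(x<y) ⇒ x≥x ∧ x≥y
{T∧¬(x<y)} x ⇓ r {r≥x ∧ r≥y}
(subst)
{T} if x < y then y else x ⇓ r {r≥x ∧ r≥y}
(if)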

What if we have an if expression with complex expression appearing as the conditional, rather than a simple a? We can always rewrite the code to compute the complex expression in a let, bind it to a variable, and then use that variable as the if condition. That is, if e0 then e1 else e2 is the same thing as let x=e0 in if x then e1 else e2, assuming that x is a fresh variable not used in e1 or e2.

The rule for a match expression is similar to the rule for if. Given a pattern p, we write ∃vars(p). to mean ∃x1. ... ∃xn. where x1,...xn are the variables bound in the pattern. Then, we have the following rule:

{P∧a=p1} e1 ⇓ r {Q}
vars(p1) not free in P
{P∧¬∃vars(p1).a=p1} match a with p2→e2 | ... | pn → en ⇓ r {Q}
{P} match a with p1→e1 | ... | pn → en ⇓ r {Q}
(match)

If there is only a single match clause, we require it to succeed!

P ⇒ ∃vars(p).a=p vars(p) not free in P
{P∧a=p} e ⇓ r {Q}
{P} match a with p→e ⇓ r {Q}
(match1)

Functions

How do we verify code that uses other functions? As before, we will assume that other functions are equipped with specifications, and verify code under the assumption that these functions correctly implement their specifications.

Suppose that a function f has a specification like the following, where P and Q are logical formulas mentioning the formal argument x and the result r:

(* Returns: f(x) = r where Q
 * Requires: P
 *)

Given such a function f in scope, we will arrange the logic so that there are predicates f.pre and f.post in scope, such that f.pre(x) captures f's precondition P when applied to some argument x, and f.post(x,r) captures f's postcondition Q when applied to an argument x and result r. Given f.pre and f.post, we can check a partial correctness assertion involving a function application:

P ⇒ f.pre(a) P ∧ f.post(a,r) ⇒ Q
{P} f(a) ⇓ r {Q}
(app)

In other words, to apply a function f, the precondition at the application site must imply the precondition of the function, and the postcondition of the function must imply the postcondition of the application.

When showing that the postcondition Q holds, we get to assume not only the postcondition of the applied function, but also the precondition P. This works because we are only writing pure functional programs. In an imperative setting there would be the possibility that f had a side effect that made P false. Side effects make it trickier to reason about imperative code.

The premises of the (app) rule are predicate logic rather than Hoare logic. We don't have to prove anything about f itself. That takes place elsewhere, when we check a function definition. Suppose that we see a function definition such as f(x) = e or fun x->e, with a declared precondition P and postcondition Q. The predicates f.pre and f.post are defined by the following formula, which we will abbreviate as SPEC(f):

(∀x. f.pre(x) ⇔ P) ∧ (∀x.∀r. f.post(x,r) ⇔ Q)

To check correctness of the function so that we can assume this spec in other code, it is sufficient to prove {P}e⇓r{Q}. In general we may know more than the precondition P; we may also know facts P' that are true at the point of the function declaration. Therefore the function is correct if {P∧P'}e⇓r{Q}. If the function is recursive, we may also assume its own spec holds when checking its body. This reasoning leads to a rule for checking a function declaration:

{P'∧SPEC(f)} e'⇓ r' {Q'} {P'∧f.pre(x)} e ⇓ r {f.post(x,r)} f fresh in P', P, Q; x,r fresh in P'
{P'} let f x = e in e' ⇓ r' {Q'}
(fundecl)

If the function had been declared as recursive, then we could also use the spec of f in the precondition for e. There are some freshness requirements on the names f, x, and r; if there is a name collision, the freshness requirements can always be satisfied by renaming f, x, and r consistently within the let.

This function squares a number:

(* Returns: sq(n) = r where r = n²
 * Requires: n≥0
 *)
let rec sq n =
    if n=0
      then 0
      else let y = sq(n-1) in
	     y + n + n - 1

Let's just prove the correctness of the function. Here, SPEC(sq), which we'll just write as SPEC, is:

(∀n. sq.pre(n) ⇔ (n≥0)) ∧ (∀n.∀r. sq.post(n,r) ⇔ r = n²)

Correctness is proving the following PCA. Note that we include the spec of sq in the precondition because sq is recursive.

{ SPEC ∧ n≥0 } if n = 0 then ... ⇓ r { r = n² }

If we apply the Hoare rules in a completely mechanical way, we get most of a proof:

SPEC∧n=0 ⇒ 0 = n²
{SPEC∧n=0} 0 ⇓ r {r=n²}
{SPEC∧n>0} sq(n-1) ⇓ y {y=(n-1)²}
y=(n-1)² ⇒ y+n+n-1 = n²
{y=(n-1)²} y+n+n-1 ⇓ r {r=n²}
{SPEC∧n>0} let y=sq(n-1) in y+n+n-1 ⇓ r {r=n²}
{ SPEC ∧ n≥0 } if n = 0 then ... ⇓ r { r = n² }

To prove the PCA for the recursive call, {SPEC∧n>0} sq(n-1) ⇓ y {y=(n-1)²}, we need to use the (app) rule just introduced. This missing part of the proof is straightforward to construct; here it is in more detail than we'd ordinarily bother with:

[SPEC∧n>0]
n-1 ≥ 0
[SPEC∧n>0]
sq.pre(n-1) ⇔ (n-1) ≥ 0
sq.pre(n-1)
SPEC∧n>0 ⇒ sq.pre(n-1)
[...∧sq.post(n-1,y)]
sq.post(n-1,y)    sq.post(n-1,y) ⇔ y = (n-1)²
y = (n-1)²
SPEC∧n>0∧sq.post(n-1,y) ⇒ y = (n-1)²
{SPEC∧n>0} sq(n-1) ⇓ y {y=(n-1)²}

Data abstraction

For data abstractions, the spec talks about abstract values but the implementation talks about concrete ones. To prove an ADT operation correct, we apply f.pre and f.post to AF(x) and AF(r) as appropriate, to map the concrete values to a domain where the specification makes sense. Furthermore, when proving the function body correct, we can assume the rep invariant holds on the argument -- RI(x) -- but must show that it holds on the result -- RI(r).

Here's an example to think about: an implementation of booleans.

type boole
val true_: boole
val false_: boole
val if_: boole -> 'a -> 'a -> 'a
val and_: boole -> boole -> boole
val or_: boole -> boole -> boole
val not_: boole -> boole
type boole = int
(* AF(x) = true if x > 0
 * AF(x) = false if x ≤ 0
 * RI(x) ⇔ -10 ≤ x ∧ x ≤ 10
 *)
let true_ = 3
let false_ = -4
let if_ x a b = if x > 0 then a else b
let and_ x y = if x > y then y else x
let or_ x y = if x > y then x else y
let not_ x = 1 - x

Suppose we want to prove and_ and not_ correct. Their specs are:

(* and_ x y  is r where r ⇔ x∧y *)
(* not_ x  is r where r ⇔ ¬x *)

Therefore we need to show that and_.post(AF(x), AF(y), AF(r)) holds after evaluating an application, which is AF(r) ⇔ AF(x) ∧ AF(y).

{RI(x) ∧ RI(y)} if x > y then y else x ⇓ r {RI(r) ∧ (AF(r) ⇔ AF(x) ∧ AF(y))}

And similarly for not_, we need to show not_.post(AF(x), AF(r)), which is AF(r) ⇔ ¬AF(x):

{RI(x)} 1−x ⇓ r {RI(r) ∧ (AF(r) ⇔ ¬AF(x))}

The proof for not_ will not go through, in fact. By the Hoare rule, we have to show RI(x) ⇒ RI(1−x) ∧ (AF(1−x) ⇔ ¬AF(x)). But RI(x) does not imply RI(1−x); consider the case where x = -10. The failure of the proof shows that our rep invariant is wrong; a workable RI would be RI(x) ⇔ -9 ≤ x ≤ 10, since -9 ≤ x ≤ 10 ⇒ -9 ≤ 1−x ≤ 10.


Verification conditions

Our goal is to prove programs correct. Another strategy is to turn a program into a logical formula whose truth corresponds to the correctness of the program. Given a triple of (precondition, program, postcondition), we produce a logical formula called the verification condition, whose truth implies that if the program is executed starting from a state satisfying the precondition, the postcondition will hold of the result. Then we can show that the program is correct by producing a proof that the formula is true.

To produce this formula, we will use the program code and the desired postcondition to generate a precondition that ensures the postcondition. If the actual precondition implies this precondition, the program must be correct. We define a function vc(e, r, P) which produces a logical formula with the following meaning (“vc” for verification condition):

vc(e, r, P) is a formula Q such that if e is evaluated when the precondition Q is true, and its result is placed in the variable r, the postcondition P will be true.

Given this definition, for code e with precondition PRE and postcondition POST, the verification condition for the program is:

PRE ⇒ vc(e, r, POST)


For example, consider the very simple program e = y+1. If we give the name r to the result of evaluating e, and want to ensure the postcondition r>10, what is the weakest precondition that will ensure this? Clearly, y+1>10 (or equivalently, y>9). Therefore, we'd like to define vc(y+1, r, r>10) to be (y+1 > 10). Then, if for example we have a precondition y ≥ 15, we can reduce the correctness of the program to the truth of the formula y≥15 ⇒ y+1>10.

Generating preconditions

Let us use the metavariable “a” to represent an SML expression that can also appear as an expression in the logic. Expressions of this form can use functions and predicates that are useful for writing specifications; we will also let them use SML constructors such as tuple constructors and datatype constructors. For example, the expression y+1 is such an expression. The vc rule for an expression a is very simple:

vc(a, r, P) =

P{a/r}

In other words, we substitute the expression a for the variable r. In fact, we already used this rule for the previous example, where we decided the weakest precondition is (r>10){y+1/r} = (y+1>10).

Next, consider the expression let val x = e1 in e2 end. The result of this expression is whatever e2 evaluates to, so intuitively we need to satisfy vc(e2, r, P). However, e2 may depend on x, so the real precondition is that e1 should evaluate to something named x such that the evaluation of e2 satisfies P:

vc(let val x = e1 in e2 end, r, P) =

vc(e1, x, vc(e2, r, P))

Example:

vc(let val y=z+1 in y*y end, r, r>0)
= vc(z+1, y, vc(y*y, r, r>0))
= vc(z+1, y, y*y>0)
= (z+1)*(z+1)>0
⇔ z≠−1

Conditionals

For an if expression, there are two possible execution paths, and the postcondition must be satisfied in both paths. Therefore, we use a conjunction over preconditions for these two cases:

vc(if a then e1 else e2, r, P) =

(a ⇒ vc(e1, r, P)) ∧ (¬a ⇒ vc(e2, r, P))

Example:

vc(if y>z then y else z, r, r>0)
= (y>z ⇒ vc(y, r, r>0)) ∧ (y≤z ⇒ vc(z, r, r>0))
= (y>z ⇒ y>0) ∧ (y≤z ⇒ z>0)
⇔ (y≤z ∨ y>0) ∧ (y>z ∨ z>0)    (P⇒Q ⇔ ¬P∨Q)
⇔ y>0 ∨ z>0

Notice that we assumed that the condition was a simple expression a that could appear in the logic. We can always desugar more complex if's and determine the vc accordingly:
vc(if e0 then e1 else e2, r, P)
= vc(let val x=e0 in if x then e1 else e2 end, r, P)    (where x is a fresh variable)
= vc(e0, x, vc(if x then e1 else e2, r, P))
= vc(e0, x, (x ⇒ vc(e1, r, P)) ∧ (¬x ⇒ vc(e2, r, P)))

Therefore subsequent rules will often assume that subexpressions have simple form.

Datatypes and case

One other important expression is case, which is like if in that there are multiple alternative expressions to evaluate. We write two rules, one for a case with multiple arms, and another for a case with a single arm. In both rules we assume that there is a set of variables xi bound in the pattern p.

vc(case a of p => e | ... , r, P) =
   (a = p ⇒ vc(e, r, P))
 ∧ ((¬∃xi.a=p) ⇒ vc(case a of ..., r, P))

vc(case a of p => e, r, P) =
   (a = p ⇒ vc(e, r, P))  ∧ (∃xi.a=p)

To reason about datatypes, we need some axioms. Suppose we have a datatype declaration:

datatype t = X1 of t1 | ... | Xn of tn

Then we have axioms saying that any value x of type t matches exactly one of these constructors, with corresponding argument yi of type ti:

⊢ ∀x. (∃y1.x=X1(y1))∨...∨ (∃yn.x=Xn(yn))

And all of these constructors are disjoint: for every i and j with 1≤i≤n, 1≤j≤n, and i≠j,

⊢ ∀x. (∃yi.x=Xi(yi)) ⇒ (¬∃yj.x=Xj(yj))
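
For example, instantiating these schemas for the built-in list type, whose constructors are nil and :: (treating nil as a constructor with no argument), gives the list axioms used in the lmax example below:

⊢ ∀x. (x = nil) ∨ (∃h.∃t. x = h::t)
⊢ ∀x. (x = nil) ⇒ ¬(∃h.∃t. x = h::t)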

Functions

We verify functions in a modular style, using function specifications. For each function, we prove that it satisfies its spec. We then assume that the function satisfies its specification when reasoning about code that uses the function. Let's assume that we have specifications written in the following style:

(* Requires: Q
 * Returns: r = f(x) where P
 *)
fun f(x) = e

The Requires clause is the precondition, and talks about the formal parameter x. In general the function may use some other name for its formal parameter; we will use the notation arg(f) to represent the declared formal parameter. The Returns clause gives the postcondition P. We assume that it expresses conditions on the result of the function by naming it as some variable result(f), which is r in this case. We write pre(f) to refer to the precondition Q, and post(f) to refer to the postcondition P.

Then we can capture the correctness of the implementation of f in the following formula:

pre(f) ⇒ vc(e, result(f), post(f))

When a function f is applied to an argument a, we assume that f is implemented correctly. In other words, the verification precondition should be that the precondition of the function is satisfied, and that the postcondition of the function implies the desired postcondition. However, the uses of arg(f) and result(f) in the precondition and postcondition need to be replaced by the actual argument a and the desired result variable r:

vc(f(a), r, P) =

pre(f){a/arg(f)} ∧ (post(f){a/arg(f), r/result(f)} ⇒ P)

Let's use this to prove that a function that computes the maximum of two numbers is correct:

(* Requires: true
 * Returns: r where (r=x ∨ r=y) ∧ r≥x ∧ r≥y
 *)
fun max(x:int, y:int) =
   if x > y then x else y

We want to show pre(max) ⇒ vc(if x > y then x else y, result(max), post(max)), which is:

pre(max) ⇒ vc(if x > y then x else y, result(max), post(max))
= true ⇒ vc(if x > y then x else y, result(max), post(max))
= vc(if x > y then x else y, result(max), post(max))    (note: true⇒P ⇔ P)
= (x>y ⇒ vc(x, r, post(max))) ∧ (x≤y ⇒ vc(y, r, post(max)))
= (x>y ⇒ post(max){x/r}) ∧ (x≤y ⇒ post(max){y/r})
= (x>y ⇒ (x=x ∨ x=y) ∧ x≥x ∧ x≥y) ∧ (x≤y ⇒ (y=x ∨ y=y) ∧ y≥x ∧ y≥y)
⇔ (x>y ⇒ x≥y) ∧ (x≤y ⇒ y≥x)
⇔ T∧T
⇔ T

Therefore the implementation is correct.

Example

Now, suppose we want to show that the following function that finds the maximum element in a list is correct:

(* Requires: l ≠ nil
 * Returns: the maximum element of the list. That is, r where
            r∈l ∧ ∀x. x∈l ⇒ r≥x
 *)
fun lmax(l) =
  case l of
    [y] => y
  | h::t => let val m = lmax(t) in
	      max(m,h)
	    end

We need to prove l≠nil ⇒ vc(case l of ..., r, post(lmax)). We also need to define the predicate ∈ for membership in a list for this spec to make sense: x∈l ⇔ ∃h.∃t.l=h::t ∧ (h=x ∨ x∈t). (Technically, this predicate is defined recursively, by induction on the length of the list l.) Expanding out the definition of vc for the function body and performing some logical simplifications, we have:
l≠nil ⇒ vc(case l of ..., r, post(lmax))
=  l≠nil ⇒ ((l = y::nil ⇒ vc(y, r, post(lmax))) ∧
   ((¬∃y. l=y::nil) ⇒ vc(case l of h::t => let ..., r, post(lmax))))
=  l≠nil ⇒ ((l = y::nil ⇒ post(lmax){y/r}) ∧
   ((¬∃y. l=y::nil) ⇒ vc(case l of h::t => let ..., r, post(lmax))))
=  l≠nil ⇒ ((l = y::nil ⇒ y∈l ∧ ∀x. x∈l ⇒ y≥x) ∧
   ((¬∃y. l=y::nil) ⇒ vc(case l of h::t => let ..., r, post(lmax))))

To prove this, we can separately prove the conjuncts inside the consequent, starting with the first, which captures the correctness of the first arm of the case. Here are the first steps of that proof:

l≠nil, l = y::nil ⊢ y∈l
l≠nil, l = y::nil, x∈l ⊢ y≥x
l≠nil, l = y::nil ⊢ ∀x. x∈l ⇒ y≥x
l≠nil, l = y::nil ⊢ y∈l ∧ ∀x. x∈l ⇒ y≥x
l≠nil ⇒ l = y::nil ⇒ (y∈l ∧ ∀x. x∈l ⇒ y≥x)

Now let's look at the second conjunct, corresponding to the second arm:

l≠nil ⇒ ¬∃y. l=y::nil ⇒ vc(case l of h::t => let ..., r, post(lmax))
=    l≠nil ⇒ ¬∃y. l=y::nil ⇒ ((l = h::t ⇒ vc(let ..., r, post(lmax)))∧∃h.∃t.l=h::t)
=    (l≠nil ∧ ¬∃y. l=y::nil) ⇒ ((l=h::t ⇒ vc(let val m = lmax(t) in max(m,h) end, r, post(lmax)))∧∃h.∃t.l=h::t)
= (l≠nil ∧ ¬∃y. l=y::nil) ⇒ ((l=h::t ⇒ vc(lmax(t), m, vc(max(m,h), r, post(lmax))))∧∃h.∃t.l=h::t)

Now we can use the specs for max and lmax to simplify further:

vc(lmax(t), m, vc(max(m,h), r, post(lmax)))
 
= pre(lmax){t/l} ∧ (post(lmax){t/l, m/r} ⇒ vc(max(m,h), r, post(lmax)))
= t≠nil ∧ ((m∈t∧∀x.x∈t⇒m≥x) ⇒ vc(max(m,h), r, post(lmax)))
= t≠nil ∧ ((m∈t∧∀x.x∈t⇒m≥x) ⇒ (pre(max){m/x,h/y} ∧ (post(max){m/x,h/y} ⇒ post(lmax))))
= t≠nil ∧ ((m∈t∧∀x.x∈t⇒m≥x) ⇒ (T ∧ (((r=m ∨ r=h) ∧ r≥m ∧ r≥h) ⇒ post(lmax))))
= t≠nil ∧ ((m∈t∧∀x.x∈t⇒m≥x) ⇒ (((r=m ∨ r=h) ∧ r≥m ∧ r≥h) ⇒ r∈l ∧ ∀x. x∈l ⇒ r≥x))

Putting this back into the second conjunct, we have:

(l≠nil ∧ ¬∃y. l=y::nil) ⇒ ((l=h::t ⇒ t≠nil ∧ ((m∈t∧∀x.x∈t⇒m≥x) ⇒ (((r=m ∨ r=h) ∧ r≥m ∧ r≥h) ⇒ r∈l ∧ ∀x. x∈l ⇒ r≥x)) )∧∃h.∃t.l=h::t)

This can be proved using the datatype axioms for lists.


Acknowledgements

Michael Clarkson helped develop these notes.