A Calculational Proof of Andrew's Challenge
Ithaca, NY 14853
At the Marktoberdorf summer school in August 1996, Larry Paulson lectured on his mechanical theorem prover, Isabelle; Natarajan Shankar lectured on his mechanical theorem prover, PVS; and I lectured on calculational logic. Both Paulson and Shankar suggested that I try the calculational approach on Andrew's challenge, which is one of several difficult theorems used to benchmark mechanical theorem provers. Andrew's challenge is to prove the following theorem.
((E x A y |: p.x == p.y) ==
((E x |: q.x) == (A y |: p.y))) ==
((E x A y |: q.x == q.y) == ((E x |: p.x) == (A y |: q.y)))
Remark. We use the notation (A x |: P) instead of the more usual E y . P, for reasons explained in . (E y |: P)) may be abbreviated as (A x E y |: P). Also, we use == for equality over the bools and = for equality over any type (including the bools). Our precedences are, beginning with the tightest, ~ (not), =, $or and $and, => (implication) and <= (consequence), == . Finally, in order to eliminate parentheses, we write p.x instead of p(x) for application of function p to variable x. End of remark.
In proving Andrew's challenge using the calculational approach, I use theorems given in the text \cite (or in its as-yet-unpublished second edition). The Appendix contains theorems used here that may be unfamiliar to the reader.
Now, == is both associative and symmetric, so we can rewrite Andrew's challenge as
P == Qwhere P and Q are defined by the following.
P: (E x A y |: p.x == p.y) == (E x |: p.x) == (A y |: p.y)where it is assumed that this formula is closed (so p.x and q.x contain no free variables other than x).
Q: (E x A y |: q.x == q.y) == (E x |: q.x) == (A y |: q.y)
This form gives the impression that perhaps P is valid (or invalid), regardless of p. If this is the case, then Q is also valid (or invalid). Hence, we try to prove P.
We don't have many theorems that deal with == as they appear in P, so we try to prove P by mutual implication, proving instead
(2) ((E x A y |: p.x == p.y) == (E x |: p.x)) \;\From\; ( A y |: p.y)
(3) ((E x A y |: p.x == p.y) == (E x |: p.x)) \;\Imp\; ( A y |: p.y)
Proof of (2) Assume (A y |: p.y) (E x A y |: p.x == p.y) == (E x |: p.x) = < Assumption, instantiated with y:=x and with y:=y, so p.x == true and p.y == true > (E x A y |: true == true) == (E x |: true) = < Identity of == (5); (A y |: true) == true > (E x |: true) == (E x |: true) ---Reflexivity of == (6) Q.E.D
Proof of (3) (3)} = < Contrapositive, X => Y == ~ Y => ~ X} ~ (A y |: p.y) => ~ ((E xA y |: p.x == p.y) == (E x |: p.x)) = < De Morgan (12) on antecedent; ~(X==Y) == X == ~Y and De Morgan (11) on the consequent > (E y |: ~p.y) => ((E x< strong>A y |: p.x == p.y) == (A x |: ~p.x))
By Metatheorem Witness (13), the last formula is a theorem iff the following one is.
~p.z => ((E x A y |: p.x==p.y) == (A x |: ~p.x))We calculate:
Assume ~ p.z, so also p.z == \False (E x A y |: p.x==p.y) = < Lemma (4) ---heading to change p.x to p.z> (E x |: (A y |: p.x==p.y) and p.x==p.z) =
(E x |: (A y |: p.z==p.y) and p.x==p.z) = < Lemma (4)> (E x A y |: p.z==p.y) = < Assumption p.z==false; false == X == ~X> (E x A y |: ~p.y) = < Provided x doesn't occur free in X, (E x |: X) == X> (A y |: ~p.y) Q.E.D.
(4) Lemma. (A x |: S.x) == (A x |: S.x) and S.t Proof (A x | true: S.x) = < Zero of or (7)> (A x | true or x=t: S.x) = < Range split (10)> (A x | true: S.x) and (A x | x=t: S.x) = < One-point rule (9)> (A x | true: S.x) and S.t Q.E.D.
(5) Identity of ==: true == Q == Q (6) Reflexivity of ==: P == P (7) Zero of or: P or true == true (8) Substitution: X=Y and E[V:=X] == X=Y and E[V:=Y] (9) One-point rule: Provided x does not occur free in E, (A x | x=E: P) = P[x:=E] (10) Range split: (A x | R or S: P) = (A x | R: P) and (A x | S: P) (11) De Morgan: (E x | R: P) == (A x | R: ~P) (12) De Morgan: (A x | R: P) == (E x | R: ~P) (13) Metatheorem Witness: Suppose z does not occur free in P, Q, and R. Then (E x | R: P) => Q is a theorem iff (R and P)[x:=z] => Q is a theorem