A Calculational Proof of Andrew's Challenge

David Gries

Computer Science
Cornell University
Ithaca, NY 14853

August 1966
(Supported by NSF grants CDA-9214957 and CCR-9503319.)

At the Marktoberdorf summer school in August 1996, Larry Paulson lectured on his mechanical theorem prover, Isabelle; Natarajan Shankar lectured on his mechanical theorem prover, PVS; and I lectured on calculational logic. Both Paulson and Shankar suggested that I try the calculational approach on Andrew's challenge, which is one of several difficult theorems used to benchmark mechanical theorem provers. Andrew's challenge is to prove the following theorem.

((E x A y |: p.x == p.y) == ((E x |: q.x) == (A y |: p.y))) ==
((E x A y |: q.x == q.y) == ((E x |: p.x) == (A y |: q.y)))

Remark. We use the notation (A x |: P) instead of the more usual E y . P, for reasons explained in [1]. (E y |: P)) may be abbreviated as (A x E y |: P). Also, we use == for equality over the bools and = for equality over any type (including the bools). Our precedences are, beginning with the tightest, ~ (not), =, $or and $and, => (implication) and <= (consequence), == . Finally, in order to eliminate parentheses, we write p.x instead of p(x) for application of function p to variable x. End of remark.

In proving Andrew's challenge using the calculational approach, I use theorems given in the text \cite[1] (or in its as-yet-unpublished second edition). The Appendix contains theorems used here that may be unfamiliar to the reader.

Now, == is both associative and symmetric, so we can rewrite Andrew's challenge as

P == Q
where P and Q are defined by the following.
P: (E x A y |: p.x == p.y) == (E x |: p.x) == (A y |: p.y)
Q: (E x A y |: q.x == q.y) == (E x |: q.x) == (A y |: q.y)
where it is assumed that this formula is closed (so p.x and q.x contain no free variables other than x).

This form gives the impression that perhaps P is valid (or invalid), regardless of p. If this is the case, then Q is also valid (or invalid). Hence, we try to prove P.

We don't have many theorems that deal with == as they appear in P, so we try to prove P by mutual implication, proving instead

(2) ((E x A y |: p.x == p.y) == (E x |: p.x)) \;\From\; ( A y |: p.y)
(3) ((E x A y |: p.x == p.y) == (E x |: p.x)) \;\Imp\; ( A y |: p.y)

Proof of (2)
    Assume (A y |:  p.y)
       (E x A y |: p.x == p.y) == (E x |: p.x)
    =     < Assumption, instantiated with y:=x and with y:=y,
            so p.x == true and p.y == true >
       (E x A y |: true == true) == (E x |: true)
    =     < Identity of == (5); (A y |: true) == true >
       (E x |: true) == (E x |: true)   ---Reflexivity of  == (6)
Q.E.D
Proof of (3)
       (3)}
    =     < Contrapositive, X => Y  ==  ~ Y => ~ X}
       ~ (A y |: p.y)  =>  ~ ((E xA y |: p.x == p.y) == (E x |: p.x))
    =     < De Morgan (12) on antecedent;
            ~(X==Y) == X == ~Y and De Morgan (11) on the consequent >
       (E y |: ~p.y) => ((E x< strong>A y |: p.x == p.y) == (A x |: ~p.x))

By Metatheorem Witness (13), the last formula is a theorem iff the following one is.

   ~p.z => ((E x A y |: p.x==p.y) == (A x |: ~p.x))
We calculate:

   Assume ~ p.z, so also p.z == \False
       (E x A y |: p.x==p.y)
    =      < Lemma (4) ---heading to change p.x to p.z>
       (E x |: (A y |: p.x==p.y) and p.x==p.z)
    =      
       (E x |: (A y |: p.z==p.y) and p.x==p.z)
    =      < Lemma (4)>
       (E x A y |: p.z==p.y)
    =      < Assumption p.z==false; false ==  X ==  ~X>
       (E x A y |: ~p.y)
    =      < Provided x doesn't occur free in X, (E x |: X) == X>
       (A y |: ~p.y)
Q.E.D.
(4) Lemma. (A x |: S.x) == (A x |: S.x) and S.t

Proof       (A x | true: S.x)
          =    < Zero of or (7)>
            (A x | true or x=t: S.x)
          =    < Range split (10)>
            (A x | true: S.x) and (A x | x=t: S.x)
          =    < One-point rule (9)>
            (A x | true: S.x) and S.t
Q.E.D.

Reference

Gries, D., and F.B. Schneider. A Logical Approach to Discrete Math. Springer Verlag, NY, 1993.

Appendix. Some of the theorems used in the proof

(5)  Identity of ==:  true == Q == Q

(6)  Reflexivity of ==:  P == P

(7)  Zero of or:  P or true == true

(8)  Substitution:  X=Y and E[V:=X] == X=Y and E[V:=Y]

(9)  One-point rule:  Provided x does not occur free in E,
     (A x | x=E: P) = P[x:=E]

(10) Range split:  (A x | R or S: P) = (A x | R: P) and (A x | S: P)

(11) De Morgan:  (E x | R: P) == (A x | R: ~P)

(12) De Morgan:  (A x | R: P) == (E x | R: ~P)

(13) Metatheorem Witness:  Suppose z does not occur free in P, Q, and R.
     Then (E x | R: P) => Q is a theorem iff (R and P)[x:=z] => Q is a theorem