Teaching Mathematics Through Formalism: A Few Caveats

Torkel Franzen

Swedish Institute of Computer Science

There is much to be said for both the charm and the utility of formal manipulation in reasoning. It is of course possible to experience the utility without the charm, or conversely. As an instance of the latter, T. S. Eliot has testified to the pleasure he found in pushing symbols around when he was taught logic by Bertrand Russell, although he did not connect the symbols with anything in reality, while on the other hand innumerable students and engineers have sweated through dreary calculations quite lacking in charm in order to arrive at presumably useful results. In the happiest applications, though, formal manipulation carries us gracefully to a desirable goal, allowing us as if by magic to reach interesting conclusions in a painless fashion. It is not surprising that mathematicians as well as physicists have set great store by the power of formal manipulation.

To be sure, the bold leaps of people like Euler or Dirac, using old formalisms in new ways with no guarantee that the results make sense, will in most cases - when we don't have the peculiar talents of an Euler or a Dirac - yield poor or nonsensical results. The chief use of formalism must consist in sticking to rules known to be correct, and this is what we try to teach students when we teach them to use various formalisms: to stick to the rules, to let the rules guide them, and to arrive at results formally but without drudgery.

In recent years, the book by David Gries and Fred B. Schneider, _A Logical Approach to Discrete Math_, has forcefully presented the case for consistently using an equational formalism in teaching logic and discrete mathematics, and for stressing the role of formal manipulation in general. This text has been used with good results, and rather large claims have been made by David Gries and others for the general efficacy and usefulness of this formal approach.

My purpose in the following is to present a somewhat dissenting view. Without denying the usefulness and attraction of formalisms, I maintain that we cannot draw any definite conclusions regarding the efficacy of a formal approach to teaching logic and mathematics without testing the degree of understanding exhibited by students in contexts divorced from the classroom use of the equational formalism. It is very much open to debate whether "a solid understanding of what constitutes a proof and a skill in developing, presenting, and reading proofs" (quoting from the Instructor's manual for the book) is related in any way to facility in formal manipulation, and also whether "teaching rigor calls for carefully laying out rules for syntactic manipulation".

It goes without saying - a phrase not infrequently accompanying the saying of what goes without saying - that my comments in the following are not in any sense authoritative. I have no particular expertise in pedagogy and no profound insights to offer. Essentially, I'm just chewing the fat about his book with David Gries and the rest of you (unfortunately unknown to me, since I'm unable to attend the workshop), in the somewhat blunt but I hope not abrasive style of my earlier comments on a mailing list where I first learned about the book.

My skepticism is based on some very simple considerations. First, testimonials aren't really a good measure of the degree of understanding achieved by students. Enthusiastic and dedicated teaching will always evoke a corresponding response, and naturally enough many students will express themselves in enthusiastic terms about what they have learned through such teaching, whatever the methods used. For a more informative and objective measure of what has been achieved, it is necessary to examine the students from a vantage point that is independent of the particular pedagogical scheme in question. And in the case of the equational logic and general formal approach of Gries and Schneider, I cannot help but suspect that the theorems and problems presented in the course have been chosen so as to admit a painless treatment in the terms offered by that scheme. After all, logical manipulation of formulas only takes us so far. In proving a highly non-trivial combinatorial result such as Ramsey's theorem, we are not helped at all by the manipulation of equational formulas, but need, rather, to be able to follow inductive reasoning applied to a complex combinatorial problem.

Ramsey's theorem is not in fact treated in the book, so I will base my further comments - amplifying the general misgivings expressed above - on a variant of a theorem put forward by David Gries as an example of the efficacy of his approach:

THEOREM. (U,<) is well-founded iff the principle of mathematical induction holds for (U,<).
Recall that a subset < of UxU is a well-founded relation iff every non-empty subset M of U has at least one member x which is minimal with respect to <, i.e. such that y<x does not hold for any y in M. That the principle of mathematical induction holds for (U,<) means that for every subset M of U, if x belongs to M whenever every y<x belongs to M, then M=U.

The theorem is easily proved. In one direction, if < is well-founded, suppose M satisfies the condition that x belongs to M whenever every y<x belongs to M. If M is not all of U, U\M is non-empty and has a minimal element x. But x being minimal in U\M, every y<x belongs to M, so x belongs to M after all. Thus M must be all of U. Conversely, if the principle of mathematical induction holds, suppose a subset M of U has no minimal element. Then for any x, if y belongs to U\M for all y<x, x must also belong to U\M, since otherwise x would be a minimal element of M. So every x belongs to U\M, i.e. M is empty.
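For a finite relation, both sides of this equivalence can be checked by brute force over all subsets of U. The following sketch is my own illustration, not anything from the Gries-Schneider text; the names well_founded and induction_holds are simply my labels for the two conditions of the theorem.

```python
from itertools import chain, combinations

def subsets(U):
    """All subsets of U, as sets."""
    U = list(U)
    return (set(c) for c in chain.from_iterable(
        combinations(U, r) for r in range(len(U) + 1)))

def well_founded(U, lt):
    """Every non-empty subset M of U has a <-minimal member x,
    i.e. one with no y in M such that y < x."""
    return all(any(all(not lt(y, x) for y in M) for x in M)
               for M in subsets(U) if M)

def induction_holds(U, lt):
    """Every subset M closed under the induction step
    (x is in M whenever every y < x is in M) equals U."""
    U = set(U)
    for M in subsets(U):
        inductive = all(x in M for x in U
                        if all(y in M for y in U if lt(y, x)))
        if inductive and M != U:
            return False
    return True
```

On {0,1,2} with the usual order both checks succeed; on a two-element cycle (0<1 and 1<0) both fail, as the theorem predicts: the empty set is then vacuously closed under the induction step, yet is not all of U.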

Different formulations of the argument are of course possible. What I wish to emphasize is only that the argument is a simple one, and easily formalized in predicate logic, whatever our favorite logical formalism. In informal terms, the theorem is superficial in the sense of essentially just requiring a bit of logical manipulation, not any mathematical construction.

Now the variant of this theorem I have in mind is the following:

THEOREM. (U,<) is well-founded iff the principle of definition by recursion holds for (U,<).
Here by the principle of definition by recursion, I mean the following:
For any set V and function H:UxP(UxV)->V, there is a unique function F:U->V satisfying F(x)=H(x,F|{y:y<x}) for every x in U.
The notation F|M used above denotes the restriction of the function F to a subset M of the domain of F.
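In computational terms, the principle says that a function F may be specified by a rule H that consults only the values of F below x. A minimal Python sketch of this reading (my own illustration; the names H and below are my assumptions, with below(x) enumerating the y such that y<x) computes such an F by memoized recursion, terminating precisely because < is well-founded:

```python
def recurse(H, below, x, memo=None):
    """Compute F(x) = H(x, F|{y : y < x}) by memoized recursion.

    below(x) enumerates the y with y < x; termination of the
    recursion is exactly the well-foundedness of <."""
    if memo is None:
        memo = {}
    if x not in memo:
        # The restriction F|{y : y < x}, as a dictionary.
        restriction = {y: recurse(H, below, y, memo) for y in below(x)}
        memo[x] = H(x, restriction)
    return memo[x]

# Example: the factorial, with H consulting only the value at n-1.
def fact(n):
    return recurse(lambda n, f: n * f[n - 1] if n else 1,
                   lambda n: range(n), n)
```

Here the uniqueness clause of the principle corresponds to the fact that the memo table, however it is filled in, is forced at every point by the values below.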

Now, assume we set ourselves the task of making this theorem evident to our students and to help them prove it. I submit that we will not be helped at all by any equational formalism, or indeed by formal manipulation in general. This is because the theorem lacks the superficiality of the characterization of well-founded relations given in the earlier theorem. Although not difficult by mathematical standards, proving the present theorem requires us to make connections and understand constructions beyond the mere unwinding of definitions.

To test this hypothesis of mine (about the limited usefulness of a training in using equational formalism in this instance) I undertook an entirely unscientific experiment, posing the problem of proving the theorem to a person very well versed in the approach of the Gries-Schneider book, asking him to use what he had learned from that book.

First, let's look at the problem from a set-theoretical point of view. Assuming the principle of definition by recursion to hold, we get pretty immediately that the relation is well-founded. For we can then define a rank function:

rk(x) = the smallest ordinal greater than rk(y) for every y<x
and note that
y<x => rk(y)<rk(x)
implying that < is well-founded.
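Over a finite domain the rank function can be computed directly, with 1 + max playing the role of "the smallest ordinal greater than". The sketch below is an illustration for the finite case only; the general set-theoretic argument genuinely needs ordinals and, as noted next, the replacement axiom.

```python
def rank(below, x):
    """rk(x): the least value greater than rk(y) for every y < x.
    Over a finite domain this is 1 + the maximum predecessor rank,
    and 0 when x has no predecessors."""
    ys = list(below(x))
    if not ys:
        return 0
    return 1 + max(rank(below, y) for y in ys)
```

For the usual order on the naturals, rank(lambda n: range(n), n) is n, and y<x indeed yields rank(y)<rank(x), which is the monotonicity implying well-foundedness.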

But of course this argument presupposes quite a lot of set theory, and in fact uses the replacement axiom. So one naturally wonders if there isn't a more elementary argument.

If we assume < to be a transitive relation, there is such an argument. Essentially this argument is used in a variant of the Liar paradox where we introduce an infinite number of statements A1, A2, ...; Ai being the statement "Ak is false for every k>i". Suppose some Ai is true. Then Ai+1 is false, but then Ak is true for some k>i+1, so Ai is false after all. So every Ai is false. But then every Ai is also true!

This argument can be recast to prove our theorem in one direction, assuming < to admit definitions by recursion, as follows. Suppose P is a subset of U that has no minimal element, and define (by recursion applied to the characteristic function of M):

x belongs to M if and only if for every y<x, if y is in P then y is not in M.
Then for x in P, if x is in M there is a y<x in P (since P has no minimal element) which is not in M. But then there is a z<y in P which is in M, and since z<x we have a contradiction. So no x in P is in M. But then every x in P is in M, so P must be empty.

We thus see that given the defining property of M, the assumption that P has no minimal element, and the transitivity of <, logical manipulation allows us to conclude that P is empty. Finding this definition of M is another matter. It is not apparent that we are at all helped by having any equational or other formalism at hand in this task.

My student of the Gries-Schneider approach having a gift for clever reasoning, he introduced the assumption that < is transitive, and came up with essentially the argument above to establish the theorem in one direction. (I don't know myself if there is any equally elementary argument that does not presuppose that < is transitive.) But, unsurprisingly, the equational formalism was used only in the proof that P is empty, the definition of M being produced in the style deprecated by Gries and Schneider, that of being pulled from a hat. The choice of M was "inspired by the form of the RHS of the principle of mathematical induction".

I don't think it is any criticism of the Gries-Schneider approach that it wasn't used to arrive at M by means of formal manipulation. We can't use formalism to do away with the need for "inspiration".

Now let's consider the converse of the theorem, that any well-founded < admits definition by recursion.

This converse is unobtainable by any substitution of equals for equals. Essentially, a proof of the existence of the function F (the uniqueness then being easily proved by induction) must go as follows. Consider functions f defined on subsets of U with values in V. For such a function we say that R(f) holds if

1) for any x in the domain of f, {y:y<x} is included in the domain of f,
2) for any x in the domain of f, f(x) = H(x,f|{y:y<x}).
We next establish that if R(f) and R(g) and x belongs to the intersection of the domains of f and g, then f(x)=g(x). This is proved by induction, using the well-foundedness of <. We then define F=the union of all f such that R(f), which is shown to be a maximal f with the property R(f). Then, to establish that the domain of F is in fact all of U, we prove by induction that every x belongs to the domain of some f such that R(f). (Here, if < is not assumed transitive, we need to use the transitive closure of <.)

In this case the proof of existence of F produced - after some prodding - by my student of the Gries-Schneider approach consisted in the argument that "the set of x for which F(x) is well-defined" is all of U, by induction: given that F(y) is well-defined for every y<x, we get from the definition of F that F(x) is also well-defined.

Such an argument expresses the intuitive grounds for the acceptability of a definition by recursion on a well-founded relation. It is, however, by no means a rigorous proof. It's an expression of the line of thought that we follow in producing a rigorous proof, like the one sketched above. In order to make this distinction clear in the classroom, I would point out that we haven't given any kind of mathematical definition of "F(x) is well-defined", but are reasoning in informal terms about an object - the function F - that we haven't yet proved to exist, using a concept - F(x) being well-defined - that we haven't defined.

Again I don't think it at all a flaw in the Gries-Schneider approach that a student of that approach should mistakenly present an intuitive picture as a rigorous proof. I have myself produced a number of arguments that on inspection turned out to be nonsensical, and in fact to lead nowhere at all, and I consider myself to have been well taught in mathematics. But I think cases such as this suggest - completely unsurprisingly - that claims about a particular approach to the teaching of mathematics yielding "a solid understanding of what constitutes a proof", or a general appreciation of rigor, must be taken with large doses of salt. Mathematics is a subject both deep and wide, and the possibilities of error and confusion are boundless. We can never eliminate them, in ourselves or in others. Ideally, I would like a student to come away from a course in discrete mathematics and beginning set theory with an understanding of both of the above characterizations of well-founded relations, and an appreciation of their difference in mathematical depth.

The caveats alluded to in the title of these remarks are, then, the following.

Whatever particular approach we favor in the teaching of logic and mathematics, we should take into account that it is bound to have limitations and weak spots of which we are most likely unaware. It is a good idea to enlist unsympathetic observers to help us evaluate our approach, and to help us separate the effects of enthusiastic and dedicated teaching from the effects of the particular choices of methods and material.