Influences (or Lack Thereof) of Formalism in
Teaching Programming and Software Engineering


Banquet Speech at the YoP Institute
28 October 1987


David Gries

Computer Science, Cornell University


Introduction

This YoP Institute has been devoted to the judicious use of formalism in the conscious pursuit of elegance in our everyday activities in mathematics and programming. We say "judicious" because not everything is formalized; the game is to formalize only as much as necessary --no more and no less-- to make an argument as clear, simple, and elegant as possible. It is a game of economy of thought, concept, and pen. It is a game in which the syntactic proof, consisting of a sequence of syntactic transformations according to given rules, plays a large part. You have seen the game played on sequential programs, concurrent programs, distributed programs, VLSI design, and mathematics itself. Not everyone has used the same theories, but with each speaker the common game of judicious use of formalism has been evident. And we believe that the programmer and software engineer would benefit by playing this game.

Now, in spite of Edsger's claims that the nine speakers of this Institute consist of four Americans, one Scot, one Irish, only one Dutchman (who isn't even a man), and what have you, I am sure you have noticed the dearth of American speakers. The other speakers can split up the single Dutchman whichever way they want, but I alone am the four Americans, and the only reason they let me in was because my degree was earned in Munich, Germany. Not even a Ph.D., but Dr. Rerum Natura, or Dr. rare nut as it is often abbreviated in my absence.

Where are all the Americans in programming and software engineering who play the same game? Of course, all profess the need for simplicity and elegance, but there are few who play our game with the intensity seen here, who concentrate so on method, who strive so hard for mathematical simplicity and elegance.

What one does see in Americans --and I am one-- is a more hurried emphasis on facts and mechanical tools. We are so busy doing the next thing that we don't have time to reflect on the last thing. We see this in the vita of many researchers who write only conference papers. They don't take the time to rewrite, to polish, to submit to a refereed journal where they will be subject to the often-painful anonymous referee report, which can help them grow and mature so much. We see this in the almost universal feeling that program documentation is a pain.

We see the hurriedness in our handwriting. Your typical American has not been taught to and won't take the time to learn to write clearly. The mind-set of America vis-a-vis handwriting is best exemplified by the pad of paper placed for your use on the desks in the lecture hall. It is titled `scribbletex'! The last syllable, `tex', I am told, was chosen for a reason. The company wants to computerize their product, joining Tex and Latex, so that Americans can produce their own scribbled handwriting on screens and laser printers, thus making the computer more end-user-friendly!

Contrast your own handwriting with the remarkably clear foils produced by the speakers this week. Look at their handwritten letters and papers --one has even published his handwritten PhD thesis, as Lecture Notes in Computer Science 200. Realizing that absolute clarity and unambiguity are necessary if formal manipulations are to be performed, and realizing that computers are not yet helpful in this regard, the speakers have consciously worked to improve their writing. And it shows. Moreover, they all carry fountain pens and, to the astonishment of their colleagues, like to talk about them.

I have used a bit of hyperbole in talking about Americans --including myself-- to make a point, and it shouldn't be taken too literally. And yet, there is a germ of truth in my comments, and some pondering on them might be worthwhile.

Educating the programmer

One of the large and difficult tasks we have --abroad as well as here-- is that of educating future computing scientists, computing engineers, programmers, and other scientists, and we should think about the effect the kind of material presented at this YoP Institute could have on programming education. For the speakers at this Institute, the notion of program proof is so ingrained that they can't think otherwise. This does not mean that every line of a program is proved formally, of course, but the ideas have been so assimilated that they have become a habit, like brushing teeth or reading. And it shows: there has been evidence enough in this Institute that this habit can simplify the development and presentation of programs and mathematics.

And yet, our general programming texts and texts in data structures make little use of formalism and program proof. They are fifteen years behind the times, and some are so old-fashioned that it is almost a crime to force the students to read them. I am afraid that too many people with half a mind to write an introductory programming text book do so.

More and more texts for introductory and second-semester programming courses do introduce the term `loop invariant', but these texts often have a negative impact on the students' view of formalism and proof. Although a chapter might be devoted to the loop invariant, the author hasn't assimilated its use himself, and the technique either is not used in the text or is used incorrectly. Consequently, the student receives the impression that the loop invariant is a theoretical idea only, to be forgotten as soon as possible. And this impression is often backed up by the instructor.

A few months ago, I saw a new programming text by a respected computer scientist. (S)he received the PhD some twelve years ago, and (s)he does research in algorithms and complexity. I immediately turned to the presentation of binary search, which often reveals a writer's habits. The presentation began with an invariant, but, halfway through, the question of termination of a loop arose, and immediately a variable found was introduced, was set to true when the desired value was detected in the array, and was introduced into the loop condition. But the loop invariant was not changed to reflect the addition of the new variable, so the proof of correctness in terms of the invariant was absolutely wrong --the proof had nothing to do with the program! In consequence, the author was forced to conclude the presentation with a discussion about "the subtleness" of the algorithm --by which they meant that the job had been botched completely.

This computer scientist, working in algorithms and complexity, had not had enough practice with the formal proof of programs to realize that the proof was completely and obviously full of holes! It was not a subtle mistake, a minor mishap of the kind that happens to all of us, but a reflection of the fact that an 18-year-old tool of utmost importance had not been assimilated. Why? Probably because they have the time: Like the rest of us, they had no time to pause, to reflect, to learn; they were too busy finding the next fact, dealing with students, and performing administration. Like all of us, they were overwhelmed with the rush of things.

But that is not the end of the story. This person exhibited a further characteristic that is all too prevalent in our educational system in computing. Upon being confronted with a nicer algorithm for binary search, they replied that they knew that version, but had presented a different one because it relied more heavily on the students' intuition --you know, the old search-in-the-telephone book metaphor.

I believe it is fundamentally wrong to teach a science like programming by reinforcing the students' intuition when that intuition is inadequate and misguided. On the contrary, our task is to demonstrate that a first intuition is often wrong and to teach the principles, tools, and techniques that will help overcome and change that intuition! Reinforcing inadequate intuitions just compounds the problem.

Naturalness in most endeavors must be learned. For example, consider the new golfer. The swing feels just right, and yet the ball doesn't go where it should. However, after many lessons from a professional, and much practice, the strange movements forced on the golfer by the pro begin to feel `natural', and the golfer finds that the acquired swing is the only way to swing; it is impossible to return to the old way.

The same goes for programming: good habits, based on a serious study of the task, have to be taught and then consciously practiced.

My friend also claimed that, although the presentation of binary search could have been better, the algorithm was correct. But correctness is not the only issue in our field; we must get everything right, including the arguments that compel belief in correctness, and there are right ways and righter ways to present such arguments. As we have seen this week, our arguments, our program documentation, can indeed be a joy to present and listen to, to write and read, and the simpler and more elegant the arguments, the more fun it is.

Not everyone in our field thinks this way about documentation. Let me read to you a paragraph from a paper presented at a conference on software engineering by a computer scientist who works in software engineering and cognitive science. This paper was presented at a Conference on Software Engineering Education, and the proceedings with the same title was edited by N.E. Gibbs and R.E. Fairley (Springer Verlag, 1987).

Documentation (unlike good literature) is boring to write and read because we have so little insight on how people construct facts to understand technical material. We should devise more `intuitive' notations for documentation. By intuitive I mean the nonverbal and partially verbal knowledge about how the software really works. This knowledge can then be used by the reader to construct or reconstruct an understanding of the software. With high resolution dynamic displays we have a new medium in which to rethink our approach to documentation. Together with a more sophisticated view of human memory processes, we may be able to make some significant progress on documentation.

Now, if our teachers all believe that writing is boring, that the object of teaching is to reinforce the students' intuition, that technical problems are to be solved by more intuitive notions of nonverbal and partially verbal knowledge displayed on high resolution dynamic displays, then, I say, we are in trouble!

On software-engineering education

Let me say a bit more about software engineering education. First, I do agree that knowledge of and experience with psychology, management, personal communication, and the like are important. But I believe they are secondary, that the technical problems are of primary importance. Thus, I believe the current emphases within software engineering circles are missing the mark.

The Conference on Software Engineering Education mentioned above contained a description of a curriculum under development at a Masters of Engineering program being developed by the Software Engineering Institute (SEI) in Pittsburgh. The curriculum contained some 35 modules, each of which was to be roughly equivalent to one or two semester hours in an academic semester or one work week in industry.

(A copy of this speech was given to Norman Gibbs of the SEI after the fact, and he responded. The SEI's proposed curriculum on software engineering has changed markedly since the Conference took place --they have distanced themselves quite far from the curriculum presented at the Conference. In fact, their new proposal contains at least one course on formal methods in software engineering. This should be taken into account when reading this speech.)

The core of the program, which every student was to take, consisted of ten technical modules and three management modules. Exactly one of these core modules mentioned program verification; it included a `survey of approaches for reasoning about and certifying software correctness, including testing, program verification, walkthroughs/\%inspections, and simulation'. I estimate that at most five hours would be devoted to issues of program verification within the core of the SEI curriculum!

The core did not mention the predicate calculus, manipulating logical formulae, or developing program and proof hand-in-hand. After attending this Institute, does that make any sense to you?

One might argue that this material should be taken by every computer science undergraduate, that it is prerequisite material. I agree wholeheartedly, but generally this material is not being taught to undergraduates --if it were, the sales of my text Science of Programming would be better. Further, this argument was not made by the authors of the SEI curriculum, and the presentation conveys the impression that this is not their view.

The authors of the SEI-curriculum simply didn't realize that a careful, in-depth study of predicate calculus and formal derivation of algorithms, with proof and program being developed hand-in-hand, can have a marked influence on more than half of the core modules they want to teach --communication techniques, software engineering, software interface engineering, software requirements engineering, software generation, tool building, engineering software evolution, and software quality factors. But such an in-depth study requires at leasta full-semester course, and not only five hours. And this in-depth study should be reinforced by using the techniques and concepts in every other module.

I have made my views known to the SEI many times and have been prepared to cooperate with them on revising the curriculum. As it stands, in my opinion this SEI curriculum has the wrong emphasis. And I am not alone in disagreeing with software engineering curricula. Fred Brooks, who gave the keynote speech at the conference, said that he was "in fundamental disagreement with a good deal of what [was] proposed, described, and practiced" in software engineering, as given by the pre-distributed position papers for the conference. He also said,

If you do not know what to teach in a SE curriculum and, if in putting one together, you find a lot of modules that are short on principles --where one can teach only tools or methodologies or today's practices-- instead of most of those modules teach nothing at all. Instead, encourage the students to spend those hours learning something such as physics, mathematics, accounting, where they do know what to teach.

Teaching rigor in proofs and programs

What should we be teaching to programmers and software engineers? We certainly don't have all the answers. But this Institute should leave you with the impression that our technical problems need technical solutions, that these technical solutions will rest on sound mathematical foundations, and that mastery of the predicate calculus lies at the core of our solutions. Hence, we should be teaching mastery of the predicate calculus before programming; the rules of formal manipulation should be taught and practiced until they have been assimilated. And then we should teach programming in terms of developing program and proof hand-in-hand.

Now, some people question the use of the syntactic proof, which involves pure manipulation of symbols according to given laws. They see it as too cumbersome, complex, and difficult exactly in programming, where there are so many details. They feel such symbol manipulation is best left to the computer. And indeed, students have trouble with formula manipulation. Once they have made a textual substitution, say, they no longer "understand what the formula means" and are hesitant to make further manipulations for fear of making mistakes. Because of this, they equate rigor with rigor mortis, with a stiffening of their abilities.

I have given courses in which I had actually to guide a student's hand as the first textual substitution was made in connection with the assignment-statement axiom. This was depressing, for symbol manipulation is what programming is all about! Such courses, expected to be on the development of programs, had to spend far too much time on the predicate calculus and symbol manipulation. And these were not dumb or inexperienced students; they simply hadn't received a proper education.

We are not alone in our emphasis on formal methods and syntactic proofs. Let me quote the famous mathematician David Hilbert, who first proposed the use of formal proofs, from his lecture at the Second International Congress of Mathematicians in Paris in 1900, in which he outlined his famous 10 problems:

It remains to discuss briefly what general requirements may be justly laid down for a solution of a mathematical problem. I should say first of all, this: that it be possible to establish the correctness of the solution by means of a finite number of steps based upon a finite number of hypotheses that are implied in the statement of the problem and that must be exactly formulated. This requirement of logical deduction by means of a finite number of processes is simply the requirement of rigor in reasoning. Indeed, the requirement of rigor, which has become a byword in mathematics, corresponds to a universal philosophical necessity of our understanding ... only by satisfying this requirement do the thought content and the suggestiveness of the problem attain their full value. ...

It is an error to believe that rigor in the proof is the enemy of simplicity. On the contrary, we find it confirmed in numerous examples that the rigorous method is at the same time the simpler and the more easily comprehended. The very effort for rigor forces us to discover simpler methods of proof. It also frequently leads the way to methods which are more capable of development than the old methods of less rigor. (Quoted from Hilbert, by Constance Reid, Springer Verlag, New York, 1983.)

Thus, it is not just a bunch of computer scientists that are stressing formal methods. The concept was developed some 90 years ago by one of the greatest mathematicians, and it has been pursued from time to time by many, albeit not at the same level of intensity as you have seen this week.

During his time, Hilbert's program for formalizing mathematics received its share of criticism, with some mathematicians objecting to his "reducing the science to a meaningless game played with meaningless marks on paper". However, Hilbert claimed that it was precisely the shuffling of meaningless symbols according to given rules that provided confidence and understanding. We feel the same way about our proofs of programs (and their development). By relying more on the formula manipulation to do the work, we find proofs and programs being much shorter and simpler than they used to be. And we hope that the practical evidence provided by this week's speakers will convince you too.

Conclusion

This is the last of several YoP --Year of Programming, Year of Proof, Year of Predicate Calculus, Year of Parties-- Institutes. All have eschewed nontechnical issues of programming and software engineering and instead have concentrated on technical issues, usually dealing with correctness concerns. Tony Hoare's YoP Institute on concurrency, my YoP Institute on encapsulation, Huet's YoP Institute on logical foundations of functional programming, Gordon and Hunt's YoP Institute on formal specification and verification of hardware, Turner's YoP Institute on declarative programming, and this, Dijkstra's YoP Institute on formal development, all have stressed the formal, theoretical aspects. One can ask whether this mYoPic view of programming can indeed be beneficial, or whether it is only a form of nearsightedness that fails to understand the significance of broader issues. That answer to this question I leave to you, the audience, the audiences of this and the other YoP Institutes. I for one come away with gratitude to the sponsors and organizers of the YoP for giving us this opportunity and with renewed enthusiasm for the use of formal techniques in programming.