Informal Discussion of the Study to Create a Corpus of Human-Generated Texts

Performing the Study

As the first step of the project to generate text from Nuprl proofs, we built a corpus of English language proofs translated by people from Nuprl proofs. This corpus is based on fourteen formal proofs, all of which prove basic facts about integer arithmetic. Each proof has a set of English versions of the proof associated with it. These texts were written by study participants who ranged from being experts in the Nuprl system to having never used the system or seen its formal language before. However, all participants had at least a basic familiarity with formal and typed languages in general. A brief tutorial and guide to the meaning of terms in the proof was available for novice participants to consult while performing the translations. These guides and the Nuprl proofs used are available in the study. The information required to understand the study are reproduced below.

The Nuprl System

The Nuprl system is a proof assistant which helps users create formal proofs of theorems. Within the Nuprl language, one defines mathematical objects and then states theorems to prove about these objects. The language which Nuprl uses is typed; that is, every mathematical object is explicitly required to have a type.

The structure that Nuprl uses for its formal proofs is a sequent style tactic structure. In order to prove a theorem which has been entered into Nuprl, the user specifies a sequence of tactics to apply. A tactic is a strategy for proving the current goal given the current hypotheses. The user specifies a tactic to apply and the tactic runs on the current goal and hypotheses to create a set of new subgoals. The subgoals created are labeled as "main", "aux" for auxiliary or "wf" for well-formedness (this is a subset of the "aux" subgoals). Intuitively, auxiliary subgoals prove supplementary facts which are needed for the main branch of the proof to hold and well-formedness subgoals prove that the statements made respect the type information given for the theorem (this is necessary because Nuprl uses a typed language). The user then specifies tactics to use to prove the new subgoals. This process is repeated until every subgoal is proved, in which case the entire theorem is proved.

Nuprl also has objects called conversions which can be used to rewrite clauses or parts of clauses. Conversions are rewrite rules that take a term and rewrite it to an equivalent term. These may be used to manipulate hypotheses to facilitate matching. Once a conversion has been defined, tactics are used to run these conversions and perform the actual rewriting.

The user often knows a sequence of tactics that they wish to apply, such as knowing that once they apply tactic T they want to apply tactic "Auto" to the auxiliary subgoals in order to prove them automatically as they will be simple enough to prove in this manner (this occurs often). Nuprl allows the user to combine the application of tactics, for example by using the "THEN" tactic which takes two tactics as arguments, one to apply first to the current goal and the second to apply to all of the generated subgoals. The only subgoals which are shown are those that remain to be proved after all of the listed tactics are run one the specified subgoals.

Nuprl allows the user to make reference to previously proved theorems, or, as Nuprl refers to them, lemmas. A lemma is invoked by using the name under which the theorem was proved as an argument to a tactic that acts on lemmas. The formal statement of the lemma does not need to be given again. Similarly, mathematical objects such as functions do not need to be defined within a proof but rather are defined outside of proofs and given names. In order to use the definition of such an object, one uses the name under which it was defined as an argument to a relevant tactic.

The participants were asked to read a Nuprl proof and then write English language version of the proof. The study was administered electronically as a web site that participants visited. They were able to perform the tasks at their own pace and were not required to do the study all in one sitting. Fifteen proofs were available for translation in the study but participants were not required to translate all of the proofs. All of the proofs concerned elementary facts about integers and integer arithmetic. The theorems and their corresponding proofs were all simple enough that the mathematics presented would be straightforward for the participants to understand. Because the study was done over the web, the Nuprl proofs were presented in their html format. Beyond ease of incorporating the proofs into the overall study, this format also seems to allow the most ease in navigating through the proof.

Not all of the participants were familiar with the Nuprl system and its format for presenting proofs. In order to help the novice participants, a description of the system was given as background. An explanation was then given of how to read a Nuprl proof and a small example proof was explained. This explanation was written so as only to show how to navigate through the proof; a translation of the proof was not given. Finally, a listing of the definitions of all of the Nuprl tactics used in the fifteen proofs of the study was given. Links to all three of the information pages were put on each page of the study so that the participants could jump back to them at any time during the study and novices were recommended to open a second browser window to the page of tactic definitions while they were doing the study in order to help understand the proofs. The intention was to minimize the effort necessary for the participants to understand the mathematics of the proofs so that the could concentrate on the text generation process.

They participants were also be asked to answer a small number of biographical questions and some questions about their experience in translating the proofs. These questions were used to determine the range of experience of the participants with mathematics in general and with Nuprl in general and spot any unexpected difficulties in the translation process.

Study Results

There were 9 total participants in the study; their experience with Nuprl ranged from being a developer of the system to having never used it before. In total, the resultant corpus has 52 translated proofs in it. While the novices reported slightly more difficulty in reading and understanding the proofs than Nuprl experts, the overall quality of the proof texts was the same for writers of all levels of expertise.

By studying the corpus of translated proofs, we were able to observe some interesting patterns which became the framework of our natural language generation system. The first observation we made about the corpus was that there was a high degree of regularity in what information people decided was relevant to mention and how they organized it that information into sentences. While the actual words chosen varied, the content did not vary much. In terms of the structure of a natural language generation system, the macro-planning procedure did not vary across people, though the micro-planning was not constant.

The second observation was that the granularity at which proof steps are identified grouped into sentences in English was roughly equivalent to the size of the steps taken by the tactics in the Nuprl proofs. In most cases, each step in the Nuprl proof was described in its own sentence in the translated proof. In a small number of cases a Nuprl proof step was omitted; these were trivial steps in the proof. In some other cases, where two adjacent proof steps both translated the current goal into a new goal statement, the two translation methods were described in one sentence and only the final result of both translations was mentioned. It was never the case that a single proof step was described in many sentences worth of detail, or that many proof steps were summarized into only one sentence.

Given that the people translating the proofs treated the proofs in this way, we have an empirical justification for a useful simplification for our generation system. Rather than having to consider the proof as a whole during the planning process, we are able to plan what content will be contained in each sentence on a local level, looking only at a particular proof node, its parent and its children.

The third observation we made is that many of the sentences in the corpus had similar types content to each other, and these similar sentences had similar types of inference steps being performed in their corresponding proof nodes. By grouping all of the sentences which were similar in this way together and looking at the type of inference being performed, we were able to classify all of the proof steps encountered into ten categories. All instances proof steps in a given category result in similar utterances communicating the same content. The categories we were observed were:

• Statement of Goal - communicates the theorem to be proved
• Variable Declaration - communicates the names of any variables introduced in the proof and possibly their types
• Case Statement - communicates that proof will proceed by argument by cases
• Case Consideration - introduces one of the cases in an argument by cases, possibly stating the assumption made in that case
• Trivial Case Consideration - introduces a case which can be proved trivially in an argument by cases
• Contradiction - communicates that the conclusion is proved by virtue of a contradiction being reached
• Inference Step - communicates that the reasoning proceeds by an inference on the conclusion or one of the hypotheses, a generic type of reasoning not covered by one of the previous categories
• Transformation - communicates a chain of reasoning steps, usually from the inference step category
• Trivial Step - an inference step which is either omitted entirely or where "math" or "obviousness" are given as justification
• Statement of Conclusion - communicates that the conclusion has been proved, possibly with a restatement of that conclusion

For some of these categories, the similarity between all of the sentences in the corpus which fell into that category was so strong that we were able to write a small number of regular expressions which generated every one of those English sentences. Even in the categories with more variation in the sentences produced most of them followed a regular pattern with only a small number of outliers. If we can then identify what category a given proof step in the formal proof falls in, we will be able to generate a sentence which communicates the necessary information and which is representative of the type of sentence which a person writing the proof would write.

All of these observations together point to the conclusion that the language of mathematics and mathematical proofs is a domain in which natural language generation has the potential for successes and a useable system might be created. There are many aspects of the planning process which can be simplified, not by virtue of heuristics, but by taking advantage of regularities observed in actual English language translations of formal proofs.