Nuprl Library Annotation, Prl Seminar, April 5, 2004

Abstract: An integral piece of building a comprehensive library of formal mathematics is ensuring that the content is accessible. We want users to be able to easily find relevant Nuprl objects, through web search or improved library browsing, and help them understand the objects in which they are interested. To this end, we would like to attach annotations to formal objects. These annotations can take a variety of forms, from keywords and synopses, which we expect to collect from system users, to automatically generated information about the link structure of the library and references to supporting content for each object. We will discuss these annotations and the various methods we have been using to obtain them.

Outline:

What am I *not* talking about today?

This is not about automatic natural language generation or producing full textual versions of proof texts.

This is not about producing extensive glosses or discussions of Nuprl objects, proofs, or libraries.

These are both useful additions to the Nuprl library, but what we are talking about today is smaller pieces of content attached to Nuprl objects in order to make them more accessible to search, help explain them, or facilitate library browsing. We will call these things annotations.

Annotations, we hope, will be easier to collect than full texts/glosses.
They will also follow a predictable structure, which will allow us to use them more easily in a wider range of contexts.

User Supplied Annotations
see: http://www.nuprl.org/FDLProject/FDLnotes/XDL_SmallAnnotation.html

* What are they?

English language "snippets" of text
Attach to formal objects - definitions, declarations, or theorems
Collect from users of the library (i.e., YOU!)
Want to define categories for the text so can predict usage of text

Categories:
Type: definition, declaration, theorem
Title: short phrase by which commonly known
many will not have titles

Verbalization: words used to speak LHS of object
usually only definitions/declarations
can refer to variables in object

Summary: short description, generally of RHS
do not just spell out formalism in words
may not be a good summary
may be multiples based on different perspectives
best if doesn't refer to variables, but not a requirement

Last three all limited to single line, force conciseness

Can have additional title and/or verbalization, and up to two additional summaries.

Can have arbitrary comments

Examples:

divides b|a == \exists c:Z.a=b*c
TYPE: definition
TITLE: integer divisibility
VERBALIZATION: a divides b, a divides b evenly
SUMMARY: divides without remainder, divides evenly

chrem_exists \forall r,s:N+ CoPrime(r,s) => (\forall a,b:Z \exists x:Z ((x = a mod r) & (x = b mod s))
TYPE: theorem
TITLE: existential Chinese remainder theorem
SUMMARY: given two coprime numbers r and s, any other pair of integers are equal modulo r and s respectively

gcd_p_sym \forall a,b,y:Z GCD(a;b;y) => GCD(b;a;y)
TYPE: theorem
SUMMARY: the gcd proposition is symmetric

* How do we collect them?

On the web, via a web form, linked from existing Nuprl object pages

Will be published anonymously, but within group will request username to collect and give feedback

Allow for revision after initial submission; will delay posting of material to allow for edits

In short term, at least, will have a post-submission editing phase for quality control, review of process, etc.

"Publishing" will include putting annotations on web, but also wrapping the annotations into the Nuprl object itself, making it broadly available

* How will we present it?

Right now object pages contain:

formal statement of object
links to component objects' pages
def: "Who Cites" object link - comprehensive
thm: proof object

SKETCH EXAMPLE OBJECT PAGES

Can replace these object pages with a more informative page containing the annotations described

Still retain formal object, but secondary for if actually interested

SKETCH EXAMPLE OBJECT PAGE FROM NOTES

Because can anticipate form of annotations, can use in other settings as well:
if referencing an object (e.g., lemma reference within a proof)
currently can show Nuprl object name or theorem statement with annotations, could give title, if exists
verbalizations can be used in glosses or auto generated texts because wrapped into object, could display within Nuprl editor and there are probably other uses we haven't thought of yet....

Graph-Based Annotations

* What are they?

a library is a network of linked objects
now, it is presented as a linear list ordered to maintain formal dependencies

SKETCH STANDARD LIBRARY PAGE

we would like to be able to present the objects in a library so that:
important objects are given prominence
help people browse library
reveal structure of library and dependencies

looked at a graph representation of the num_thy_1 library
each object is a node, dependencies are edges

a preliminary inspection suggests that features in this graph can allow us to infer which objects are:
key general purpose lemmas - referenced by many
special purpose lemmas - referenced by exactly one
main result theorems - reference many, referenced by few
as well as "sections", or groupings of objects - clustering

* How will we collect them?

It remains to verify these findings in more libraries and automate the process of extracting these labels for objects.

Lori has been working on the application of graph algorithms to collections of Nuprl objects

some questions: what lemma refs should be counted? explicit in tactics or in primitive proofs
use theorems, defs, or both?

looked at algorithms for bipartite graphs, hubs and authorities, betweenness able to identify some key features
find clusters of related items to specific objects

* How will we use them?

If we can do this, we:
can use for visualization and browsing
can use to select which objects to annotate
can reorganize the main library presentation pages:

SKETCH EXAMPLE DRAWING FROM NOTES

We can also use this information on individual object pages:
label object if it is "special"
improve quality of "Who Cites" links provided
if a general support lemma, give examples of its usage as such
if exists for single purpose, indicate such and describe supported object
show snapshot of placement of object within library: "who is near me"

ADD TO DRAWING OF OBJECT PAGE

It can also aid the annotation process, by suggesting to annotators which objects are most important to annotate and which can be skipped.

Automatic Equivalence Extraction

* What are they?
theorems that could be naturally construed as definitions
alternative forms of "same" or similar content
"formal thesaurus"

* How will we collect them?

look for theorems using = or <=> where one side could be a lhs of a definition and the other could be a rhs of the same definition, possibly wrapped in conditions or quantifiers

preliminary results show this is a good starting algorithm but could use some improvements

good: \forall a,b:Z a|b & b|a <=> a= +/-b (assoced_reln)
\forall a:Z, b:Z-0 b|a <=> (a rem b) = 0 (divides_iff_rem_zero)

maybe: \forall i,j:Z i i+1<=j (lt_to_le)
\forall x:Z Even(x) <=> ~Odd(x) (even_iff_not_odd)
\forall x:Z Odd(x) <=> ~Even(x) (odd_iff_not_even)

problematic: \forall a:{...0}, n:N+ (a rem n) = -((-a) rem n) (rem_3_to_1)
(same operator on lhs and rhs)

Issues - what if it is easy to derive <=> from => and <=
how big does the scope of quantification have to be?

prime prime(a) == ~a=0 & ~(a~1) & (\forall b,c:Z a|b*c => a|b v a|c)
prime_nat \forall x:N prime(x) <=> 2<=x * ~(\exists i:{2..x-} i|x)

* What will we use them?
"if you're interested in this you might also be interested in that"

SEE SKETCH OF OBJECT PAGE