Outline:
What am I *not* talking about today?
This is not about automatic natural language generation or producing full textual versions of proof texts.
This is not about producing extensive glosses or discussions of Nuprl objects, proofs, or libraries.
These are both useful additions to the Nuprl library, but what we are talking about today is smaller pieces of content attached to Nuprl objects in order to make them more accessible to search, help explain them, or facilitate library browsing. We will call these things annotations.
Annotations, we hope, will be easier to collect than full texts/glosses.
They will also follow a predictable structure, which will allow us to
use them more easily in a wider range of contexts.
User Supplied Annotations
see: http://www.nuprl.org/FDLProject/FDLnotes/XDL_SmallAnnotation.html
* What are they?
English language "snippets" of text
Attach to formal objects - definitions, declarations, or theorems
Collect from users of the library (i.e., YOU!)
Want to define categories for the text so can predict usage of text
Categories:
Type: definition, declaration, theorem
Title: short phrase by which commonly known
many will not have titles
Verbalization: words used to speak LHS of object
usually only definitions/declarations
can refer to variables in object
Summary: short description, generally of RHS
do not just spell out formalism in words
may not be a good summary
may be multiples based on different perspectives
best if doesn't refer to variables, but not a requirement
Last three all limited to single line, force conciseness
Can have additional title and/or verbalization, and up to two additional summaries.
Can have arbitrary comments
Examples:
divides b|a == \exists c:Z.a=b*c
TYPE: definition
TITLE: integer divisibility
VERBALIZATION: a divides b, a divides b evenly
SUMMARY: divides without remainder, divides evenly
chrem_exists \forall r,s:N+ CoPrime(r,s) =>
(\forall a,b:Z \exists x:Z ((x = a mod r) & (x = b mod s))
TYPE: theorem
TITLE: existential Chinese remainder theorem
SUMMARY: given two coprime numbers r and s, any other pair of integers
are equal modulo r and s respectively
gcd_p_sym \forall a,b,y:Z GCD(a;b;y) => GCD(b;a;y)
TYPE: theorem
SUMMARY: the gcd proposition is symmetric
* How do we collect them?
On the web, via a web form, linked from existing Nuprl object pages
Will be published anonymously, but within group will request username to collect and give feedback
Allow for revision after initial submission; will delay posting of material to allow for edits
In short term, at least, will have a post-submission editing phase for quality control, review of process, etc.
"Publishing" will include putting annotations on web, but also wrapping the annotations into the Nuprl object itself, making it broadly available
* How will we present it?
Right now object pages contain:
formal statement of object
links to component objects' pages
def: "Who Cites" object link - comprehensive
thm: proof object
SKETCH EXAMPLE OBJECT PAGES
Can replace these object pages with a more informative page containing the annotations described
Still retain formal object, but secondary for if actually interested
SKETCH EXAMPLE OBJECT PAGE FROM NOTES
Because can anticipate form of annotations, can use in other settings
as well:
if referencing an object (e.g., lemma reference within a proof)
currently can show Nuprl object name or theorem statement
with annotations, could give title, if exists
verbalizations can be used in glosses or auto generated texts
because wrapped into object, could display within Nuprl editor
and there are probably other uses we haven't thought of yet....
Graph-Based Annotations
* What are they?
a library is a network of linked objects
now, it is presented as a linear list ordered to maintain formal dependencies
SKETCH STANDARD LIBRARY PAGE
we would like to be able to present the objects in a library so that:
important objects are given prominence
help people browse library
reveal structure of library and dependencies
looked at a graph representation of the num_thy_1 library
each object is a node, dependencies are edges
a preliminary inspection suggests that features in this graph can
allow us to infer which objects are:
key general purpose lemmas - referenced by many
special purpose lemmas - referenced by exactly one
main result theorems - reference many, referenced by few
as well as "sections", or groupings of objects - clustering
* How will we collect them?
It remains to verify these findings in more libraries and automate the process of extracting these labels for objects.
Lori has been working on the application of graph algorithms to collections of Nuprl objects
some questions: what lemma refs should be counted? explicit in tactics or
in primitive proofs
use theorems, defs, or both?
looked at algorithms for bipartite graphs, hubs and authorities, betweenness
able to identify some key features
find clusters of related items to specific objects
* How will we use them?
If we can do this, we:
can use for visualization and browsing
can use to select which objects to annotate
can reorganize the main library presentation pages:
SKETCH EXAMPLE DRAWING FROM NOTES
We can also use this information on individual object pages:
label object if it is "special"
improve quality of "Who Cites" links provided
if a general support lemma, give examples of its usage as such
if exists for single purpose, indicate such and describe supported
object
show snapshot of placement of object within library: "who is near me"
ADD TO DRAWING OF OBJECT PAGE
It can also aid the annotation process, by suggesting to annotators which objects are most important to annotate and which can be skipped.
Automatic Equivalence Extraction
* What are they?
theorems that could be naturally construed as definitions
alternative forms of "same" or similar content
"formal thesaurus"
* How will we collect them?
look for theorems using = or <=> where one side could be a lhs of a definition and the other could be a rhs of the same definition, possibly wrapped in conditions or quantifiers
preliminary results show this is a good starting algorithm but could use some improvements
good: \forall a,b:Z a|b & b|a <=> a= +/-b (assoced_reln)
\forall a:Z, b:Z-0 b|a <=> (a rem b) = 0 (divides_iff_rem_zero)
maybe: \forall i,j:Z i
problematic: \forall a:{...0}, n:N+ (a rem n) = -((-a) rem n) (rem_3_to_1)
Issues - what if it is easy to derive <=> from => and <=
prime prime(a) == ~a=0 & ~(a~1) & (\forall b,c:Z a|b*c => a|b v a|c)
* What will we use them?
SEE SKETCH OF OBJECT PAGE
\forall x:Z Even(x) <=> ~Odd(x) (even_iff_not_odd)
\forall x:Z Odd(x) <=> ~Even(x) (odd_iff_not_even)
(same operator on lhs and rhs)
how big does the scope of quantification have to be?
prime_nat \forall x:N prime(x) <=> 2<=x * ~(\exists i:{2..x-} i|x)
"if you're interested in this you might also be interested in that"