Outline:

What am I *not* talking about today?

This is not about automatic natural language generation or producing full textual versions of proof texts.

This is not about producing extensive glosses or discussions of Nuprl objects, proofs, or libraries.

These are both useful additions to the Nuprl library, but what we are talking about today is smaller pieces of content attached to Nuprl objects in order to make them more accessible to search, help explain them, or facilitate library browsing. We will call these things annotations.

Annotations, we hope, will be easier to collect than full texts/glosses.

They will also follow a predictable structure, which will allow us to
use them more easily in a wider range of contexts.

User Supplied Annotations

see: http://www.nuprl.org/FDLProject/FDLnotes/XDL_SmallAnnotation.html

* What are they?

English language "snippets" of text

Attach to formal objects - definitions, declarations, or theorems

Collect from users of the library (i.e., YOU!)

Want to define categories for the text so can predict usage of text

Categories:

Type: definition, declaration, theorem

Title: short phrase by which commonly known

many will not have titles

Verbalization: words used to speak LHS of object

usually only definitions/declarations

can refer to variables in object

Summary: short description, generally of RHS

do not just spell out formalism in words

may not be a good summary

may be multiples based on different perspectives

best if doesn't refer to variables, but not a requirement

Last three all limited to single line, force conciseness

Can have additional title and/or verbalization, and up to two additional summaries.

Can have arbitrary comments

Examples:

divides b|a == \exists c:Z.a=b*c

TYPE: definition

TITLE: integer divisibility

VERBALIZATION: a divides b, a divides b evenly

SUMMARY: divides without remainder, divides evenly

chrem_exists \forall r,s:N+ CoPrime(r,s) =>
(\forall a,b:Z \exists x:Z ((x = a mod r) & (x = b mod s))

TYPE: theorem

TITLE: existential Chinese remainder theorem

SUMMARY: given two coprime numbers r and s, any other pair of integers
are equal modulo r and s respectively

gcd_p_sym \forall a,b,y:Z GCD(a;b;y) => GCD(b;a;y)

TYPE: theorem

SUMMARY: the gcd proposition is symmetric

* How do we collect them?

On the web, via a web form, linked from existing Nuprl object pages

Will be published anonymously, but within group will request username to collect and give feedback

Allow for revision after initial submission; will delay posting of material to allow for edits

In short term, at least, will have a post-submission editing phase for quality control, review of process, etc.

"Publishing" will include putting annotations on web, but also wrapping the annotations into the Nuprl object itself, making it broadly available

* How will we present it?

Right now object pages contain:

formal statement of object

links to component objects' pages

def: "Who Cites" object link - comprehensive

thm: proof object

SKETCH EXAMPLE OBJECT PAGES

Can replace these object pages with a more informative page containing the annotations described

Still retain formal object, but secondary for if actually interested

SKETCH EXAMPLE OBJECT PAGE FROM NOTES

Because can anticipate form of annotations, can use in other settings
as well:

if referencing an object (e.g., lemma reference within a proof)

currently can show Nuprl object name or theorem statement
with annotations, could give title, if exists

verbalizations can be used in glosses or auto generated texts
because wrapped into object, could display within Nuprl editor
and there are probably other uses we haven't thought of yet....

Graph-Based Annotations

* What are they?

a library is a network of linked objects

now, it is presented as a linear list ordered to maintain formal dependencies

SKETCH STANDARD LIBRARY PAGE

we would like to be able to present the objects in a library so that:

important objects are given prominence

help people browse library

reveal structure of library and dependencies

looked at a graph representation of the num_thy_1 library

each object is a node, dependencies are edges

a preliminary inspection suggests that features in this graph can
allow us to infer which objects are:

key general purpose lemmas - referenced by many

special purpose lemmas - referenced by exactly one

main result theorems - reference many, referenced by few

as well as "sections", or groupings of objects - clustering

* How will we collect them?

It remains to verify these findings in more libraries and automate the process of extracting these labels for objects.

Lori has been working on the application of graph algorithms to collections of Nuprl objects

some questions: what lemma refs should be counted? explicit in tactics or
in primitive proofs

use theorems, defs, or both?

looked at algorithms for bipartite graphs, hubs and authorities, betweenness
able to identify some key features

find clusters of related items to specific objects

* How will we use them?

If we can do this, we:

can use for visualization and browsing

can use to select which objects to annotate

can reorganize the main library presentation pages:

SKETCH EXAMPLE DRAWING FROM NOTES

We can also use this information on individual object pages:

label object if it is "special"

improve quality of "Who Cites" links provided

if a general support lemma, give examples of its usage as such

if exists for single purpose, indicate such and describe supported
object

show snapshot of placement of object within library: "who is near me"

ADD TO DRAWING OF OBJECT PAGE

It can also aid the annotation process, by suggesting to annotators which objects are most important to annotate and which can be skipped.

Automatic Equivalence Extraction

* What are they?

theorems that could be naturally construed as definitions

alternative forms of "same" or similar content

"formal thesaurus"

* How will we collect them?

look for theorems using = or <=> where one side could be a lhs of a definition and the other could be a rhs of the same definition, possibly wrapped in conditions or quantifiers

preliminary results show this is a good starting algorithm but could use some improvements

good: \forall a,b:Z a|b & b|a <=> a= +/-b (assoced_reln)

\forall a:Z, b:Z-0 b|a <=> (a rem b) = 0 (divides_iff_rem_zero)

maybe: \forall i,j:Z i

\forall x:Z Even(x) <=> ~Odd(x) (even_iff_not_odd)

\forall x:Z Odd(x) <=> ~Even(x) (odd_iff_not_even)

problematic: \forall a:{...0}, n:N+ (a rem n) = -((-a) rem n) (rem_3_to_1)

(same operator on lhs and rhs)

Issues - what if it is easy to derive <=> from => and <=

how big does the scope of quantification have to be?

prime prime(a) == ~a=0 & ~(a~1) & (\forall b,c:Z a|b*c => a|b v a|c)

prime_nat \forall x:N prime(x) <=> 2<=x * ~(\exists i:{2..x-} i|x)

* What will we use them?

"if you're interested in this you might also be interested in that"

SEE SKETCH OF OBJECT PAGE