addText is given a hunk of text from the author section of a
paper. The text is not a header (as determined by the parser)
and is parsed into one or more author names.
The Author class is a utility for parsing author names into an Author
structure, for returning parts of names, printing names, and testing
author names for equality.
buildCitationList -
Return a vector of the Citation objects currently known for this item.
This will involve calls on the citeref database, which is
indexed by document URN.
This class encapsulates information belonging to a Citation, that is,
information about a work that has this document as a reference,
how this document was cited, and in what context.
Helper function to turn a string like "[1,2,3]" into
"[1][2][3]" and "[4-6]" into "[4][5][6]".
If there are no commas in "reference", the whole string is returned.
Guaranteed: this is a SQUARE_BRACKET type of reference string.
Note: As a side effect, the anchors Vector is appended to, once for
each expanded reference.
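The expansion described above can be sketched as follows. This is a minimal
illustration, not the original implementation: the class name
ReferenceExpander, the method names, and the use of a List<String> for the
anchors (the original uses a Vector) are all assumptions, as is the choice to
store each anchor in its bracketed form.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; class and method names are not from the original code.
final class ReferenceExpander {

    // Expands "[1,2,3]" to "[1][2][3]" and "[4-6]" to "[4][5][6]".
    // Side effect: each expanded reference is appended to anchors.
    static String expand(String reference, List<String> anchors) {
        // The input is assumed to be a square-bracket reference, so drop "[" and "]".
        String inner = reference.substring(1, reference.length() - 1);
        StringBuilder out = new StringBuilder();
        for (String part : inner.split(",")) {
            String item = part.trim();
            int dash = item.indexOf('-');
            if (dash > 0 && isNumber(item.substring(0, dash))
                    && isNumber(item.substring(dash + 1))) {
                // A numeric range such as "4-6": emit every number in it.
                int first = Integer.parseInt(item.substring(0, dash).trim());
                int last = Integer.parseInt(item.substring(dash + 1).trim());
                for (int n = first; n <= last; n++) {
                    emit(String.valueOf(n), out, anchors);
                }
            } else {
                // A single reference: emit it unchanged.
                emit(item, out, anchors);
            }
        }
        return out.toString();
    }

    private static boolean isNumber(String s) {
        return s.trim().matches("\\d+");
    }

    // Appends "[label]" to the output and records it as an anchor (assumed form).
    private static void emit(String label, StringBuilder out, List<String> anchors) {
        String ref = "[" + label + "]";
        out.append(ref);
        anchors.add(ref);
    }

    public static void main(String[] args) {
        List<String> anchors = new ArrayList<>();
        System.out.println(expand("[1,2,3]", anchors));
        System.out.println(expand("[4-6]", anchors));
        System.out.println(anchors);
    }
}
```

A string with no commas and no range, such as "[7]", passes through unchanged,
which matches the behavior stated above.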
getLinkedText emits XML for the linked body of the text and/or the
characters of the text body followed by reference-link data suitable
for separate presentation.
Given a string of text, parse it as possibly being the start of the
document body, else -- if grabAuthor is true -- parse it as an author
name or author list.
isEndOfAuthorSection examines this hunk of text, which should
be a header (as determined by the parser) and returns true if
this could be the start of the body of the text.
Return a string that contains something like " 1994",
" and MacNeil, 1996", ", 1999", ";1999",
" name, 1999", " & Smith, 1999", or "1998a"
(that is, the caller has already gobbled up the leading token).
",", ";", and "delim" end a name-and-year element.
lastWord -
Given a string s and an index in the string s, find the
place where the last word in that string begins.
Accept apostrophes, commas, and dashes as parts of a name.
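A backward scan is one plausible way to do this. The sketch below is an
assumption rather than the original code: it treats the index as an exclusive
end position, skips any whitespace just before it, and then walks left over
letters, apostrophes, commas, and dashes. The class name LastWordFinder is
hypothetical.

```java
// Hypothetical sketch; the original lastWord may treat the index differently.
final class LastWordFinder {

    // Returns the position where the last word ending before index begins.
    static int lastWord(String s, int index) {
        // Clamp the index so out-of-range values cannot cause an exception.
        int i = Math.max(0, Math.min(index, s.length()));
        // Skip whitespace immediately before the index.
        while (i > 0 && Character.isWhitespace(s.charAt(i - 1))) {
            i--;
        }
        // Walk left while the characters can belong to a name.
        while (i > 0 && isNameChar(s.charAt(i - 1))) {
            i--;
        }
        return i;
    }

    // Letters plus apostrophe, comma, and dash count as parts of a name.
    private static boolean isNameChar(char c) {
        return Character.isLetter(c) || c == '\'' || c == ',' || c == '-';
    }

    public static void main(String[] args) {
        String s = "by O'Neil-Smith";
        System.out.println(s.substring(lastWord(s, s.length())));
    }
}
```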
Given the existing contextTree and the current sentence, add
this sentence to the last context in the vector if there is
one, else just start up a new vector.
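The append-or-start rule above can be illustrated as follows. The sketch
assumes that a context is a list of sentences and that the context tree is a
list of such contexts. The original uses Vectors, and its actual element types
are not shown, so ContextBuilder and addSentence are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; the real contextTree element type is an assumption.
final class ContextBuilder {

    // Adds the sentence to the last context, creating a first context if none exists.
    static void addSentence(List<List<String>> contextTree, String sentence) {
        if (contextTree.isEmpty()) {
            // No context yet: start a new one.
            contextTree.add(new ArrayList<>());
        }
        // Append to the most recent context.
        contextTree.get(contextTree.size() - 1).add(sentence);
    }

    public static void main(String[] args) {
        List<List<String>> tree = new ArrayList<>();
        addSentence(tree, "First sentence.");
        addSentence(tree, "Second sentence.");
        System.out.println(tree);
    }
}
```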
Recovers from a situation where the context cannot be found
in the document starting at position k, probably because it
keeps running into tagged elements.
Returns XLink elements or null for each Reference in the list.
Note that XLink elements may contain multiple URLs.
They each contain "****" where the anchor (the reference in the text)
is supposed to go.
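A caller could fill in the "****" placeholder along these lines. This is a
sketch under assumptions: the class name XLinkFiller, the method fill, and the
decision to XML-escape the anchor text are illustrative, and the element shown
in main is a made-up example, not output from the original code.

```java
// Hypothetical sketch; only the "****" placeholder comes from the description above.
final class XLinkFiller {

    static final String PLACEHOLDER = "****";

    // Substitutes the anchor text for the placeholder; null templates stay null.
    static String fill(String xlinkTemplate, String anchor) {
        if (xlinkTemplate == null) {
            return null; // no XLink was produced for this Reference
        }
        // String.replace does literal (non-regex) replacement, so "*" is safe here.
        return xlinkTemplate.replace(PLACEHOLDER, escape(anchor));
    }

    // Minimal XML escaping for element content (an assumption, not from the source).
    private static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        String template = "<ref xlink:type=\"simple\" xlink:href=\"http://example.org/a\">****</ref>";
        System.out.println(fill(template, "[1]"));
    }
}
```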
save - write the fileIndex hashtable to file.
As a side effect, this routine will make sure there actually
is a Surrogate in storage before storing a record that involves
it.
Gets the URL of the Item to be analyzed and proceeds
to fill up local structures, partially cooked in some cases,
the contents of which can be returned on demand by the Surrogate
constructor.
Constructor - make a surrogate for the item at the local address
specified by the first string, with the network address in the
second string (needed for processing local copies of archives).
This class represents an exception thrown by the Surrogate if there
are problems with any of its methods, or if an internal error occurred
that needs to be diagnosed.
Returns a DublinCore XML string for this Creation. The displayID may not be known.
However, if this Creation corresponds to an archive item that is being
analyzed, then we should know that one URL.
Given the existing contextTrees, the current sentence, and settings
for whether there are references, false stops, or false starts
involved, either append the sentence to the previous context,
start a new context, or drop it altogether.