Linkable.Analysis
Class SentenceTree

java.lang.Object
  |
  +--Linkable.Analysis.SentenceTree

public class SentenceTree
extends java.lang.Object

The SentenceTree class encapsulates one sentence from the text being analyzed. It contains the text (as a tree of nodes), the references that were found in this sentence, and the actual anchor strings corresponding to the references.


Inner Class Summary
private  class SentenceTree.Link
           
protected  class SentenceTree.Node
           
protected  class SentenceTree.Split
           
 
Field Summary
protected  java.util.Vector anchors
           
protected static int BRACKETS_AROUND_COMMAED_NAMES_AND_YEARS
           
protected static int CURLY_BRACKETS_AROUND_ACRONYMS
           
private  SentenceTree.Node currentTree
           
private static boolean DEBUG
           
private static int ENDTAG
           
private static java.lang.String[] FALSESTARTS
           
private static java.lang.String[] FALSESTOPS
           
protected  boolean hasReferences
           
protected  int hint
           
private static int HREF
           
private static java.lang.String ME
           
protected static int PARENTHESES_AROUND_COMMAED_NAMES_AND_YEARS
           
protected static int PARENTHESES_AROUND_NAMES_AND_YEAR
           
protected  java.util.Vector refsInText
           
private  SentenceTree.Node root
           
protected static int SQUARE_BRACKETS_AROUND_ACRONYMS
           
protected static int SQUARE_BRACKETS_AROUND_NUMERALS
           
private static int TAG
           
private static int TEXT
           
protected  int whichFALSESTOP
           
 
Constructor Summary
protected SentenceTree()
          Constructor that just makes a new tree
protected SentenceTree(int hint)
          Or you can construct a tree with a specific hint in it.
protected SentenceTree(SentenceTree st)
          Or you can construct a tree with the hint already set from previous sentences in this text.
 
Method Summary
protected  void addNode(java.lang.String content)
          adds another node to current subtree.
protected  void append(SentenceTree x)
          adds another SentenceTree object to the end of this one
protected  boolean beginsWith(java.lang.String fs)
          A helper routine to check whether this begins with a false start.
private  java.util.Vector bracketsRef()
          handles references of the form [...].
protected static java.lang.String cleanup(java.lang.String s)
          Trims 's, whitespace, commas, semicolons, [], and () from a string and returns it.
private  java.util.Vector curlyAcronym()
          Return a Vector of all the tags found in this context.
protected  java.lang.String dump()
           
private  java.util.Vector enclosedAcronym(java.lang.String caller, java.lang.String delims, boolean needYear, int hintName)
          Return a vector of all the tags found in this context.
private  java.util.Vector endRef(java.util.Vector result, java.lang.StringBuffer anchorBuf, java.lang.StringBuffer sb)
          Closes off the current reference being built in a list-of-anchors
protected  int endsWith(java.lang.String[] words)
          A helper routine to check whether this really is the end of a sentence.
private  java.lang.String expand(java.lang.String reference, java.lang.String anchor)
          Helper function that turns a string like "[1,2,3]" into "[1][2][3]" and "[4-6]" into "[4][5][6]". If there are no commas in "reference", the whole string is returned. Guaranteed: this is a SQUARE_BRACKET type of reference string. Note: as a side effect, the anchors Vector is appended to, once for each expanded reference.
protected  boolean findReferences()
          Looks for references in this Sentence Tree and stores them in a private structure that contains all the info on these links.
protected  java.lang.String[] getAnchors()
          Returns the String array of anchors corresponding to potential links
protected  int getHint()
          returns the hint currently in use for this SentenceTree
protected  java.lang.String getLinks()
          print out the table of potential links
protected  SentenceTree.Node getRoot()
          digs out the node that is at the top of the tree
protected  java.lang.String getTags()
          Helper function that returns the list of tags as a single string. Called by getLinks and by XHTMLAnalyzer.
private  java.lang.String isNameAndYear(java.util.StringTokenizer st, java.util.Vector tokens, java.lang.String delim)
          Returns a string that contains something like " 1994" or " and MacNeil, 1996" or ", 1999" or ";1999" or " name, 1999" or " & Smith, 1999" or "1998a" (that is, the caller has already gobbled up the leading token). "," and ";" and "delim" end a name-and-year element.
private  boolean isNumeric(java.lang.String token)
           
private  boolean isValid(java.lang.String token)
          determines whether this token is a valid reference.
protected  boolean isYear(java.lang.String token)
           
private  int lastWord(java.lang.String s)
          Given a string s and an index in the string s, finds the place where the last word in that string begins. Accepts apostrophes, commas, and dashes as parts of a name.
private  java.lang.String loneYear(java.lang.String s, int offset, int length)
          A lone year, e.g.
private  void mergeContexts(java.util.Vector contextTrees)
          Given the existing contextTrees and the current sentence, adds this sentence to the last context in the vector if there is one; otherwise starts a new vector.
private  java.lang.String nextElement(java.util.StringTokenizer st, java.util.Vector tokens, java.lang.String delim)
          Returns a string that contains something like " and MacNeil" or ", Smith".
private  java.util.Vector parensRef()
          recognizes references of the form "(...
private static java.lang.String removeLowerCaseWords(java.lang.String t)
          Sometimes references have lots of leading words that don't really belong to the anchor.
protected  void reset()
          Since the style of references does not change over the course of a document it is good to keep the hint in place from one instantiation to the next.
private  java.util.Vector squareAcronym()
           
private  java.util.Vector squareRef()
          recognizes references in the form [...]
protected  java.lang.String text()
           
protected  boolean updateContextTrees(java.util.Vector contextTrees)
          Given the existing contextTrees, the current sentence, and settings for whether there are references, false stops, or false starts involved, either append the sentence to the previous context, start a new context, or drop it altogether.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, registerNatives, toString, wait, wait, wait
 

Field Detail

ME

private static final java.lang.String ME

DEBUG

private static final boolean DEBUG

root

private SentenceTree.Node root

currentTree

private SentenceTree.Node currentTree

hint

protected int hint

TEXT

private static final int TEXT

TAG

private static final int TAG

HREF

private static final int HREF

ENDTAG

private static final int ENDTAG

SQUARE_BRACKETS_AROUND_NUMERALS

protected static final int SQUARE_BRACKETS_AROUND_NUMERALS

PARENTHESES_AROUND_NAMES_AND_YEAR

protected static final int PARENTHESES_AROUND_NAMES_AND_YEAR

SQUARE_BRACKETS_AROUND_ACRONYMS

protected static final int SQUARE_BRACKETS_AROUND_ACRONYMS

PARENTHESES_AROUND_COMMAED_NAMES_AND_YEARS

protected static final int PARENTHESES_AROUND_COMMAED_NAMES_AND_YEARS

BRACKETS_AROUND_COMMAED_NAMES_AND_YEARS

protected static final int BRACKETS_AROUND_COMMAED_NAMES_AND_YEARS

CURLY_BRACKETS_AROUND_ACRONYMS

protected static final int CURLY_BRACKETS_AROUND_ACRONYMS

refsInText

protected java.util.Vector refsInText

anchors

protected java.util.Vector anchors

FALSESTOPS

private static java.lang.String[] FALSESTOPS

FALSESTARTS

private static java.lang.String[] FALSESTARTS

whichFALSESTOP

protected int whichFALSESTOP

hasReferences

protected boolean hasReferences

Constructor Detail

SentenceTree

protected SentenceTree()
Constructor that just makes a new tree

SentenceTree

protected SentenceTree(SentenceTree st)
Or you can construct a tree with the hint already set from previous sentences in this text.
Parameters:
st - the SentenceTree that has references in it

SentenceTree

protected SentenceTree(int hint)
Or you can construct a tree with a specific hint in it.
Parameters:
hint - the integer hint

Method Detail

reset

protected void reset()
Since the style of references does not change over the course of a document it is good to keep the hint in place from one instantiation to the next. Everything else gets reset, which makes this tree act as if it had only now been instantiated.

getHint

protected int getHint()
returns the hint currently in use for this SentenceTree

addNode

protected void addNode(java.lang.String content)
adds another node to current subtree.
Parameters:
content - the textual content of the node to be added

append

protected void append(SentenceTree x)
adds another SentenceTree object to the end of this one
Parameters:
x - a sentence tree whose textual value is to be appended to the textual value of this sentence tree

getRoot

protected SentenceTree.Node getRoot()
digs out the node that is at the top of the tree

findReferences

protected boolean findReferences()
Looks for references in this Sentence Tree and stores them in a private structure that contains all the info on these links.

endsWith

protected int endsWith(java.lang.String[] words)
A helper routine to check whether this really is the end of a sentence.
Parameters:
words - a string array of words to check (e.g. "etc.")
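
Example (a hedged sketch, not the actual implementation): the page does not say what the int returned by endsWith means. The sketch below assumes it is the index of the matching word in the array, or -1 when nothing matches, which would fit the whichFALSESTOP field. The class name EndsWithSketch, the extra sentenceText parameter, and the FALSE_STOPS contents are hypothetical.

```java
final class EndsWithSketch {
    /** Hypothetical abbreviations that end in a period without ending a sentence. */
    static final String[] FALSE_STOPS = { "etc.", "et al.", "e.g." };

    /**
     * Returns the index of the first entry of words that the sentence text
     * ends with, or -1 if none matches (assumed convention).
     */
    static int endsWith(String sentenceText, String[] words) {
        String trimmed = sentenceText.trim();
        for (int i = 0; i < words.length; i++) {
            if (trimmed.endsWith(words[i])) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(endsWith("as shown by Jones et al.", FALSE_STOPS));
        System.out.println(endsWith("This is a complete sentence.", FALSE_STOPS));
    }
}
```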

beginsWith

protected boolean beginsWith(java.lang.String fs)
A helper routine to check whether this begins with a false start. Called only if this sentence follows one that ended with a false start.
Parameters:
fs - the string indicating what a false start would be

dump

protected java.lang.String dump()

text

protected java.lang.String text()

getLinks

protected java.lang.String getLinks()
print out the table of potential links

getAnchors

protected java.lang.String[] getAnchors()
Returns the String array of anchors corresponding to potential links

getTags

protected java.lang.String getTags()
Helper function that returns the list of tags as a single string. Called by getLinks and by XHTMLAnalyzer.

squareRef

private java.util.Vector squareRef()
recognizes references in the form [...]
Returns:
reference tags as strings, or null if none are found; each reference is enclosed in square brackets. Formats handled: [1,2,3], [5], [5-10], [17,p.73], [1]-[4]. Note: as a side effect, fills in the "anchors" vector with the anchor string associated with each normalized reference.
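
Example (a hedged sketch, not the actual implementation): one way to locate the bracketed numeral formats listed above is a regular expression. The class SquareRefSketch, the method findSquareRefs, and the pattern itself are assumptions made for illustration. The sketch returns an empty list rather than null, does not fill in any anchors, and treats "[1]-[4]" as two separate matches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SquareRefSketch {
    // A number, optionally followed by ",n", "-n", or ",p.n" items, inside [ ].
    private static final Pattern SQUARE_REF =
        Pattern.compile("\\[\\s*\\d+(?:\\s*[-,]\\s*(?:\\d+|p\\.\\s*\\d+))*\\s*\\]");

    /** Returns every bracketed numeral reference found in text, in order. */
    static List<String> findSquareRefs(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = SQUARE_REF.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findSquareRefs("See [1,2,3], [5], [5-10] and [17,p.73]."));
    }
}
```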

parensRef

private java.util.Vector parensRef()
Recognizes references of the form "(... year)". Examples:
(Besser,1994,Cringley,1996)
(Jones et al.,1999)
(Jones & Smith, 1999)
(Institution name, 1999)
(Alvin,1998,Bailey,1999)
(Evans et al. 1989a; Jones 1991)
Bray (1997)
(Hitchcock et al. 1996, 1997a) (TBD)
If one of these formats was found and hint is -1, reset hint to PARENTHESES_AROUND_COMMAED_NAMES_AND_YEARS.
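
Example (a hedged sketch, not the actual implementation): a parenthesized group can be treated as a candidate reference when it ends in a year, optionally followed by a lowercase letter as in "1989a". The class ParensRefSketch, the method findParenRefs, and the restriction to years starting with 19 or 20 are assumptions; the real method tokenizes the text and also updates the hint and the anchors.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ParensRefSketch {
    // "(" + anything without nested parentheses + a year (plus optional letter) + ")".
    private static final Pattern PAREN_REF =
        Pattern.compile("\\(([^()]*?\\b(?:19|20)\\d{2}[a-z]?)\\)");

    /** Returns the contents of each parenthesized group that ends in a year. */
    static List<String> findParenRefs(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = PAREN_REF.matcher(text);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findParenRefs(
            "As noted (Jones & Smith, 1999) and by Bray (1997), see (Evans et al. 1989a; Jones 1991)."));
    }
}
```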

curlyAcronym

private java.util.Vector curlyAcronym()
Returns a Vector of all the tags found in this context. Recognizes things like {FOOBAR} or {ONE, TWO} or {ONE,TWO}. Note: this routine is called only if (1) {refs} were found before or (2) hint is -1.
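
Example (a hedged sketch, not the actual implementation): the curly-bracket forms above can be matched with a regular expression and split on commas. The class CurlyAcronymSketch, the method findCurlyAcronyms, the assumption that acronyms consist of uppercase letters and digits, and the "{NAME}" shape of each returned tag are all illustrative choices. The hint check described in the note is omitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CurlyAcronymSketch {
    // "{" + one acronym or a comma-separated list of acronyms + "}".
    private static final Pattern CURLY =
        Pattern.compile("\\{\\s*([A-Z][A-Z0-9]*(?:\\s*,\\s*[A-Z][A-Z0-9]*)*)\\s*\\}");

    /** Returns one "{ACRONYM}" tag per acronym found inside curly brackets. */
    static List<String> findCurlyAcronyms(String text) {
        List<String> tags = new ArrayList<>();
        Matcher m = CURLY.matcher(text);
        while (m.find()) {
            for (String acronym : m.group(1).split("\\s*,\\s*")) {
                tags.add("{" + acronym + "}");
            }
        }
        return tags;
    }

    public static void main(String[] args) {
        System.out.println(findCurlyAcronyms("See {FOOBAR} and {ONE, TWO}."));
    }
}
```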

enclosedAcronym

private java.util.Vector enclosedAcronym(java.lang.String caller,
                                         java.lang.String delims,
                                         boolean needYear,
                                         int hintName)
Returns a Vector of all the tags found in this context. The tags are expected to be in the form of an acronym enclosed between left and right, where left,right = [] or {} or (). The acronym could also be a commaed list of acronyms. needYear determines whether or not the commaed list should have alternating acronyms and years. The brackets are the first two characters in "delims". The remaining characters are additional parsing tokens. Side effect: if at least one bracketed reference is found in this context, then reset the SentenceTree hint to "hintName". Side effect (TBD): if the hint is BRACKETS_AROUND_COMMAED_NAMES_AND_YEARS and there are no years in the context, which does, however, contain "[...]", reset hint to SQUARE_BRACKETS_AROUND_ACRONYMS.

isValid

private boolean isValid(java.lang.String token)
determines whether this token is a valid reference.
Parameters:
token - the token

bracketsRef

private java.util.Vector bracketsRef()
Handles references of the form [...]. Returns a Vector of all the tags found in this context. Recognizes things like [Besser,1994,Cringley,1996] [Jones et al.,1999] [Jones & Smith, 1999] [Institution name, 1999] [Alvin,1998,Bailey,1999]. Also (just square brackets, not round brackets) recognizes references like [PRISM] or [Jones, Jones and Jones]. (TBD) Strings like this should cause SQUARE_BRACKETS_AROUND_ACRONYMS. Also accepts semicolons instead of commas. If one of these formats was found and hint is -1, resets hint to BRACKETS_AROUND_COMMAED_NAMES_AND_YEARS. Side effect: defines anchors to be the strings that correspond to the references.

nextElement

private java.lang.String nextElement(java.util.StringTokenizer st,
                                     java.util.Vector tokens,
                                     java.lang.String delim)
Returns a string that contains something like " and MacNeil" or ", Smith". "," and "delim" end an element. The returned string includes the ending token. Returns null if the tokens do not comprise an acronym.

isNameAndYear

private java.lang.String isNameAndYear(java.util.StringTokenizer st,
                                       java.util.Vector tokens,
                                       java.lang.String delim)
Returns a string that contains something like " 1994" or " and MacNeil, 1996" or ", 1999" or ";1999" or " name, 1999" or " & Smith, 1999" or "1998a" (that is, the caller has already gobbled up the leading token). "," and ";" and "delim" end a name-and-year element. The returned string includes the ending token. Returns null if the tokens do not comprise a name and year. Special case for square brackets only: the year can be null if the next token is ']'.
Parameters:
st - a tokenizer that carries the state of the parse so far
tokens - list of tokens for the actual anchor string
delim - either ")" or "]", to generalize the routine

isYear

protected boolean isYear(java.lang.String token)
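
Example (a hedged sketch, not the actual implementation): this page gives no description for isYear. Elsewhere it shows year tokens such as "1994" and "1998a", so the sketch below assumes a year is exactly four digits with an optional trailing lowercase letter. The class name IsYearSketch is hypothetical.

```java
final class IsYearSketch {
    /** Returns true for tokens like "1994" or "1998a" (assumed year format). */
    static boolean isYear(String token) {
        String t = token.trim();
        int n = t.length();
        if (n != 4 && n != 5) {
            return false;
        }
        for (int i = 0; i < 4; i++) {
            if (!Character.isDigit(t.charAt(i))) {
                return false;
            }
        }
        // A fifth character, if present, must be a lowercase suffix letter.
        return n == 4 || Character.isLowerCase(t.charAt(4));
    }

    public static void main(String[] args) {
        System.out.println(isYear("1994"));
        System.out.println(isYear("1998a"));
        System.out.println(isYear("Jones"));
    }
}
```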

isNumeric

private boolean isNumeric(java.lang.String token)

squareAcronym

private java.util.Vector squareAcronym()

cleanup

protected static java.lang.String cleanup(java.lang.String s)
Trims 's, whitespace, commas, semicolons, [], and () from a string and returns it. Suitable for cleaning up individual references. Bracketed tag values are added to refsInText.
Parameters:
s - the string to be cleaned up
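
Example (a hedged sketch, not the actual implementation): the trimming can be written as a loop that strips unwanted characters from both ends. The sketch assumes that "'s" means a trailing possessive suffix and that only the ends of the string are trimmed; it does not touch refsInText. The class name CleanupSketch and the TRIM_CHARS constant are hypothetical.

```java
final class CleanupSketch {
    /** Characters removed from either end: whitespace, comma, semicolon, [], (). */
    private static final String TRIM_CHARS = " \t\r\n,;[]()";

    static String cleanup(String s) {
        int start = 0;
        int end = s.length();
        boolean changed = true;
        while (changed) {
            changed = false;
            while (start < end && TRIM_CHARS.indexOf(s.charAt(start)) >= 0) {
                start++;
                changed = true;
            }
            while (end > start && TRIM_CHARS.indexOf(s.charAt(end - 1)) >= 0) {
                end--;
                changed = true;
            }
            // Assumed meaning of "'s": drop a trailing possessive suffix.
            if (end - start >= 2 && s.startsWith("'s", end - 2)) {
                end -= 2;
                changed = true;
            }
        }
        return s.substring(start, end);
    }

    public static void main(String[] args) {
        System.out.println(cleanup(" [Jones's, "));
        System.out.println(cleanup("(1999);"));
    }
}
```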

removeLowerCaseWords

private static java.lang.String removeLowerCaseWords(java.lang.String t)
Sometimes references have lots of leading words that don't really belong to the anchor. Strip off these words.
Parameters:
t - the proposed reference anchor
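
Example (a hedged sketch, not the actual implementation): leading words can be dropped for as long as they start with a lowercase letter. The sketch assumes that the last word is always kept and that runs of whitespace may be collapsed to single spaces. The class name RemoveLowerCaseWordsSketch is hypothetical.

```java
import java.util.Arrays;

final class RemoveLowerCaseWordsSketch {
    /** Strips leading words that begin with a lowercase letter. */
    static String removeLowerCaseWords(String t) {
        String[] words = t.trim().split("\\s+");
        int first = 0;
        // Always keep at least the last word, even if it is lowercase.
        while (first < words.length - 1
                && !words[first].isEmpty()
                && Character.isLowerCase(words[first].charAt(0))) {
            first++;
        }
        return String.join(" ", Arrays.copyOfRange(words, first, words.length));
    }

    public static void main(String[] args) {
        System.out.println(removeLowerCaseWords("as shown by Jones and Smith"));
    }
}
```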

expand

private java.lang.String expand(java.lang.String reference,
                                java.lang.String anchor)
Helper function that turns a string like "[1,2,3]" into "[1][2][3]" and "[4-6]" into "[4][5][6]". If there are no commas in "reference", the whole string is returned. Guaranteed: this is a SQUARE_BRACKET type of reference string. Note: as a side effect, the anchors Vector is appended to, once for each expanded reference.
Parameters:
reference - the reference, like "[4-6]"
anchor - the anchor, like "4-6"
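
Example (a hedged sketch, not the actual implementation): the expansion can be done by splitting the bracket contents on commas and unrolling numeric ranges. The class ExpandSketch and its helpers emit and isDigits are hypothetical, a List stands in for the anchors Vector, and recording the anchor for a single unexpanded reference such as "[5]" is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

final class ExpandSketch {
    /** Stand-in for the anchors Vector: one entry per expanded reference. */
    final List<String> anchors = new ArrayList<>();

    String expand(String reference, String anchor) {
        // Strip the enclosing square brackets (guaranteed by the caller).
        String inner = reference.substring(1, reference.length() - 1);
        StringBuilder out = new StringBuilder();
        for (String piece : inner.split(",")) {
            String part = piece.trim();
            int dash = part.indexOf('-');
            boolean isRange = dash > 0
                && isDigits(part.substring(0, dash))
                && isDigits(part.substring(dash + 1));
            if (isRange) {
                int low = Integer.parseInt(part.substring(0, dash));
                int high = Integer.parseInt(part.substring(dash + 1));
                if (low <= high) {
                    for (int n = low; n <= high; n++) {
                        emit(out, String.valueOf(n), anchor);
                    }
                    continue;
                }
            }
            // Not a usable numeric range: keep the piece as a single reference.
            emit(out, part, anchor);
        }
        return out.toString();
    }

    private void emit(StringBuilder out, String tag, String anchor) {
        out.append('[').append(tag).append(']');
        anchors.add(anchor);
    }

    private static boolean isDigits(String s) {
        return !s.isEmpty() && s.chars().allMatch(Character::isDigit);
    }

    public static void main(String[] args) {
        ExpandSketch sketch = new ExpandSketch();
        System.out.println(sketch.expand("[1,2,3]", "1,2,3"));
        System.out.println(sketch.expand("[4-6]", "4-6"));
    }
}
```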

lastWord

private int lastWord(java.lang.String s)
Given a string s and an index in the string s, finds the place where the last word in that string begins. Accepts apostrophes, commas, and dashes as parts of a name.
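
Example (a hedged sketch, not the actual implementation): scanning backward from the end of the string finds where the last word starts. The sketch follows the one-argument signature shown above, skips trailing whitespace first (an assumption), and treats letters, apostrophes, commas, and dashes as name characters. The class LastWordSketch and the helper isNameChar are hypothetical.

```java
final class LastWordSketch {
    /** Returns the index at which the last word of s begins. */
    static int lastWord(String s) {
        int i = s.length();
        // Skip any trailing whitespace (assumed behavior).
        while (i > 0 && Character.isWhitespace(s.charAt(i - 1))) {
            i--;
        }
        // Walk back over the characters that may belong to a name.
        while (i > 0 && isNameChar(s.charAt(i - 1))) {
            i--;
        }
        return i;
    }

    private static boolean isNameChar(char c) {
        return Character.isLetter(c) || c == '\'' || c == ',' || c == '-';
    }

    public static void main(String[] args) {
        String s = "as shown by O'Brien-Smith ";
        System.out.println(s.substring(lastWord(s)).trim());
    }
}
```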

loneYear

private java.lang.String loneYear(java.lang.String s,
                                  int offset,
                                  int length)
A lone year, e.g. "(year)", has been found in the text at offset. The length is 4 for "(year)" and 5 for "(yeara)". Returns null if this is not part of a reference anchor. Otherwise this could be an anchor being used as a part of speech. Returns the anchor in the form "[name-list, year]". Side effect: if non-null is returned, anchors has been updated with the string "name-list (year)".

endRef

private java.util.Vector endRef(java.util.Vector result,
                                java.lang.StringBuffer anchorBuf,
                                java.lang.StringBuffer sb)
Closes off the current reference being built in a list-of-anchors
Parameters:
result - Vector of canonicalized references in this context
anchorBuf - StringBuffer holding the literal anchor(s)
sb - StringBuffer holding the canonicalized anchors
Note that this routine should be called when we have "...year," or ";" or a list-of-anchors with no " and ".

mergeContexts

private void mergeContexts(java.util.Vector contextTrees)
Given the existing contextTrees and the current sentence, adds this sentence to the last context in the vector if there is one; otherwise starts a new vector.

updateContextTrees

protected boolean updateContextTrees(java.util.Vector contextTrees)
Given the existing contextTrees, the current sentence, and settings for whether there are references, false stops, or false starts involved, either append the sentence to the previous context, start a new context, or drop it altogether.
Append or Add - start a new SentenceTree (return true)
Drop - reset and reuse current SentenceTree (return false)
As a side effect of calling this routine, the argument vector can be altered. It can come back one element shorter, the same size, or one element longer. That's because we can wind up with contextTrees that contain no references: a sentence is added because it has a potentially false stop.
Case 1: Next sentence does not continue. We should delete the last element from contextTrees, but currently do not.
Case 2: Next sentence does continue, but contains no references nor does it end with a potential false stop. We should discard this sentence along with the last element from contextTrees.
ReferenceSection has been repaired to not gag on contexts which have no references in them.