java.lang.Object
  |
  +--Linkable.Analysis.SentenceTree
The SentenceTree class encapsulates one sentence from the text being analyzed. It contains the text (as a tree of nodes), the references that were found in this sentence, and the actual anchor strings corresponding to the references.
Inner Class Summary
private class SentenceTree.Link
protected class SentenceTree.Node
protected class SentenceTree.Split
Field Summary
protected java.util.Vector anchors
protected static int BRACKETS_AROUND_COMMAED_NAMES_AND_YEARS
protected static int CURLY_BRACKETS_AROUND_ACRONYMS
private SentenceTree.Node currentTree
private static boolean DEBUG
private static int ENDTAG
private static java.lang.String[] FALSESTARTS
private static java.lang.String[] FALSESTOPS
protected boolean hasReferences
protected int hint
private static int HREF
private static java.lang.String ME
protected static int PARENTHESES_AROUND_COMMAED_NAMES_AND_YEARS
protected static int PARENTHESES_AROUND_NAMES_AND_YEAR
protected java.util.Vector refsInText
private SentenceTree.Node root
protected static int SQUARE_BRACKETS_AROUND_ACRONYMS
protected static int SQUARE_BRACKETS_AROUND_NUMERALS
private static int TAG
private static int TEXT
protected int whichFALSESTOP
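The hint constants name the citation styles that SentenceTree distinguishes. Their integer values and the class's actual parsing rules are not shown on this page, so the following is only a minimal sketch: a hypothetical `RefStyle` enum reuses the constant names and pairs each with an assumed regular expression for one typical citation in that style. The acronym examples such as "[Knu84]" are assumptions; the numeral and name-and-year examples follow the formats quoted in the method descriptions.

```java
import java.util.regex.Pattern;

// Hypothetical enum mirroring the SentenceTree hint constants. The regexes are
// assumptions for illustration, not the class's own parsing rules.
enum RefStyle {
    SQUARE_BRACKETS_AROUND_NUMERALS("\\[\\d+(?:\\s*[,-]\\s*\\d+)*\\]"),                             // [4-6]
    SQUARE_BRACKETS_AROUND_ACRONYMS("\\[[A-Z][A-Za-z]*\\d*[a-z]?\\]"),                                // [Knu84]
    CURLY_BRACKETS_AROUND_ACRONYMS("\\{[A-Z][A-Za-z]*\\d*[a-z]?\\}"),                                 // {Knu84}
    PARENTHESES_AROUND_COMMAED_NAMES_AND_YEARS("\\([A-Z][A-Za-z'-]*(?: [^()]*)?, \\d{4}[a-z]?\\)"),   // (Smith, 1999)
    BRACKETS_AROUND_COMMAED_NAMES_AND_YEARS("\\[[A-Z][A-Za-z'-]*(?: [^\\[\\]]*)?, \\d{4}[a-z]?\\]"),  // [Smith, 1999]
    PARENTHESES_AROUND_NAMES_AND_YEAR("\\([A-Z][A-Za-z'-]*(?: [^(),]*)? \\d{4}[a-z]?\\)");           // (Smith 1999)

    final Pattern pattern;

    RefStyle(String regex) {
        this.pattern = Pattern.compile(regex);
    }

    // Returns the first style whose pattern matches the whole citation, or null.
    static RefStyle classify(String citation) {
        for (RefStyle style : values()) {
            if (style.pattern.matcher(citation.trim()).matches()) {
                return style;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(classify("[4-6]"));         // SQUARE_BRACKETS_AROUND_NUMERALS
        System.out.println(classify("(Smith, 1999)")); // PARENTHESES_AROUND_COMMAED_NAMES_AND_YEARS
        System.out.println(classify("(Smith 1999)"));  // PARENTHESES_AROUND_NAMES_AND_YEAR
    }
}
```

Matching is all-or-nothing on the whole citation string, so the order of the constants only matters if two patterns could accept the same text; the patterns above are written to be mutually exclusive.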
Constructor Summary
protected SentenceTree()
    Constructs a new, empty tree.
protected SentenceTree(int hint)
    Constructs a tree with a specific hint already set.
protected SentenceTree(SentenceTree st)
    Constructs a tree whose hint is carried over from previous sentences in this text.
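The three constructors differ only in how the hint (the citation style detected so far) is initialized. A minimal sketch of that hand-off, using a hypothetical `HintedTree` class in place of SentenceTree: the `hint` field name matches the documented field, but the `NO_HINT` value and the `addText` method are assumptions made for the illustration. `reset()` clears the sentence but keeps the hint, as the method summary describes.

```java
// Hypothetical stand-in for SentenceTree that models only how the hint travels
// between sentences. NO_HINT = 0 is an assumed "style not yet known" value.
class HintedTree {
    static final int NO_HINT = 0;

    private final StringBuilder text = new StringBuilder();
    protected int hint;

    HintedTree() {
        this(NO_HINT);          // fresh tree, no style detected yet
    }

    HintedTree(int hint) {
        this.hint = hint;       // tree with an explicit hint
    }

    HintedTree(HintedTree previous) {
        this(previous.hint);    // carry the hint over from an earlier sentence
    }

    void addText(String s) {
        text.append(s);
    }

    String text() {
        return text.toString();
    }

    // Clears the sentence content but keeps the hint, since the reference
    // style does not change within a document.
    void reset() {
        text.setLength(0);
    }

    int getHint() {
        return hint;
    }

    public static void main(String[] args) {
        HintedTree first = new HintedTree();
        first.addText("As shown in [3], the method converges.");
        first.hint = 1;         // pretend detection settled on style 1
        HintedTree second = new HintedTree(first);
        System.out.println(second.getHint());                               // 1
        first.reset();
        System.out.println(first.text().isEmpty() + " " + first.getHint()); // true 1
    }
}
```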
Method Summary
protected void addNode(java.lang.String content)
    Adds another node to the current subtree.
protected void append(SentenceTree x)
    Adds another SentenceTree object to the end of this one.
protected boolean beginsWith(java.lang.String fs)
    A helper routine that checks whether this sentence begins with a false start.
private java.util.Vector bracketsRef()
    Handles references of the form [...].
protected static java.lang.String cleanup(java.lang.String s)
    Trims 's, whitespace, commas, semicolons, [], and () from a string and returns the result.
private java.util.Vector curlyAcronym()
    Returns a Vector of all the tags found in this context.
protected java.lang.String dump()
private java.util.Vector enclosedAcronym(java.lang.String caller, java.lang.String delims, boolean needYear, int hintName)
    Returns a Vector of all the tags found in this context.
private java.util.Vector endRef(java.util.Vector result, java.lang.StringBuffer anchorBuf, java.lang.StringBuffer sb)
    Closes off the current reference being built in a list of anchors.
protected int endsWith(java.lang.String[] words)
    A helper routine that checks whether this really is the end of a sentence.
private java.lang.String expand(java.lang.String reference, java.lang.String anchor)
    Helper function that turns a string like "[1,2,3]" into "[1][2][3]" and "[4-6]" into "[4][5][6]". If there are no commas in "reference", the whole string is returned. Guaranteed: this is a SQUARE_BRACKET type of reference string. Note: as a side effect, the anchors Vector is appended to once for each expanded reference.
protected boolean findReferences()
    Looks for references in this SentenceTree and stores them in a private structure that holds all the information about these links.
protected java.lang.String[] getAnchors()
    Returns the String array of anchors corresponding to potential links.
protected int getHint()
    Returns the hint currently in use for this SentenceTree.
protected java.lang.String getLinks()
    Prints out the table of potential links.
protected SentenceTree.Node getRoot()
    Returns the node at the top of the tree.
protected java.lang.String getTags()
    Helper function that returns the list of tags as a single string; called by getLinks and by XHTMLAnalyzer.
private java.lang.String isNameAndYear(java.util.StringTokenizer st, java.util.Vector tokens, java.lang.String delim)
    Returns a string that contains something like " 1994", " and MacNeil, 1996", ", 1999", ";1999", " name, 1999", " & Smith, 1999", or "1998a" (that is, the caller has already consumed the leading token). A name-and-year element ends at ",", ";", or delim.
private boolean isNumeric(java.lang.String token)
private boolean isValid(java.lang.String token)
    Determines whether this token is a valid reference.
protected boolean isYear(java.lang.String token)
private int lastWord(java.lang.String s)
    Given a string s, finds the index in s where the last word begins. Apostrophes, commas, and dashes are accepted as parts of a name.
private java.lang.String loneYear(java.lang.String s, int offset, int length)
    A lone year, e.g.
private void mergeContexts(java.util.Vector contextTrees)
    Given the existing contextTrees and the current sentence, adds this sentence to the last context in the vector if there is one; otherwise starts a new vector.
private java.lang.String nextElement(java.util.StringTokenizer st, java.util.Vector tokens, java.lang.String delim)
    Returns a string that contains something like " and MacNeil" or ", Smith".
private java.util.Vector parensRef()
    Recognizes references of the form (...).
private static java.lang.String removeLowerCaseWords(java.lang.String t)
    Sometimes references have many leading words that do not really belong to the anchor.
protected void reset()
    Since the style of references does not change over the course of a document, it is good to keep the hint in place from one instantiation to the next.
private java.util.Vector squareAcronym()
private java.util.Vector squareRef()
    Recognizes references of the form [...].
protected java.lang.String text()
protected boolean updateContextTrees(java.util.Vector contextTrees)
    Given the existing contextTrees, the current sentence, and settings for whether references, false stops, or false starts are involved, either appends the sentence to the previous context, starts a new context, or drops it altogether.
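The summary of `expand` gives two concrete rewrites: "[1,2,3]" becomes "[1][2][3]" and "[4-6]" becomes "[4][5][6]", with the anchors Vector appended to once per expanded reference. The sketch below is a standalone version under stated assumptions: it takes the anchors list as a parameter instead of using the field, uses `List<String>` rather than `Vector`, assumes the brackets contain only integers, commas, and dashed ranges, and appends the literal anchor text (what the original appends is not documented). Unlike the summary's "no commas" rule, it also expands a lone dashed range, as the "[4-6]" example requires. `ExpandSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Standalone sketch of the documented expand behavior. Assumes the reference is
// a square-bracketed list of integers and dashed ranges, e.g. "[1,2,3]" or "[4-6]".
final class ExpandSketch {
    static String expand(String reference, String anchor, List<String> anchors) {
        String inner = reference.substring(1, reference.length() - 1); // drop "[" and "]"
        if (!inner.contains(",") && !inner.contains("-")) {
            anchors.add(anchor);   // a single reference: one anchor entry
            return reference;      // nothing to expand
        }
        StringBuilder out = new StringBuilder();
        for (String raw : inner.split(",")) {
            String part = raw.trim();
            int dash = part.indexOf('-');
            if (dash > 0) {
                // Dashed range such as "4-6": emit every number from start to end.
                int from = Integer.parseInt(part.substring(0, dash).trim());
                int to = Integer.parseInt(part.substring(dash + 1).trim());
                for (int n = from; n <= to; n++) {
                    out.append('[').append(n).append(']');
                    anchors.add(anchor); // side effect: one entry per expanded reference
                }
            } else {
                out.append('[').append(part).append(']');
                anchors.add(anchor);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> anchors = new ArrayList<>();
        System.out.println(expand("[1,2,3]", "1,2,3", anchors)); // [1][2][3]
        System.out.println(expand("[4-6]", "4-6", anchors));     // [4][5][6]
        System.out.println(anchors.size());                      // 6
    }
}
```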
Methods inherited from class java.lang.Object
Field Detail
private static final java.lang.String ME
private static final boolean DEBUG
private SentenceTree.Node root
private SentenceTree.Node currentTree
protected int hint
private static final int TEXT
private static final int TAG
private static final int HREF
private static final int ENDTAG
protected static final int SQUARE_BRACKETS_AROUND_NUMERALS
protected static final int PARENTHESES_AROUND_NAMES_AND_YEAR
protected static final int SQUARE_BRACKETS_AROUND_ACRONYMS
protected static final int PARENTHESES_AROUND_COMMAED_NAMES_AND_YEARS
protected static final int BRACKETS_AROUND_COMMAED_NAMES_AND_YEARS
protected static final int CURLY_BRACKETS_AROUND_ACRONYMS
protected java.util.Vector refsInText
protected java.util.Vector anchors
private static java.lang.String[] FALSESTOPS
private static java.lang.String[] FALSESTARTS
protected int whichFALSESTOP
protected boolean hasReferences
Constructor Detail
protected SentenceTree()
protected SentenceTree(SentenceTree st)
    Parameters:
        st - the SentenceTree that has references in it
protected SentenceTree(int hint)
    Parameters:
        hint - the integer hint
Method Detail
protected void reset()
protected int getHint()
protected void addNode(java.lang.String content)
    Parameters:
        content - the textual content of the node to be added
protected void append(SentenceTree x)
    Parameters:
        x - a sentence tree whose textual value is to be appended to the textual value of this sentence tree
protected SentenceTree.Node getRoot()
protected boolean findReferences()
protected int endsWith(java.lang.String[] words)
    Parameters:
        words - a string array of words to check (e.g. "etc.")
protected boolean beginsWith(java.lang.String fs)
    Parameters:
        fs - the string indicating what a false start would be
protected java.lang.String dump()
protected java.lang.String text()
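The `endsWith` helper checks a sentence against a list of words such as "etc." to decide whether a period really ends the sentence; the returned int presumably identifies the matching false stop (compare the `whichFALSESTOP` field). A sketch under assumptions: the `FALSE_STOPS` list is hypothetical because the contents of `FALSESTOPS` are not documented, and -1 is assumed to mean "no false stop, the sentence really ends here".

```java
// Hypothetical false-stop list; the real FALSESTOPS contents are not documented.
final class FalseStopSketch {
    static final String[] FALSE_STOPS = {"e.g.", "i.e.", "etc.", "et al.", "Fig."};

    // Returns the index of the first word in `words` that ends the sentence as a
    // whole word, or -1 (assumed convention) if the sentence really ends here.
    static int endsWith(String sentence, String[] words) {
        String trimmed = sentence.trim();
        for (int i = 0; i < words.length; i++) {
            String w = words[i];
            if (trimmed.endsWith(w)) {
                int start = trimmed.length() - w.length();
                // Require a word boundary so that "Zetc." does not count as "etc."
                if (start == 0 || Character.isWhitespace(trimmed.charAt(start - 1))) {
                    return i;
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(endsWith("This was first reported by Smith et al.", FALSE_STOPS)); // 3
        System.out.println(endsWith("This sentence is complete.", FALSE_STOPS));            // -1
    }
}
```

A non-negative result would signal that the next sentence belongs with this one, which is the situation `updateContextTrees` refers to as a false stop.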
protected java.lang.String getLinks()
protected java.lang.String[] getAnchors()
protected java.lang.String getTags()
private java.util.Vector squareRef()
private java.util.Vector parensRef()
private java.util.Vector curlyAcronym()
private java.util.Vector enclosedAcronym(java.lang.String caller, java.lang.String delims, boolean needYear, int hintName)
private boolean isValid(java.lang.String token)
    Parameters:
        token - the token
private java.util.Vector bracketsRef()
private java.lang.String nextElement(java.util.StringTokenizer st, java.util.Vector tokens, java.lang.String delim)
private java.lang.String isNameAndYear(java.util.StringTokenizer st, java.util.Vector tokens, java.lang.String delim)
    Parameters:
        st - a tokenizer that carries the state of the parse so far
        tokens - list of tokens for the actual anchor string
        delim - either ")" or "]", to generalize the routine
protected boolean isYear(java.lang.String token)
private boolean isNumeric(java.lang.String token)
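`isNameAndYear` and `isYear` handle name-and-year citations such as " and MacNeil, 1996" or "1998a". The following sketch takes a simpler route than the tokenizer-driven parser documented here: it splits the inside of a whole citation on ";" and accepts each element only if its last token looks like a year. The year format (four digits plus an optional lowercase letter, taken from the "1998a" example), the `NameYearSketch` class, and its `splitNameYear` helper are assumptions; the `open`/`close` parameters mirror the ")" or "]" choice of `delim`.

```java
import java.util.ArrayList;
import java.util.List;

final class NameYearSketch {
    // Assumed year format: four digits, optionally followed by one lowercase
    // disambiguation letter, as in "1998a".
    static boolean isYear(String token) {
        return token.matches("\\d{4}[a-z]?");
    }

    // Splits "(Smith and MacNeil, 1996; Jones, 1999a)" into its name-and-year
    // elements; returns an empty list if any element does not end in a year.
    static List<String> splitNameYear(String citation, String open, String close) {
        List<String> refs = new ArrayList<>();
        String c = citation.trim();
        if (!c.startsWith(open) || !c.endsWith(close)) {
            return refs;
        }
        String inner = c.substring(open.length(), c.length() - close.length());
        for (String element : inner.split(";")) {
            String e = element.trim();
            String[] tokens = e.split("[\\s,]+");
            if (tokens.length < 2 || !isYear(tokens[tokens.length - 1])) {
                return new ArrayList<>(); // not a name-and-year citation
            }
            refs.add(e);
        }
        return refs;
    }

    public static void main(String[] args) {
        System.out.println(splitNameYear("(Smith and MacNeil, 1996; Jones, 1999a)", "(", ")"));
        // [Smith and MacNeil, 1996, Jones, 1999a]
        System.out.println(isYear("1998a") + " " + isYear("19980")); // true false
    }
}
```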
private java.util.Vector squareAcronym()
protected static java.lang.String cleanup(java.lang.String s)
    Parameters:
        s - the string to be cleaned up
private static java.lang.String removeLowerCaseWords(java.lang.String t)
    Parameters:
        t - the proposed reference anchor
private java.lang.String expand(java.lang.String reference, java.lang.String anchor)
    Parameters:
        reference - the reference, like "[4-6]"
        anchor - the anchor, like "4-6"
private int lastWord(java.lang.String s)
private java.lang.String loneYear(java.lang.String s, int offset, int length)
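`cleanup` and `removeLowerCaseWords` tidy a proposed anchor before it is stored. A minimal sketch, assuming that the "'s" in the `cleanup` summary means a trailing possessive and that "leading words that don't really belong to the anchor" are the words before the first capitalized one; both readings are guesses, and the class name `AnchorCleanupSketch` is hypothetical.

```java
import java.util.Arrays;

final class AnchorCleanupSketch {
    private static final String TRIM_CHARS = ",;[]()";

    // Strips whitespace, commas, semicolons, brackets, and parentheses from both
    // ends; a trailing possessive "'s" is also dropped (assumed reading of the doc).
    static String cleanup(String s) {
        int start = 0;
        int end = s.length();
        while (start < end && isTrimmable(s.charAt(start))) start++;
        while (end > start && isTrimmable(s.charAt(end - 1))) end--;
        String result = s.substring(start, end);
        if (result.endsWith("'s")) {
            result = cleanup(result.substring(0, result.length() - 2));
        }
        return result;
    }

    private static boolean isTrimmable(char c) {
        return Character.isWhitespace(c) || TRIM_CHARS.indexOf(c) >= 0;
    }

    // Drops leading words that start with a lowercase letter, e.g.
    // "as reported by Smith and Jones" -> "Smith and Jones".
    static String removeLowerCaseWords(String t) {
        String[] words = t.trim().split("\\s+");
        int i = 0;
        while (i < words.length && !words[i].isEmpty()
                && Character.isLowerCase(words[i].charAt(0))) {
            i++;
        }
        return String.join(" ", Arrays.copyOfRange(words, i, words.length));
    }

    public static void main(String[] args) {
        System.out.println(cleanup(" (Smith's; "));                                // Smith
        System.out.println(removeLowerCaseWords("as reported by Smith and Jones")); // Smith and Jones
    }
}
```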
private java.util.Vector endRef(java.util.Vector result, java.lang.StringBuffer anchorBuf, java.lang.StringBuffer sb)
    Parameters:
        result - Vector of canonicalized references in this context
        anchorBuf - StringBuffer holding the literal anchor(s)
        sb - StringBuffer holding the canonicalized anchors
    Note that this routine should be called when the parse reaches "...year," or ";" or a list of anchors with no " and ".
private void mergeContexts(java.util.Vector contextTrees)
protected boolean updateContextTrees(java.util.Vector contextTrees)
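`mergeContexts` and `updateContextTrees` group sentences into contexts. The sketch models a context as a list of sentence strings. `mergeContexts` follows its summary (append to the last context, else start a new one); the decision rule in `updateContextTrees` is a hypothetical policy, since the page only says the method appends, starts a new context, or drops the sentence depending on references, false stops, and false starts. The boolean result (whether the sentence was kept) and the `ContextSketch` name are also assumptions.

```java
import java.util.ArrayList;
import java.util.List;

final class ContextSketch {
    // Adds the sentence to the last context if there is one; otherwise starts a new one.
    static void mergeContexts(List<List<String>> contexts, String sentence) {
        if (contexts.isEmpty()) {
            List<String> fresh = new ArrayList<>();
            fresh.add(sentence);
            contexts.add(fresh);
        } else {
            contexts.get(contexts.size() - 1).add(sentence);
        }
    }

    // Hypothetical policy: a continuation (previous sentence ended in a false stop,
    // or this one begins with a false start) joins the last context; any other
    // sentence with references starts a new context; the rest are dropped.
    // Returns true if the sentence was kept.
    static boolean updateContextTrees(List<List<String>> contexts, String sentence,
                                      boolean hasReferences, boolean continuation) {
        if (continuation) {
            mergeContexts(contexts, sentence);
            return true;
        }
        if (hasReferences) {
            List<String> fresh = new ArrayList<>();
            fresh.add(sentence);
            contexts.add(fresh);
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        List<List<String>> contexts = new ArrayList<>();
        updateContextTrees(contexts, "This was shown in [3] by Smith et al.", true, false);
        updateContextTrees(contexts, "in a later study.", false, true);
        updateContextTrees(contexts, "Unrelated sentence.", false, false);
        System.out.println(contexts.size() + " " + contexts.get(0).size()); // 1 2
    }
}
```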