uk.ac.soton.harvester
Class DeciterState

java.lang.Object
  |
  +--uk.ac.soton.harvester.DeciterState

public class DeciterState
extends java.lang.Object

The DeciterState class defines an object that holds all of the deciter's state, in particular the hints and the marker offsets.


Field Summary
 int authb
          authb and authe store the beginning and end offsets of the authors sequence of the citation.
 int authe
          authb and authe store the beginning and end offsets of the authors sequence of the citation.
 int dateb
          dateb and datee store the beginning and end offsets of the year substring of the citation.
 int datee
          dateb and datee store the beginning and end offsets of the year substring of the citation.
 int digb
          digb and dige store the beginning and end offsets of the initial numbering string of the citation.
 int dige
          digb and dige store the beginning and end offsets of the initial numbering string of the citation.
 java.lang.String documentid
          documentid holds the id which is passed to the harvester from "The System".
 boolean doHTML
          doHTML is one of a group of booleans that control the format of deciter's output: text, HTML or XML.
 boolean doTXT
          doTXT is one of a group of booleans that control the format of deciter's output: text, HTML or XML.
 boolean doXML
          doXML is one of a group of booleans that control the format of deciter's output: text, HTML or XML.
 int endofdate
          endofdate stores the offset of the first significant character after the matched year substring.
 boolean extended
          extended is a debugging relic which controls whether the original author string is emitted along with the rest of the XML output for immediacy of comparison.
 java.lang.String firstAuthor
          firstAuthor stores the first named author from the splitAuthor() method for subsequent use in a multiCite situation.
 boolean firstNameFirstHint
          firstNameFirstHint declares that the citation style tends to put the first name before the surname, at least after the initial author has been dealt with (surnames always come first for first authors so that you can see the primary sort key).
 boolean hint_Author1
          hint_Author1 declares that a very simple scheme for recognising the extent of an author sequence is in force.
 int issb
          issb and isse store the beginning and end offsets of the issue substring of the citation.
 int isse
          issb and isse store the beginning and end offsets of the issue substring of the citation.
 java.lang.String line
          line contains the whole citation input line from which the fields are eventually teased.
 int maxi
          maxi is the maximum valid offset that can be used with the charAt() method of the string which is the current line.
(package private)  java.lang.String MDashCiteSep
          MDashCiteSep is the 3-emdash string which is used to separate some forms of citation (see multiCiteMDashHint).
 int miscb
          miscb and misce store the beginning and end offsets of the miscellaneous (unused and unrecognised) substring of the citation.
 int misce
          miscb and misce store the beginning and end offsets of the miscellaneous (unused and unrecognised) substring of the citation.
 boolean multiCiteMDashHint
          multiCiteMDashHint declares that the citations of a single author may appear to be grouped together as a single entry.
 boolean multiCiteSharesAuthorHint
          multiCiteSharesAuthorHint declares that the citations of a single author may be grouped together as a single entry.
protected  int nCites
          nCites holds the number of citations processed for the current article.
 boolean noForenameHint
          noForenameHint declares that it is unlikely that a forename will be given with the surname.
 java.lang.String notAuthor
          notAuthor is the first potential author-string token which seems to not be an author name.
 int pagb
          pagb and page store the beginning and end offsets of the page range substring of the citation.
 int page
          pagb and page store the beginning and end offsets of the page range substring of the citation.
 int placeb
          placeb and placee store the beginning and end offsets of the place name if this citation corresponds to a book.
 int placee
          placeb and placee store the beginning and end offsets of the place name if this citation corresponds to a book.
 int pubb
          pubb and pube store the beginning and end offsets of the publication (ie journal) substring of the citation.
 int pube
          pubb and pube store the beginning and end offsets of the publication (ie journal) substring of the citation.
 int publishb
           
 int publishe
           
 int titb
          titb and tite store the beginning and end offsets of the title substring of the citation.
 int tite
          titb and tite store the beginning and end offsets of the title substring of the citation.
 int volb
          volb and vole store the beginning and end offsets of the volume substring of the citation.
 int vole
          volb and vole store the beginning and end offsets of the volume substring of the citation.
 int xxxb
          xxxb and xxxe store the beginning and end offsets of the XXX id string of the citation.
 int xxxe
          xxxb and xxxe store the beginning and end offsets of the XXX id string of the citation.
 boolean xxxHint
          xxxHint states that the article is from the XXX archive, ie is a physics preprint publication.
 
Constructor Summary
(package private) DeciterState(java.lang.String id, java.lang.String[] opts)
           
 
Method Summary
 void setNewCitation(java.lang.String line)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

nCites

protected int nCites
nCites holds the number of citations processed for the current article. It is only returned to the calling environment for information (perhaps for debugging or validation purposes).

extended

public boolean extended
extended is a debugging relic which controls whether the original author string is emitted along with the rest of the XML output for immediacy of comparison.

xxxHint

public boolean xxxHint
xxxHint states that the article is from the XXX archive, ie is a physics preprint publication.

noForenameHint

public boolean noForenameHint
noForenameHint declares that it is unlikely that a forename will be given with the surname. Explicitly set when xxxHint is set.

firstNameFirstHint

public boolean firstNameFirstHint
firstNameFirstHint declares that the citation style tends to put the first name before the surname, at least after the initial author has been dealt with (surnames always come first for first authors so that you can see the primary sort key).

multiCiteSharesAuthorHint

public boolean multiCiteSharesAuthorHint
multiCiteSharesAuthorHint declares that the citations of a single author may be grouped together as a single entry. Each 'subentry' is recognised by the start of a new year. This hint should be replaced with a more generic recognition scheme.

multiCiteMDashHint

public boolean multiCiteMDashHint
multiCiteMDashHint declares that the citations of a single author may appear to be grouped together as a single entry. Each 'subentry' in fact starts with 3 emdashes as a ditto mark. This hint should be replaced with a more generic recognition scheme.

MDashCiteSep

final java.lang.String MDashCiteSep
MDashCiteSep is the 3-emdash string which is used to separate some forms of citation (see multiCiteMDashHint).
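
As an illustration only, the ditto-mark convention described for multiCiteMDashHint could be handled along the following lines. This is a sketch under stated assumptions, not the harvester's code: it assumes the separator is three U+2014 characters, and the class name MDashSplitSketch, the method expand() and the sample entry are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class MDashSplitSketch {
    // Assumption: the separator is three U+2014 EM DASH characters.
    static final String MDASH_CITE_SEP = "\u2014\u2014\u2014";

    /** Splits a grouped entry and puts the shared author back on each later sub-entry. */
    static List<String> expand(String entry, String firstAuthor) {
        List<String> cites = new ArrayList<>();
        int from = 0;
        while (true) {
            int at = entry.indexOf(MDASH_CITE_SEP, from);
            String piece = (at < 0 ? entry.substring(from) : entry.substring(from, at)).trim();
            if (!piece.isEmpty()) {
                // The piece starting at offset 0 still carries its own author;
                // every later piece followed a ditto mark, so the author is prepended.
                cites.add(from == 0 ? piece : firstAuthor + piece);
            }
            if (at < 0) {
                break;
            }
            from = at + MDASH_CITE_SEP.length();
        }
        return cites;
    }

    public static void main(String[] args) {
        // Made-up entry, for illustration only.
        String entry = "Smith, J. 1997. First title. \u2014\u2014\u2014. 1998. Second title.";
        for (String cite : expand(entry, "Smith, J")) {
            System.out.println(cite);
        }
    }
}
```

In this sketch the firstAuthor argument plays the role that the firstAuthor field is described as playing in a multiCite situation.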

hint_Author1

public final boolean hint_Author1
hint_Author1 declares that a very simple scheme for recognising the extent of an author sequence is in force. Author sequences extend up to the first full stop.
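
The "up to the first full stop" rule can be sketched as below. This is an assumption-laden illustration rather than the actual recogniser: the class SimpleAuthorExtent and the method authorExtent() are hypothetical, and the returned end offset is assumed to be exclusive, which this documentation does not state.

```java
final class SimpleAuthorExtent {
    /**
     * Returns {begin, end} for the author sequence, or null if the line
     * contains no full stop at or after start. The end offset is exclusive (assumption).
     */
    static int[] authorExtent(String line, int start) {
        int stop = line.indexOf('.', start);
        if (stop < 0) {
            return null;
        }
        return new int[] { start, stop };
    }

    public static void main(String[] args) {
        // Made-up citation, for illustration only.
        String line = "Smith J, Jones A. A sample title. Journal of Examples, 1997.";
        int[] extent = authorExtent(line, 0);
        System.out.println(line.substring(extent[0], extent[1]));
    }
}
```

A rule this simple cuts the sequence short whenever initials are written with full stops (for example "Smith, J."), so it only suits citation styles that avoid them.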

digb

public int digb
digb and dige store the beginning and end offsets of the initial numbering string of the citation. (e.g. the '34' of "[34]" or "34.")

dige

public int dige
digb and dige store the beginning and end offsets of the initial numbering string of the citation. (e.g. the '34' of "[34]" or "34.")
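
A minimal sketch of how a pair of offsets such as digb and dige might be computed for the two numbering forms mentioned above. The class LeadingNumberSketch and the method leadingNumber() are hypothetical, and the end offset is assumed to be exclusive (as with String.substring); the real field may use an inclusive end.

```java
final class LeadingNumberSketch {
    /**
     * Returns {begin, end} of the leading citation number, or null if the line
     * does not start with "[digits]" or "digits.". End is exclusive (assumption).
     */
    static int[] leadingNumber(String line) {
        int i = 0;
        // Skip leading whitespace.
        while (i < line.length() && Character.isWhitespace(line.charAt(i))) {
            i++;
        }
        boolean bracketed = i < line.length() && line.charAt(i) == '[';
        if (bracketed) {
            i++;
        }
        int begin = i;
        while (i < line.length() && Character.isDigit(line.charAt(i))) {
            i++;
        }
        if (i == begin || i >= line.length()) {
            return null; // no digits, or nothing follows them
        }
        // A bracketed number must close with ']', a bare number with '.'.
        char close = line.charAt(i);
        boolean ok = bracketed ? close == ']' : close == '.';
        return ok ? new int[] { begin, i } : null;
    }

    public static void main(String[] args) {
        String line = "[34] Smith J. A sample title.";
        int[] d = leadingNumber(line);
        System.out.println(line.substring(d[0], d[1]));
    }
}
```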

authb

public int authb
authb and authe store the beginning and end offsets of the authors sequence of the citation.

authe

public int authe
authb and authe store the beginning and end offsets of the authors sequence of the citation.

dateb

public int dateb
dateb and datee store the beginning and end offsets of the year substring of the citation. (e.g. the '1997' of "(1997)" or "1997b.")

datee

public int datee
dateb and datee store the beginning and end offsets of the year substring of the citation. (e.g. the '1997' of "(1997)" or "1997b.")
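
One possible way to locate the year for both forms given above, "(1997)" and "1997b.", is a regular expression. The sketch below is not the DoDate recogniser: the class YearOffsetSketch, the method yearOffsets() and the restriction to years beginning 19 or 20 are all assumptions, as is the exclusive end offset.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class YearOffsetSketch {
    // Four digits beginning 19 or 20, not preceded by a digit, optionally followed
    // by one lower-case suffix letter, and then not followed by a letter or digit.
    private static final Pattern YEAR =
            Pattern.compile("(?<![0-9])((?:19|20)[0-9]{2})[a-z]?(?![0-9A-Za-z])");

    /** Returns {begin, end} of the first year found, or null. End is exclusive (assumption). */
    static int[] yearOffsets(String line) {
        Matcher m = YEAR.matcher(line);
        if (!m.find()) {
            return null;
        }
        // Group 1 covers the digits only, so a suffix such as the 'b' of "1997b" is excluded.
        return new int[] { m.start(1), m.end(1) };
    }

    public static void main(String[] args) {
        String line = "Smith J. 1997b. A sample title.";
        int[] y = yearOffsets(line);
        System.out.println(line.substring(y[0], y[1]));
    }
}
```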

endofdate

public int endofdate
endofdate stores the offset of the first significant character after the matched year substring. It is actually the value returned from the DoDate recogniser, subsequently used by DoAuthors.

titb

public int titb
titb and tite store the beginning and end offsets of the title substring of the citation.

tite

public int tite
titb and tite store the beginning and end offsets of the title substring of the citation.

pagb

public int pagb
pagb and page store the beginning and end offsets of the page range substring of the citation. (e.g. the '19--27')

page

public int page
pagb and page store the beginning and end offsets of the page range substring of the citation. (e.g. the '19--27')

pubb

public int pubb
pubb and pube store the beginning and end offsets of the publication (ie journal) substring of the citation. (e.g. the 'CACM' or "Journal of New Politics")

pube

public int pube
pubb and pube store the beginning and end offsets of the publication (ie journal) substring of the citation. (e.g. the 'CACM' or "Journal of New Politics")

volb

public int volb
volb and vole store the beginning and end offsets of the volume substring of the citation.

vole

public int vole
volb and vole store the beginning and end offsets of the volume substring of the citation.

issb

public int issb
issb and isse store the beginning and end offsets of the issue substring of the citation.

isse

public int isse
issb and isse store the beginning and end offsets of the issue substring of the citation.

miscb

public int miscb
miscb and misce store the beginning and end offsets of the miscellaneous (unused and unrecognised) substring of the citation. This may be a substantial region for a book citation, or may hoover up whole citations if the Adobe hyphenated column bug is in operation or if unrecognised multicites have occurred.

misce

public int misce
miscb and misce store the beginning and end offsets of the miscellaneous (unused and unrecognised) substring of the citation. This may be a substantial region for a book citation, or may hoover up whole citations if the Adobe hyphenated column bug is in operation or if unrecognised multicites have occurred.

publishb

public int publishb

publishe

public int publishe

placeb

public int placeb
placeb and placee store the beginning and end offsets of the place name if this citation corresponds to a book.

placee

public int placee
placeb and placee store the beginning and end offsets of the place name if this citation corresponds to a book.

xxxb

public int xxxb
xxxb and xxxe store the beginning and end offsets of the XXX id string of the citation. (e.g. 'hep-th/9907001'). This is only used if xxxHint is in operation.

xxxe

public int xxxe
xxxb and xxxe store the beginning and end offsets of the XXX id string of the citation. (e.g. 'hep-th/9907001'). This is only used if xxxHint is in operation.

line

public java.lang.String line
line contains the whole citation input line from which the fields are eventually teased.

maxi

public int maxi
maxi is the maximum valid offset that can be used with the charAt() method of the string which is the current line. It corresponds to length()-1.
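
The relationship between maxi and charAt() can be shown with a short sketch. The class MaxiSketch and its methods are hypothetical and are not part of the harvester.

```java
final class MaxiSketch {
    /** The largest offset that charAt() accepts for this line; -1 for an empty line. */
    static int maxi(String line) {
        return line.length() - 1;
    }

    /** Scans backwards from maxi to the last non-whitespace character; -1 if there is none. */
    static int lastSignificant(String line) {
        int i = maxi(line);
        while (i >= 0 && Character.isWhitespace(line.charAt(i))) {
            i--;
        }
        return i;
    }

    public static void main(String[] args) {
        String line = "Smith J. 1997. A sample title.  ";
        System.out.println(maxi(line));
        System.out.println(lastSignificant(line));
    }
}
```

Because maxi is an inclusive bound, a loop over the line tests i <= maxi rather than i < maxi, and an empty line gives maxi = -1, so such loops do not execute at all.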

documentid

public java.lang.String documentid
documentid holds the id which is passed to the harvester from "The System".

notAuthor

public java.lang.String notAuthor
notAuthor is the first potential author-string token which seems not to be an author name. This is used internally by the splitAuthor() and doAuthor() methods.

firstAuthor

public java.lang.String firstAuthor
firstAuthor stores the first named author from the splitAuthor() method for subsequent use in a multiCite situation.

doTXT

public boolean doTXT
doTXT is one of a group of booleans that control the format of deciter's output: text, HTML or XML.

doHTML

public boolean doHTML
doHTML is one of a group of booleans that control the format of deciter's output: text, HTML or XML.

doXML

public boolean doXML
doXML is one of a group of booleans that control the format of deciter's output: text, HTML or XML.

Constructor Detail

DeciterState

DeciterState(java.lang.String id,
             java.lang.String[] opts)

Method Detail

setNewCitation

public void setNewCitation(java.lang.String line)