java.lang.Object | +--uk.ac.soton.harvester.Deciter
The Deciter class does all the significant work in decoding a set of citations.
Field Summary | |
(package private) static int |
AUTHORS
AUTHORS is the index of the object in the AttributeMarkers array that recognises the position of the authors in the citation string. |
(package private) static int |
DATE
DATE is the index of the object in the AttributeMarkers array that recognises the position of the date in the citation string. |
(package private) static int |
EXTRA
EXTRA is the index of the object in the AttributeMarkers array that recognises the position of any extra features (e.g. |
(package private) static int |
N_AMS
N_AMS is the number of AttributeMarkers that are used. |
(package private) static int |
NUMBERING
NUMBERING is the index of the object in the AttributeMarkers array that recognises any initial preprocessing before the recognition proper gets underway. |
(package private) static int |
PAGERANGE
PAGERANGE is the index of the object in the AttributeMarkers array that recognises the position of the pagerange in the citation string. |
(package private) static int |
PLACE
PLACE is the index of the object in the AttributeMarkers array that recognises the position of the place of publication in the citation string. |
(package private) static int |
POSTPROCESS
POSTPROCESS is the index of the object in the AttributeMarkers array that performs any subsequent postprocessing and rationalisation of the marker values. |
(package private) static int |
PREPROCESS
PREPROCESS is the index of the object in the AttributeMarkers array that performs any initial preprocessing before the recognition proper gets underway. |
(package private) static int |
PUBLICATION
PUBLICATION is the index of the object in the AttributeMarkers array that recognises the position of the journal title in the citation string. |
(package private) static int |
PUBLISH
PUBLISH is the index of the object in the AttributeMarkers array that recognises the position of the publisher in the citation string. |
(package private) static int |
TITLE
TITLE is the index of the object in the AttributeMarkers array that recognises the position of the title in the citation string. |
(package private) static int |
VOLUMEISSUE
VOLUMEISSUE is the index of the object in the AttributeMarkers array that recognises the position of the volume and issue in the citation string. |
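The fields above all follow one pattern: each constant is a slot number in an array of recognisers, and N_AMS is the length of that array. The following is a minimal sketch of that pattern only, not the Deciter source. The names MarkerTableSketch and MarkerSketch are hypothetical, the single-method interface is a stand-in for AttributeMarker (whose methods are not shown on this page), and the numeric values are an assumption taken from the order of the Field Detail listing.

```java
// Hypothetical sketch of an index-constant table; not the real Deciter class.
class MarkerTableSketch {
    // Assumed values: the order of the Field Detail listing, starting at 0.
    static final int PREPROCESS = 0;
    static final int NUMBERING = 1;
    static final int DATE = 2;
    static final int AUTHORS = 3;
    static final int TITLE = 4;
    static final int PAGERANGE = 5;
    static final int PUBLICATION = 6;
    static final int VOLUMEISSUE = 7;
    static final int PUBLISH = 8;
    static final int PLACE = 9;
    static final int EXTRA = 10;
    static final int POSTPROCESS = 11;
    static final int N_AMS = 12; // number of slots in the array

    // One recogniser per slot, addressed by the constants above.
    private final MarkerSketch[] markers = new MarkerSketch[N_AMS];

    void setMarker(int which, MarkerSketch m) {
        if (which < 0 || which >= N_AMS) {
            throw new IllegalArgumentException("no such marker slot: " + which);
        }
        markers[which] = m;
    }

    MarkerSketch getMarker(int which) {
        return markers[which];
    }

    public static void main(String[] args) {
        MarkerTableSketch table = new MarkerTableSketch();
        table.setMarker(DATE, () -> "date");
        System.out.println(table.getMarker(DATE).name());
    }
}

// Stand-in for the AttributeMarker interface (assumed shape).
interface MarkerSketch {
    String name();
}
```

Keeping the slot numbers as named constants lets a caller replace one recogniser without knowing where it sits in the array.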
Constructor Summary | |
(package private) |
Deciter(java.lang.String id,
java.lang.String[] opts)
Constructor sets the value of the article ID and extracts the hints and flags from the array of options passed on the command line. |
Method Summary | |
protected void |
dodecite_simple(java.lang.String line,
java.lang.String pr,
java.lang.String wr,
java.io.PrintWriter Output)
dodecite_simple handles the whole deciting process for a single citation (sub)entry. |
protected void |
dodecite(java.lang.String line,
java.lang.String pr,
java.lang.String wr,
java.io.PrintWriter Output)
dodecite handles the whole deciting process for a single citation entry. |
int |
doit(java.io.BufferedReader inp,
java.io.PrintWriter outp)
doit initialises the citation harvesting process by setting up the debugging stream, storing the document id, creating an entity encoder if necessary and calling the readLoop to process all the citations. |
protected void |
doReadLoop(java.io.BufferedReader inp,
java.io.PrintWriter Output)
doReadLoop performs a read loop, reading a line from the input, and processing and printing it to the output. |
void |
setAttributeMarker(int which,
AttributeMarker a)
setAttributeMarker allows the recogniser for a particular attribute to be changed. |
void |
setAttributeMarker(int which,
java.lang.String amName)
setAttributeMarker allows the recogniser for a particular attribute to be changed. |
void |
setAttributeMarker(java.lang.String which,
java.lang.String amName)
A version of setAttributeMarker that takes both arguments as strings, which is useful for argv. |
void |
setCitationOutput(CitationOutput co)
setCitationOutput specifies the citation output object. |
void |
setCitationOutput(java.lang.String coName)
setCitationOutput specifies the citation output object. |
protected void |
split_multiCitation(java.lang.String rest,
java.lang.String pr,
java.lang.String wr,
java.io.PrintWriter Output)
split_multiCitation handles leftover material: if significant citation material is found to be left over with a multiCite hint in operation, it may be assumed that another citation occurrence has been found, and dodecite may be called recursively. |
Methods inherited from class java.lang.Object |
|
Field Detail |
static final int PREPROCESS
static final int NUMBERING
static final int DATE
static final int AUTHORS
static final int TITLE
static final int PAGERANGE
static final int PUBLICATION
static final int VOLUMEISSUE
static final int PUBLISH
static final int PLACE
static final int EXTRA
static final int POSTPROCESS
static final int N_AMS
Constructor Detail |
Deciter(java.lang.String id, java.lang.String[] opts)
Method Detail |
public void setAttributeMarker(int which, AttributeMarker a)
which - one of the values PREPROCESS, NUMBERING, DATE, AUTHORS, TITLE, PAGERANGE, VOLUMEISSUE, EXTRA, POSTPROCESS
a - an object which implements the AttributeMarker interface
public void setAttributeMarker(int which, java.lang.String amName)
which - one of the values PREPROCESS, NUMBERING, DATE, AUTHORS, TITLE, PAGERANGE, VOLUMEISSUE, EXTRA, POSTPROCESS
amName - a String which gives the name of a class which implements the AttributeMarker interface. A new instance of this class will be created.
public void setAttributeMarker(java.lang.String which, java.lang.String amName)
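The String-based overloads above say that a new instance of the named class will be created. One common way to do that in Java is reflection, sketched below. This is an illustration under assumptions, not the Deciter implementation: MarkerLoaderSketch, NamedMarker and DateMarkerSketch are hypothetical names, only two slots are modelled, and resolving the String form of `which` by constant name is a guess (this page does not say how the real method interprets it).

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of the three setAttributeMarker-style overloads.
class MarkerLoaderSketch {
    static final int DATE = 0;
    static final int TITLE = 1;
    static final int N_AMS = 2;

    // Assumed mapping from slot names (as they might arrive via argv) to indices.
    private static final Map<String, Integer> SLOTS = new HashMap<>();
    static {
        SLOTS.put("DATE", DATE);
        SLOTS.put("TITLE", TITLE);
    }

    private final NamedMarker[] markers = new NamedMarker[N_AMS];

    // Overload 1: the caller supplies a ready-made object.
    void setMarker(int which, NamedMarker m) {
        markers[which] = m;
    }

    // Overload 2: the caller supplies a class name; a new instance is created.
    void setMarker(int which, String className) {
        try {
            Object o = Class.forName(className).getDeclaredConstructor().newInstance();
            setMarker(which, (NamedMarker) o);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("cannot instantiate " + className, e);
        }
    }

    // Overload 3: both arguments are strings, convenient for command-line options.
    void setMarker(String which, String className) {
        Integer slot = SLOTS.get(which.toUpperCase(Locale.ROOT));
        if (slot == null) {
            throw new IllegalArgumentException("unknown marker slot: " + which);
        }
        setMarker(slot.intValue(), className);
    }

    NamedMarker getMarker(int which) {
        return markers[which];
    }

    public static void main(String[] args) {
        MarkerLoaderSketch loader = new MarkerLoaderSketch();
        loader.setMarker("date", DateMarkerSketch.class.getName());
        System.out.println(loader.getMarker(DATE).describe());
    }
}

// Stand-in for the AttributeMarker interface (assumed shape).
interface NamedMarker {
    String describe();
}

// A trivial recogniser with a no-argument constructor, as reflection requires here.
class DateMarkerSketch implements NamedMarker {
    public String describe() {
        return "date marker";
    }
}
```

In this sketch the named class needs a no-argument constructor and must implement the marker interface; a class that exists but implements something else fails with a ClassCastException.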
public void setCitationOutput(CitationOutput co)
co - an object from the CitationOutput-derived class which will be used for printing the citation data.
public void setCitationOutput(java.lang.String coName)
coName - the name of a CitationOutput-derived class which will be used for printing the citation data.
protected void dodecite(java.lang.String line, java.lang.String pr, java.lang.String wr, java.io.PrintWriter Output)
line - the string containing the citation under scrutiny
pr - the page number of the article which contained this citation
wr - the word number at which this citation started on the page
Output - the PrintWriter to which all output must be sent
protected void dodecite_simple(java.lang.String line, java.lang.String pr, java.lang.String wr, java.io.PrintWriter Output)
line - the string containing the citation under scrutiny
pr - the page number of the article which contained this citation
wr - the word number at which this citation started on the page
Output - the PrintWriter to which all output must be sent
protected void split_multiCitation(java.lang.String rest, java.lang.String pr, java.lang.String wr, java.io.PrintWriter Output)
rest - the remaining part of the line containing the citation under scrutiny
pr - the page number of the article which contained this citation
wr - the word number at which this citation started on the page
Output - the PrintWriter to which all output must be sent
protected void doReadLoop(java.io.BufferedReader inp, java.io.PrintWriter Output) throws java.io.IOException
public int doit(java.io.BufferedReader inp, java.io.PrintWriter outp) throws java.io.IOException
inp - the (de-entitied) input stream containing citations in a primitive XML format
id - the unique id corresponding to this article
outp - the (re-entitying) output stream to which the citation entries will be written.
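The doit and doReadLoop signatures describe a reader-to-writer loop: read a line, process it, print the result. The sketch below shows that general shape only. ReadLoopSketch and its methods are hypothetical, the per-line processing is a placeholder for the real deciting work, skipping blank lines is an assumption, and the meaning of the returned int is also an assumption (this page does not say what doit returns).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;

// Hypothetical sketch of a line-by-line read loop; not the Deciter source.
class ReadLoopSketch {
    // Placeholder for the per-citation work done by dodecite in the real class.
    static String process(String line) {
        return line.trim();
    }

    // Reads every line from inp, writes the processed form to out, and
    // returns the number of lines written (an assumed meaning for the int).
    static int readLoop(BufferedReader inp, PrintWriter out) throws IOException {
        int written = 0;
        String line;
        while ((line = inp.readLine()) != null) {
            if (line.trim().isEmpty()) {
                continue; // assumption: blank lines carry no citation
            }
            out.println(process(line));
            written++;
        }
        out.flush();
        return written;
    }

    // Convenience wrapper that runs the loop over an in-memory string.
    static String runOnString(String input) {
        StringWriter buffer = new StringWriter();
        try (BufferedReader inp = new BufferedReader(new StringReader(input));
             PrintWriter out = new PrintWriter(buffer)) {
            readLoop(inp, out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.print(runOnString("first citation\n\n  second citation  \n"));
    }
}
```

Passing the reader and writer in, rather than opening files inside the loop, keeps the loop testable against in-memory strings, as the wrapper method shows.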