uk.ac.soton.harvester
Class harness

java.lang.Object
  |
  +--uk.ac.soton.harvester.harness

public class harness
extends java.lang.Object

harness is a driver class that creates a citation harvesting object and applies it to a specific data file.


Constructor Summary
harness()
           
 
Method Summary
(package private) static int getCitations(java.lang.String pdfFile, java.lang.String articleId, java.lang.String xmlFile, java.lang.String[] options)
          getCitations performs citation harvesting on a specific data file.
static void main(java.lang.String[] args)
          main is the driver for the citation processing.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

harness

public harness()
Method Detail

getCitations

static int getCitations(java.lang.String pdfFile,
                        java.lang.String articleId,
                        java.lang.String xmlFile,
                        java.lang.String[] options)
                 throws java.lang.Exception
getCitations performs citation harvesting on a specific data file.
Parameters:
pdfFile - the data file to be interpreted. Despite its name, this is not a PDF file but an intermediate XML file produced directly from the PDF by an independent program (currently bpe5).
articleId - a string that uniquely identifies the article. It conforms to the pattern PP-JJ-STUFF, where PP is a two-letter publisher code, JJ is a two-letter journal code and STUFF is an uninterpreted article code. The substring PP-JJ uniquely identifies a particular journal.
xmlFile - the name of the file to which the citation data extracted from the article will be written, as XML conforming to the Ingenta DTD.
options - an array of strings (simply the arguments passed to the main method) containing hints on how best to parse the article.
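The articleId convention is regular enough to check mechanically. The following is a minimal sketch, not part of uk.ac.soton.harvester: the class ArticleId and its methods are hypothetical, and it assumes that PP and JJ are exactly two ASCII letters and that STUFF may itself contain hyphens.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not part of the harvester) that splits an article id
 * of the form PP-JJ-STUFF into its documented parts.
 * Assumption: PP and JJ are exactly two ASCII letters each.
 */
public final class ArticleId {
    // PP and JJ are two letters each; STUFF is everything after the second
    // hyphen and is kept uninterpreted (it may contain further hyphens).
    private static final Pattern FORMAT =
            Pattern.compile("([A-Za-z]{2})-([A-Za-z]{2})-(.+)");

    private final String publisher;
    private final String journal;
    private final String article;

    private ArticleId(String publisher, String journal, String article) {
        this.publisher = publisher;
        this.journal = journal;
        this.article = article;
    }

    /** Parses an id, throwing IllegalArgumentException if it is not of the form PP-JJ-STUFF. */
    public static ArticleId parse(String id) {
        Matcher m = FORMAT.matcher(id);
        if (!m.matches()) {
            throw new IllegalArgumentException("not of the form PP-JJ-STUFF: " + id);
        }
        return new ArticleId(m.group(1), m.group(2), m.group(3));
    }

    public String publisher() { return publisher; }
    public String journal()   { return journal; }
    public String article()   { return article; }

    /** The PP-JJ prefix, which identifies a particular journal. */
    public String journalKey() { return publisher + "-" + journal; }
}
```

For a made-up id such as AB-CD-12-34, journalKey() returns "AB-CD" and article() returns "12-34"; keying per-journal parsing behaviour on the PP-JJ prefix is one way the uniqueness property could be exploited.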

main

public static void main(java.lang.String[] args)
                 throws java.lang.Exception
main is the driver for the citation processing. It takes the input and output file names from the command line, along with an optional article id and a set of hints for processing.
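The page does not specify how main orders or distinguishes its arguments. The sketch below shows one way such a command line could be split; the class HarnessArgs is hypothetical, and it assumes the layout inputFile outputFile [articleId] [-hint ...], with hints marked by a leading '-' so that an optional article id can be told apart from them.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical sketch of splitting a command line shaped like
 *   inputFile outputFile [articleId] [-hint ...]
 * The real harness may order or mark its arguments differently.
 */
public final class HarnessArgs {
    public final String inputFile;
    public final String outputFile;
    public final String articleId;   // null when no article id is given
    public final List<String> hints;

    private HarnessArgs(String inputFile, String outputFile,
                        String articleId, List<String> hints) {
        this.inputFile = inputFile;
        this.outputFile = outputFile;
        this.articleId = articleId;
        this.hints = hints;
    }

    public static HarnessArgs parse(String[] args) {
        if (args.length < 2) {
            throw new IllegalArgumentException(
                    "usage: harness inputFile outputFile [articleId] [-hint ...]");
        }
        int next = 2;
        String articleId = null;
        // Assumption: hints start with '-', so a bare third argument is the article id.
        if (args.length > 2 && !args[2].startsWith("-")) {
            articleId = args[2];
            next = 3;
        }
        // Everything after the file names (and the optional id) is treated as a hint.
        List<String> hints = Collections.unmodifiableList(
                Arrays.asList(Arrays.copyOfRange(args, next, args.length)));
        return new HarnessArgs(args[0], args[1], articleId, hints);
    }
}
```

Requiring a marker on hints is a design choice made here only so that an optional positional argument stays unambiguous; a fixed position or a named flag for the article id would serve equally well.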