uk.ac.soton.harvester
Class EntityReader

java.lang.Object
  |
  +--java.io.Reader
        |
        +--java.io.BufferedReader
              |
              +--uk.ac.soton.harvester.EntityReader

public class EntityReader
extends java.io.BufferedReader

EntityReader extends BufferedReader so that ISO-Latin-1 character entities in the input are replaced by the corresponding ASCII/Unicode characters. It accompanies EntityWriter, allowing the processor to read data from and write data to XML-based files.


Field Summary
(package private)  java.util.Dictionary d
          d provides a lookup from entity name to character number
 
Fields inherited from class java.io.Reader
lock
 
Constructor Summary
(package private) EntityReader(java.io.Reader in)
          The main constructor allows an EntityReader to be based on any kind of Reader.
 
Method Summary
(package private)  java.lang.String entLookup(java.lang.String name)
          entLookup is a wrapper function which guarantees a char for an entity name.
(package private)  java.lang.String entString(java.lang.String s)
          entString decodes any unusual characters in a string from ISO-Latin-1 entities.
 java.lang.String readLine()
           
 
Methods inherited from class java.io.BufferedReader
close, mark, markSupported, read, read, readLine, ready, reset, skip
 
Methods inherited from class java.io.Reader
read
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

d

java.util.Dictionary d
d provides a lookup from entity name to character number
Constructor Detail

EntityReader

EntityReader(java.io.Reader in)
The main constructor allows an EntityReader to be based on any kind of Reader.
Method Detail

entLookup

java.lang.String entLookup(java.lang.String name)
entLookup is a wrapper function which guarantees a char for an entity name. It defaults to "_" for unrecognised entities.
Parameters:
name - entity name to be looked up. name may in fact be a number of the form #n according to the rules of XML.
Returns:
String value of length 1, whose first character is the character represented by the entity name given as a parameter (or "_" in pathological cases).
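
The lookup contract described above can be illustrated with a minimal sketch. This is not the EntityReader source: the class name EntLookupSketch, the handful of table entries, and the decision to accept only the decimal #n form are assumptions made for illustration.

```java
import java.util.Dictionary;
import java.util.Hashtable;

// Hypothetical stand-in for illustration; not the actual EntityReader code.
final class EntLookupSketch {
    // Entity name -> character number, mirroring the documented role of field d.
    // Only a few entries are shown here.
    private static final Dictionary<String, Integer> D = new Hashtable<>();
    static {
        D.put("amp", 38);
        D.put("lt", 60);
        D.put("gt", 62);
        D.put("eacute", 233);
    }

    // Always returns a String of length 1; "_" stands in for anything unrecognised.
    static String entLookup(String name) {
        Integer code;
        if (name.startsWith("#")) {
            try {
                // Assumption: only the decimal form #n is handled in this sketch.
                code = Integer.parseInt(name.substring(1));
            } catch (NumberFormatException e) {
                code = null;
            }
        } else {
            code = D.get(name);
        }
        // Reject unknown names and numbers that do not fit in a single char.
        if (code == null || code < 0 || code > Character.MAX_VALUE) {
            return "_";
        }
        return String.valueOf((char) code.intValue());
    }

    public static void main(String[] args) {
        System.out.println(entLookup("eacute"));
        System.out.println(entLookup("#233"));
        System.out.println(entLookup("nosuch"));
    }
}
```

Returning "_" rather than throwing keeps the caller's decoding loop simple, at the cost of silently masking misspelled entity names.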

entString

java.lang.String entString(java.lang.String s)
entString decodes any unusual characters in a string from ISO-Latin-1 entities. Ordinary ASCII characters are left untouched, except that '&', '<' and '>' are also handled as entities to conform to the XML standard. For example, "Carr &amp; Ren&eacute;" is transformed into "Carr & René".
Parameters:
s - the string to process
Returns:
the string with embedded entity names replaced.
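
One way such decoding could work is a single left-to-right scan that replaces each "&name;" span with its character. The sketch below is an assumption about the approach, not the actual implementation: the class name EntStringSketch, the small table, and the rule that an '&' with no following ';' is copied through unchanged are all hypothetical. Numeric #n references are omitted for brevity.

```java
import java.util.Dictionary;
import java.util.Hashtable;

// Hypothetical stand-in for illustration; not the actual EntityReader code.
final class EntStringSketch {
    // A few entity names mapped to character numbers.
    private static final Dictionary<String, Integer> D = new Hashtable<>();
    static {
        D.put("amp", 38);
        D.put("lt", 60);
        D.put("gt", 62);
        D.put("eacute", 233);
    }

    // Unrecognised names become "_", following the documented entLookup default.
    private static String lookup(String name) {
        Integer code = D.get(name);
        return code == null ? "_" : String.valueOf((char) code.intValue());
    }

    static String entString(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            // Look for the terminating ';' only when we are sitting on an '&'.
            int end = (c == '&') ? s.indexOf(';', i + 1) : -1;
            if (end > i + 1) {
                // Replace the whole "&name;" span with one character.
                out.append(lookup(s.substring(i + 1, end)));
                i = end + 1;
            } else {
                // Ordinary character, or an '&' that does not start an entity.
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(entString("Carr &amp; Ren&eacute;"));
    }
}
```

Note that this simple scan treats any text between an '&' and the next ';' as an entity name, so well-formed input (with literal ampersands already written as &amp;) is assumed.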

readLine

public java.lang.String readLine()
                          throws java.io.IOException
Overrides:
readLine in class java.io.BufferedReader
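
The page gives no description for readLine, but since it is the only overridden method, a plausible reading is that it fetches a line from the underlying BufferedReader and decodes it before returning. The sketch below illustrates that pattern under this assumption. DecodingReaderSketch, its decoder parameter, and the readAll helper are hypothetical names, and the decoder passed in main is a trivial stand-in for entString.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for illustration; not the actual EntityReader code.
final class DecodingReaderSketch extends BufferedReader {
    private final UnaryOperator<String> decoder;

    // Like the documented constructor, this can wrap any kind of Reader.
    DecodingReaderSketch(Reader in, UnaryOperator<String> decoder) {
        super(in);
        this.decoder = decoder;
    }

    @Override
    public String readLine() throws IOException {
        String line = super.readLine();
        // null signals end of stream and must be passed through undecoded.
        return line == null ? null : decoder.apply(line);
    }

    // Convenience helper: read every line of a string through the decoder.
    static List<String> readAll(String text, UnaryOperator<String> decoder) {
        List<String> lines = new ArrayList<>();
        try (DecodingReaderSketch r =
                 new DecodingReaderSketch(new StringReader(text), decoder)) {
            for (String line = r.readLine(); line != null; line = r.readLine()) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Trivial decoder used only to keep the example short.
        System.out.println(readAll("a &amp; b\nc", s -> s.replace("&amp;", "&")));
    }
}
```

In this pattern only readLine decodes; the character-level read methods are inherited unchanged, which matches the inherited-method lists on this page.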