Linkable.Analysis
Class HTMLAnalyzer

java.lang.Object
  |
  +--Linkable.Analysis.HTMLAnalyzer
All Implemented Interfaces:
RefLinkAnalyzer

public class HTMLAnalyzer
extends java.lang.Object
implements RefLinkAnalyzer


Field Summary
private static boolean DEBUG
           
private  java.io.BufferedReader in
           
private static java.lang.String ME
           
private  java.lang.String pubDate
           
(package private)  org.w3c.tidy.Tidy tidy
           
(package private)  java.io.BufferedInputStream tidyIn
           
(package private)  java.io.FileOutputStream tidyOut
           
(package private)  XHTMLAnalyzer xa
           
 
Constructor Summary
HTMLAnalyzer(java.lang.String url)
          Constructs an analyzer for the HTML document at the given url.
HTMLAnalyzer(java.lang.String localURL, java.lang.String url)
           
 
Method Summary
 java.util.Vector buildCitationList(java.lang.String docURN)
           
 java.lang.String buildLocalMetaData(java.lang.String DOI, java.lang.String pubDate, Creation c)
           
 Reference[] buildRefList(BibData b)
           
 java.lang.String getDate()
           
 java.lang.String getLinkedText(Reference[] refList, java.lang.String url)
          getLinkedText emits XML for the linked body of the text and/or the characters of the text body followed by reference-link data suitable for separate presentation.
 java.lang.String getLinkedTextFinalize()
          getLinkedTextFinalize emits XML for finishing off the Surrogate linked text output.
 java.lang.String getLinkedTextInitialize()
          getLinkedTextInitialize sets up to generate XML for our Surrogate, but not the incantation.
private  boolean runTidy(java.lang.String url)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, registerNatives, toString, wait, wait, wait
 

Field Detail

ME

private static final java.lang.String ME

DEBUG

private static final boolean DEBUG

in

private java.io.BufferedReader in

tidy

org.w3c.tidy.Tidy tidy

tidyIn

java.io.BufferedInputStream tidyIn

tidyOut

java.io.FileOutputStream tidyOut

xa

XHTMLAnalyzer xa

pubDate

private java.lang.String pubDate

Constructor Detail

HTMLAnalyzer

public HTMLAnalyzer(java.lang.String url)
             throws SurrogateException
Constructs an analyzer for the HTML document at the given url.
Parameters:
url - name of the file that contains the HTML to be converted
Throws:
SurrogateException - if the url cannot be opened for analysis

HTMLAnalyzer

public HTMLAnalyzer(java.lang.String localURL,
                    java.lang.String url)
             throws SurrogateException

Method Detail

getDate

public java.lang.String getDate()
Specified by:
getDate in interface RefLinkAnalyzer

buildLocalMetaData

public java.lang.String buildLocalMetaData(java.lang.String DOI,
                                           java.lang.String pubDate,
                                           Creation c)
Specified by:
buildLocalMetaData in interface RefLinkAnalyzer

buildRefList

public Reference[] buildRefList(BibData b)
Specified by:
buildRefList in interface RefLinkAnalyzer

buildCitationList

public java.util.Vector buildCitationList(java.lang.String docURN)
Specified by:
buildCitationList in interface RefLinkAnalyzer

getLinkedText

public java.lang.String getLinkedText(Reference[] refList,
                                      java.lang.String url)
                               throws SurrogateException
getLinkedText emits XML for the linked body of the text and/or the characters of the text body followed by reference-link data suitable for separate presentation. Note that the reference-link data can be constructed by this routine but saved for output by the getLinkedTextFinalize routine.
Specified by:
getLinkedText in interface RefLinkAnalyzer
Parameters:
refList - the array of Reference objects belonging to this Surrogate.
url - the net URL of the document, used as the base URL.
Throws:
SurrogateException - if the URL to be analyzed cannot be opened.
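
The note above, that reference-link data may be built in getLinkedText but held back for getLinkedTextFinalize, can be pictured with a minimal sketch. This is not the HTMLAnalyzer implementation: the class DeferredLinkSketch, the use of String[] in place of Reference[], and the <body> and <reflink> element names are all assumptions made for illustration, and the sketch does no XML escaping.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in, not the real HTMLAnalyzer: it only illustrates
// building reference-link data in one call and emitting it in a later one.
public class DeferredLinkSketch {
    // Link data built during getLinkedText, held until finalize.
    private final List<String> pendingLinks = new ArrayList<>();

    // Returns the body text now and records one link element per reference.
    // String[] stands in for Reference[]; <reflink> is an assumed element name.
    public String getLinkedText(String[] refIds, String url) {
        for (String id : refIds) {
            pendingLinks.add("<reflink base=\"" + url + "\" ref=\"" + id + "\"/>");
        }
        return "<body>text of " + url + "</body>";
    }

    // Emits the link data saved by getLinkedText, then clears the buffer.
    public String getLinkedTextFinalize() {
        String out = String.join("", pendingLinks);
        pendingLinks.clear();
        return out;
    }
}
```

Keeping the buffer as instance state means the same analyzer object has to serve both calls, which matches the way the two routines are described as cooperating.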

getLinkedTextInitialize

public java.lang.String getLinkedTextInitialize()
getLinkedTextInitialize sets up to generate XML for our Surrogate, but not the incantation.
Specified by:
getLinkedTextInitialize in interface RefLinkAnalyzer

getLinkedTextFinalize

public java.lang.String getLinkedTextFinalize()
getLinkedTextFinalize emits XML for finishing off the Surrogate linked text output. The main use for this routine is to emit the linkage data elements for documents that are not expressed in HTML or in XHTML.
Specified by:
getLinkedTextFinalize in interface RefLinkAnalyzer
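
The three linked-text routines read as an initialize, body, finalize sequence. The sketch below shows a caller driving them in that order, assuming the order is what the method names suggest. The interface LinkedTextSource and the method assemble are hypothetical: the real RefLinkAnalyzer takes Reference[] rather than String[], and its getLinkedText throws SurrogateException, which the sketch omits to stay self-contained.

```java
public class LinkedTextDriverSketch {
    // Hypothetical cut-down view of the three linked-text methods.
    // The real RefLinkAnalyzer uses Reference[] and may throw SurrogateException.
    public interface LinkedTextSource {
        String getLinkedTextInitialize();
        String getLinkedText(String[] refList, String url);
        String getLinkedTextFinalize();
    }

    // Concatenates the three pieces: opening XML, linked body, closing XML.
    public static String assemble(LinkedTextSource src, String[] refList, String url) {
        StringBuilder xml = new StringBuilder();
        xml.append(src.getLinkedTextInitialize());
        xml.append(src.getLinkedText(refList, url));
        xml.append(src.getLinkedTextFinalize());
        return xml.toString();
    }
}
```

A caller working against the real interface would additionally need to catch or propagate SurrogateException around the getLinkedText call.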

runTidy

private boolean runTidy(java.lang.String url)