java.lang.Object | +--uk.ac.soton.harvester.Utils
Utils is a place for miscellaneous utility methods to try to control class bloat!
Field Summary | |
static EntityEncoder |
ee
ee is an entity encoder object which contains the mapping from (non-)ASCII to ISO-Latin1 entity names. |
Constructor Summary | |
Utils()
|
Method Summary | |
static void |
DEBUG(java.lang.String s)
DEBUG is a convenience method for producing debugging output. |
static java.lang.String |
detag(java.lang.String s)
detag removes tags from an HTML-style string. |
static boolean |
iciSWe(java.lang.String s1,
java.lang.String s2)
iciSWe is an "ignore case of initial" version of startsWith, used to make "Del " and "del " match. |
static boolean |
iciSWp(java.lang.String s1,
java.lang.String s2)
iciSWp is the same as iciSWe except it looks for punctuation instead of a space. |
static boolean |
isBook(DeciterState ds)
isBook is a utility method that encapsulates a naive heuristic (oh, alright then, hack) for determining whether the citation was to a book/thesis or not. |
static boolean |
isDash(char ch)
isDash recognises the characters from all the character sets which could correspond to a "dash". |
static boolean |
isInitial(java.lang.String s)
isInitial checks whether the current word is in fact an initial or a set of initials, as opposed to a surname. |
static boolean |
isProceedings(DeciterState ds)
isProceedings is a utility method that encapsulates a naive heuristic (oh, alright then, hack) for determining whether the citation was to a conference/workshop proceedings or not. |
static boolean |
lowerCaseNameComponent(java.lang.String s)
lowerCaseNameComponent recognises words that start with a lowercase letter but are in fact parts of names. |
static boolean |
lowercaseOrHyphen(java.lang.String s,
int i)
lowercaseOrHyphen is a utility method that recognises valid characters (i.e. [a-z-]) within an XXX eprint article identifier. |
static java.lang.String |
PCDATA(java.lang.String s)
PCDATA is a convenience method to access the entity encoder. |
static void |
setDebugging(boolean b)
setDebugging controls whether DEBUG messages are printed or not. |
static java.lang.String |
substring(java.lang.String line,
int a,
int b)
This is just a safe version of substring |
static java.lang.String |
toInitials(java.lang.String s)
toInitials turns a set of "forenames" to an appropriate set of separated, correctly delimited initials. |
static boolean |
xxxId(java.lang.String s)
xxxId recognises strings which are XXX citation ids. |
Methods inherited from class java.lang.Object |
|
Field Detail |
public static EntityEncoder ee
Constructor Detail |
public Utils()
Method Detail |
public static void setDebugging(boolean b)
public static void DEBUG(java.lang.String s)
s - the String to be written to the debugging file (a newline is added).
public static java.lang.String PCDATA(java.lang.String s)
public static boolean iciSWe(java.lang.String s1, java.lang.String s2)
public static boolean iciSWp(java.lang.String s1, java.lang.String s2)
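The summary describes iciSWe as a startsWith that ignores the case of the initial letter only, so that "Del " and "del " match. A minimal sketch of that idea follows. The class name, method name and argument order are hypothetical, and this page does not say whether the real method appends the trailing space (or, for iciSWp, the punctuation) itself, so here the caller is assumed to pass it as part of the prefix.

```java
// Hypothetical sketch, not the Utils source.
final class InitialCasePrefix {
    private InitialCasePrefix() {}

    /**
     * Returns true if s starts with prefix, comparing the first character
     * case-insensitively and every later character exactly.
     */
    static boolean startsWithIgnoringInitialCase(String s, String prefix) {
        if (s == null || prefix == null || s.length() < prefix.length()) {
            return false;
        }
        if (prefix.isEmpty()) {
            return true;
        }
        boolean firstMatches =
            Character.toLowerCase(s.charAt(0)) == Character.toLowerCase(prefix.charAt(0));
        // Compare the remainder exactly, starting at offset 1 in both strings.
        return firstMatches && s.regionMatches(1, prefix, 1, prefix.length() - 1);
    }
}
```

With this sketch, "del Rio" and "Del Rio" both match the prefix "Del ", while "DEL Rio" does not, because only the first letter is compared without regard to case.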
public static boolean xxxId(java.lang.String s)
public static boolean lowerCaseNameComponent(java.lang.String s)
public static boolean isDash(char ch)
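The summary says isDash recognises dash characters across character sets, but this page does not list which ones. The sketch below shows one way such a check could look. The class and method names are hypothetical, and the chosen set of Unicode code points is an assumption rather than the set the real method accepts.

```java
// Hypothetical sketch, not the Utils source; the character list is an assumption.
final class DashSketch {
    private DashSketch() {}

    /** Returns true for a handful of characters commonly rendered as a dash. */
    static boolean isDashLike(char ch) {
        switch (ch) {
            case '-':       // hyphen-minus (ASCII)
            case '\u2010':  // hyphen
            case '\u2011':  // non-breaking hyphen
            case '\u2012':  // figure dash
            case '\u2013':  // en dash
            case '\u2014':  // em dash
            case '\u2015':  // horizontal bar
            case '\u2212':  // minus sign
                return true;
            default:
                return false;
        }
    }
}
```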
public static boolean isInitial(java.lang.String s)
public static java.lang.String toInitials(java.lang.String s)
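The summary says toInitials turns a set of forenames into separated, correctly delimited initials, without stating the output format. As a rough illustration only, the sketch below assumes the format "J. P." (uppercase initial, full stop, single space between initials). The class name, method name and that format are all assumptions.

```java
// Hypothetical sketch, not the Utils source; the "J. P." output format is assumed.
final class InitialsSketch {
    private InitialsSketch() {}

    /** Reduces each forename (split on whitespace or full stops) to "X." and joins with spaces. */
    static String initialsOf(String forenames) {
        if (forenames == null) {
            return "";
        }
        StringBuilder out = new StringBuilder();
        for (String part : forenames.trim().split("[\\s.]+")) {
            if (part.isEmpty()) {
                continue; // skip the empty token produced by blank input or a leading full stop
            }
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(Character.toUpperCase(part.charAt(0))).append('.');
        }
        return out.toString();
    }
}
```

Under these assumptions, both "John Paul" and "J.P." come out as "J. P.".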
public static java.lang.String detag(java.lang.String s)
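The summary says detag removes tags from an HTML-style string. One simple way to do that is a regular expression that deletes anything between "<" and the next ">", sketched below. The class and method names are hypothetical, and the real method may handle entities, comments or malformed markup differently.

```java
import java.util.regex.Pattern;

// Hypothetical sketch, not the Utils source.
final class DetagSketch {
    // A "tag" here is simply "<", any run of characters other than ">", then ">".
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    private DetagSketch() {}

    /** Returns s with every tag-like span removed; text outside tags is left unchanged. */
    static String stripTags(String s) {
        if (s == null) {
            return null;
        }
        return TAG.matcher(s).replaceAll("");
    }
}
```

A lone "<" with no closing ">" is left in place, so text such as "a < b" passes through untouched.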
public static boolean lowercaseOrHyphen(java.lang.String s, int i)
s - the string containing the character to check
i - the character offset within the string to check
public static boolean isProceedings(DeciterState ds)
public static boolean isBook(DeciterState ds)
public static java.lang.String substring(java.lang.String line, int a, int b)
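The summary calls this method "a safe version of substring" without saying what safe means. A common reading is that out-of-range indices are clamped instead of throwing StringIndexOutOfBoundsException. The sketch below takes that reading as an assumption; the class and method names are hypothetical, and returning an empty string for null input is also an assumption.

```java
// Hypothetical sketch, not the Utils source; clamping behaviour is assumed.
final class SafeSubstringSketch {
    private SafeSubstringSketch() {}

    /** Like line.substring(a, b), but clamps both indices into [0, line.length()]. */
    static String safeSubstring(String line, int a, int b) {
        if (line == null) {
            return "";
        }
        int start = Math.max(0, Math.min(a, line.length()));
        // Never let the end fall before the start, so substring cannot throw.
        int end = Math.max(start, Math.min(b, line.length()));
        return line.substring(start, end);
    }
}
```

For example, safeSubstring("hello", 3, 99) yields "lo", and reversed indices such as (4, 2) yield an empty string.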