edu.cmu.minorthird.text
Class FilterTokenizer
java.lang.Object
edu.cmu.minorthird.text.CompoundTokenizer
edu.cmu.minorthird.text.FilterTokenizer
- All Implemented Interfaces:
- Tokenizer
public class FilterTokenizer
- extends CompoundTokenizer
This implementation of the Tokenizer interface is used when filtering a text base by a
specified span type. It is a trivial tokenizer in the sense that it takes a document from
the new text base, maps it to the corresponding document in the old (parent) text base, and
copies over the tokens. If the mapping is not found (i.e., the document being added is not
in the parent text base), the parent tokenizer is used instead.
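The copy-or-fall-back behavior described above can be illustrated with a minimal, self-contained sketch. It does not use the MinorThird classes and is not the actual FilterTokenizer implementation: FilterTokenizerSketch, Tok, Mapping, addParent, addMapping and the whitespace-based parentTokenize are all hypothetical names introduced here. It also assumes that a document in the new text base corresponds to a contiguous character region of a parent document.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these types belong to MinorThird.
class FilterTokenizerSketch {

    /** A token as a [start, end) character range within its document. */
    static final class Tok {
        final int start;
        final int end;
        Tok(int start, int end) { this.start = start; this.end = end; }
        @Override public String toString() { return "[" + start + "," + end + ")"; }
    }

    /** Assumed mapping: a child document is a region of a parent document. */
    static final class Mapping {
        final String parentId;
        final int offset;
        final int length;
        Mapping(String parentId, int offset, int length) {
            this.parentId = parentId;
            this.offset = offset;
            this.length = length;
        }
    }

    private final Map<String, List<Tok>> parentTokens = new HashMap<>();
    private final Map<String, Mapping> childToParent = new HashMap<>();

    /** Stand-in for the parent tokenizer: maximal runs of non-whitespace. */
    static List<Tok> parentTokenize(String text) {
        List<Tok> out = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            if (Character.isWhitespace(text.charAt(i))) { i++; continue; }
            int start = i;
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) i++;
            out.add(new Tok(start, i));
        }
        return out;
    }

    /** Registers a parent document and tokenizes it once. */
    void addParent(String parentId, String text) {
        parentTokens.put(parentId, parentTokenize(text));
    }

    /** Records which region of which parent document a child document covers. */
    void addMapping(String childId, String parentId, int offset, int length) {
        childToParent.put(childId, new Mapping(parentId, offset, length));
    }

    /** Copies the parent's tokens when a mapping exists, else tokenizes afresh. */
    List<Tok> splitIntoTokens(String childId, String childText) {
        Mapping m = childToParent.get(childId);
        List<Tok> source = (m == null) ? null : parentTokens.get(m.parentId);
        if (source == null) {
            // No mapping found: fall back to the parent tokenizer.
            return parentTokenize(childText);
        }
        List<Tok> out = new ArrayList<>();
        int regionEnd = m.offset + m.length;
        for (Tok t : source) {
            // Keep tokens fully inside the region, shifted to child offsets.
            if (t.start >= m.offset && t.end <= regionEnd) {
                out.add(new Tok(t.start - m.offset, t.end - m.offset));
            }
        }
        return out;
    }
}
```

In this sketch the mapped path never re-tokenizes the text, so the tokens of a filtered document stay consistent with those of its parent; only unmapped documents go through the tokenizer again.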
- Author:
- Quinten Mercer
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
FilterTokenizer
public FilterTokenizer(TextBaseManager tbMan,
java.lang.String levelName,
java.lang.String parentLevelName)
splitIntoTokens
public java.lang.String[] splitIntoTokens(java.lang.String string)
- Tokenize a string.
- Specified by:
- splitIntoTokens in interface Tokenizer
- Specified by:
- splitIntoTokens in class CompoundTokenizer
splitIntoTokens
public TextToken[] splitIntoTokens(Document document)
- Tokenize a document.
- Specified by:
- splitIntoTokens in interface Tokenizer
- Specified by:
- splitIntoTokens in class CompoundTokenizer