edu.cmu.minorthird.text
Class FilterTokenizer

java.lang.Object
  extended by edu.cmu.minorthird.text.CompoundTokenizer
      extended by edu.cmu.minorthird.text.FilterTokenizer
All Implemented Interfaces:
Tokenizer

public class FilterTokenizer
extends CompoundTokenizer

This implementation of the Tokenizer interface is used when filtering a text base by a specified span type. It is a trivial tokenizer in the sense that it takes a document from the new text base, maps it to the corresponding document in the old (parent) text base, and copies over the existing tokens. If no mapping is found (i.e., the document being added is not in the parent text base), the parent tokenizer is used instead.
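
The copy-or-fall-back behavior described above can be illustrated with a minimal, self-contained sketch. It does not use the MinorThird API: the class FilterTokenizerSketch, the method registerParentDocument, the document-id keyed map, and the whitespace-splitting parent tokenizer are all hypothetical stand-ins chosen for illustration, and the real class resolves documents through a TextBaseManager rather than a map.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch, not the MinorThird implementation.
class FilterTokenizerSketch {
    // Assumption: tokens already computed at the parent level, keyed by document id.
    private final Map<String, String[]> parentTokensByDocId = new HashMap<>();
    // Assumption: the parent tokenizer is modeled as a plain function from text to tokens.
    private final Function<String, String[]> parentTokenizer;

    FilterTokenizerSketch(Function<String, String[]> parentTokenizer) {
        this.parentTokenizer = parentTokenizer;
    }

    // Records the tokens of a document that exists in the parent text base.
    void registerParentDocument(String docId, String[] tokens) {
        parentTokensByDocId.put(docId, tokens.clone());
    }

    // Copies the parent's tokens when the document is known;
    // otherwise delegates to the parent tokenizer.
    String[] splitIntoTokens(String docId, String text) {
        String[] existing = parentTokensByDocId.get(docId);
        if (existing != null) {
            return existing.clone();
        }
        return parentTokenizer.apply(text);
    }

    public static void main(String[] args) {
        FilterTokenizerSketch tokenizer =
            new FilterTokenizerSketch(s -> s.trim().split("\\s+"));
        tokenizer.registerParentDocument("doc1", new String[] {"Dr", ".", "Smith"});

        // Known document: the parent's tokens are copied, not recomputed.
        System.out.println(String.join("|", tokenizer.splitIntoTokens("doc1", "Dr. Smith")));
        // Unknown document: the parent tokenizer is applied to the text.
        System.out.println(String.join("|", tokenizer.splitIntoTokens("doc2", "new text here")));
    }
}
```

In this sketch the known document keeps the parent's three-token split ("Dr", ".", "Smith") even though the fallback tokenizer would produce two tokens for the same text, which is the point of copying rather than re-tokenizing.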

Author:
Quinten Mercer

Field Summary
 
Fields inherited from class edu.cmu.minorthird.text.CompoundTokenizer
parentTokenizer
 
Constructor Summary
FilterTokenizer(TextBaseManager tbMan, java.lang.String levelName, java.lang.String parentLevelName)
           
 
Method Summary
 TextToken[] splitIntoTokens(Document document)
          Tokenize a document.
 java.lang.String[] splitIntoTokens(java.lang.String string)
          Tokenize a string.
 
Methods inherited from class edu.cmu.minorthird.text.CompoundTokenizer
getParent
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

FilterTokenizer

public FilterTokenizer(TextBaseManager tbMan,
                       java.lang.String levelName,
                       java.lang.String parentLevelName)

Method Detail

splitIntoTokens

public java.lang.String[] splitIntoTokens(java.lang.String string)
Tokenize a string.

Specified by:
splitIntoTokens in interface Tokenizer
Specified by:
splitIntoTokens in class CompoundTokenizer

splitIntoTokens

public TextToken[] splitIntoTokens(Document document)
Tokenize a document.

Specified by:
splitIntoTokens in interface Tokenizer
Specified by:
splitIntoTokens in class CompoundTokenizer