edu.cmu.minorthird.text
Class SpanTypeTokenizer

java.lang.Object
  extended by edu.cmu.minorthird.text.CompoundTokenizer
      extended by edu.cmu.minorthird.text.SpanTypeTokenizer
All Implemented Interfaces:
Tokenizer

public class SpanTypeTokenizer
extends CompoundTokenizer

This implementation of the Tokenizer interface re-tokenizes documents based on a specified span type. All tokens inside a span of that type are merged into a single "pseudotoken". All other tokens remain as originally tokenized.
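
The merging behavior described above can be illustrated with a minimal, self-contained sketch. It does not use or reproduce the MinorThird classes: the class name PseudoTokenSketch, the method retokenize, the index-based span boundaries, and the choice to join merged tokens with a single space are all assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not the MinorThird implementation.
class PseudoTokenSketch {

    // Merges tokens[spanStart, spanEnd) into one pseudotoken and
    // leaves every token outside that range unchanged.
    static List<String> retokenize(String[] tokens, int spanStart, int spanEnd) {
        List<String> result = new ArrayList<>();
        // Tokens before the span are copied as-is.
        for (int i = 0; i < spanStart; i++) {
            result.add(tokens[i]);
        }
        // Tokens inside the span become a single pseudotoken
        // (joined with a space, which is an assumption of this sketch).
        if (spanStart < spanEnd) {
            String[] inside = Arrays.copyOfRange(tokens, spanStart, spanEnd);
            result.add(String.join(" ", inside));
        }
        // Tokens after the span are copied as-is.
        for (int i = spanEnd; i < tokens.length; i++) {
            result.add(tokens[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        String[] tokens = {"Dr", "Jane", "Smith", "spoke", "today"};
        // Treat tokens 1..2 ("Jane", "Smith") as one labeled span.
        System.out.println(retokenize(tokens, 1, 3));
        // prints [Dr, Jane Smith, spoke, today]
    }
}
```

In this sketch the span is given as token indices. In the class documented here, the spans would presumably come from the TextLabels passed to the constructor, looked up by the span type name.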

Author:
Quinten Mercer

Field Summary
 
Fields inherited from class edu.cmu.minorthird.text.CompoundTokenizer
parentTokenizer
 
Constructor Summary
SpanTypeTokenizer(java.lang.String s, TextLabels l)
           
 
Method Summary
 java.lang.String getSpanType()
           
 TextLabels getTextLabels()
           
 TextToken[] splitIntoTokens(Document document)
          Tokenize a document.
 java.lang.String[] splitIntoTokens(java.lang.String string)
          Tokenize a string.
 
Methods inherited from class edu.cmu.minorthird.text.CompoundTokenizer
getParent
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

SpanTypeTokenizer

public SpanTypeTokenizer(java.lang.String s,
                         TextLabels l)

Method Detail

getSpanType

public java.lang.String getSpanType()

getTextLabels

public TextLabels getTextLabels()

splitIntoTokens

public java.lang.String[] splitIntoTokens(java.lang.String string)
Tokenize a string.

Specified by:
splitIntoTokens in interface Tokenizer
Specified by:
splitIntoTokens in class CompoundTokenizer

splitIntoTokens

public TextToken[] splitIntoTokens(Document document)
Tokenize a document.

Specified by:
splitIntoTokens in interface Tokenizer
Specified by:
splitIntoTokens in class CompoundTokenizer