edu.cmu.minorthird.text
Class SpanTypeTokenizer
java.lang.Object
edu.cmu.minorthird.text.CompoundTokenizer
edu.cmu.minorthird.text.SpanTypeTokenizer
- All Implemented Interfaces:
- Tokenizer
public class SpanTypeTokenizer
- extends CompoundTokenizer
This implementation of the Tokenizer interface re-tokenizes documents based on a specified
span type. All tokens inside a span of that type are merged into a single "pseudotoken".
All other tokens remain as originally tokenized.
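The merging behavior described above can be illustrated with a small standalone sketch. It does not use MinorThird and is not the library's implementation: the class `PseudoTokenSketch`, the method `retokenize`, and the choice of whitespace tokenization with a single span given as character offsets are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of edu.cmu.minorthird.text.
final class PseudoTokenSketch {

    // Splits text on whitespace, then merges every token that lies entirely
    // inside the character range [spanStart, spanEnd) into one pseudotoken.
    // Tokens outside the range are kept as originally tokenized.
    static List<String> retokenize(String text, int spanStart, int spanEnd) {
        List<String> tokens = new ArrayList<>();
        Matcher m = Pattern.compile("\\S+").matcher(text);
        int mergedStart = -1; // start offset of the first token inside the span
        int mergedEnd = -1;   // end offset of the last token inside the span
        while (m.find()) {
            boolean inside = m.start() >= spanStart && m.end() <= spanEnd;
            if (inside) {
                if (mergedStart < 0) {
                    mergedStart = m.start();
                }
                mergedEnd = m.end();
            } else {
                // Leaving the span: emit the accumulated pseudotoken first.
                if (mergedStart >= 0) {
                    tokens.add(text.substring(mergedStart, mergedEnd));
                    mergedStart = -1;
                }
                tokens.add(m.group());
            }
        }
        // The span may run to the end of the text.
        if (mergedStart >= 0) {
            tokens.add(text.substring(mergedStart, mergedEnd));
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "Contact William W. Cohen at CMU";
        // Offsets 8..24 cover "William W. Cohen", standing in for a labeled span.
        System.out.println(retokenize(text, 8, 24));
        // prints [Contact, William W. Cohen, at, CMU]
    }
}
```

In the sketch the span is passed as raw offsets. SpanTypeTokenizer itself takes a span type name and a TextLabels object in its constructor, so it presumably looks up the spans of that type from the labels.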
- Author:
- Quinten Mercer
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
SpanTypeTokenizer
public SpanTypeTokenizer(java.lang.String s,
TextLabels l)
getSpanType
public java.lang.String getSpanType()
getTextLabels
public TextLabels getTextLabels()
splitIntoTokens
public java.lang.String[] splitIntoTokens(java.lang.String string)
- Tokenize a string.
- Specified by:
- splitIntoTokens in interface Tokenizer
- Specified by:
- splitIntoTokens in class CompoundTokenizer
splitIntoTokens
public TextToken[] splitIntoTokens(Document document)
- Tokenize a document.
- Specified by:
- splitIntoTokens in interface Tokenizer
- Specified by:
- splitIntoTokens in class CompoundTokenizer