edu.cmu.minorthird.text
Interface TextBase

All Known Implementing Classes:
AbstractTextBase, BasicTextBase, MutableTextBase, SubTextBase

public interface TextBase

Maintains information about the contents of a set of documents. Specifically, a TextBase holds the character sequences (TextToken instances) drawn from those documents, typically produced by tokenization.

Author:
William Cohen, Quinten Mercer

Method Summary
 Span documentSpan(java.lang.String documentId)
          Looks up the document Span for the given documentId.
 java.util.Iterator<Span> documentSpanIterator()
          Returns an iterator over the documents in this TextBase.
 Document getDocument(java.lang.String docID)
          Returns the Document with the given ID.
 Tokenizer getTokenizer()
          Returns the Tokenizer used on the documents in this text base.
 int size()
          Returns the number of documents contained in this TextBase.
 

Method Detail

getTokenizer

Tokenizer getTokenizer()
Returns the Tokenizer used on the documents in this text base.


size

int size()
Returns the number of documents contained in this TextBase.


getDocument

Document getDocument(java.lang.String docID)
Returns the Document with the given ID.


documentSpanIterator

java.util.Iterator<Span> documentSpanIterator()
Returns an iterator over the documents in this TextBase.
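
As a rough illustration of how documentSpanIterator() and size() relate, the sketch below walks the iterator once and counts what it yields. The Span and TextBase types here are stand-ins that copy only the signatures listed on this page, and DemoTextBase and SpanIterationSketch are hypothetical names invented for the example. None of this is MinorThird code. It also assumes the iterator yields exactly one Span per document, which the descriptions above suggest but do not state outright.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Stand-in types: NOT the MinorThird classes. They mirror only the
// signatures listed on this page so the sketch compiles on its own.
interface Span {}

interface TextBase {
    Iterator<Span> documentSpanIterator();
    int size();
}

// Hypothetical in-memory implementation, used only for this illustration.
class DemoTextBase implements TextBase {
    private final List<Span> spans = new ArrayList<>();

    void add(Span span) {
        spans.add(span);
    }

    @Override
    public Iterator<Span> documentSpanIterator() {
        return spans.iterator();
    }

    @Override
    public int size() {
        return spans.size();
    }
}

class SpanIterationSketch {
    // Walks the iterator once and returns how many document spans it produced.
    static int countDocumentSpans(TextBase base) {
        int count = 0;
        for (Iterator<Span> it = base.documentSpanIterator(); it.hasNext(); ) {
            it.next();
            count++;
        }
        return count;
    }
}
```

Under the one-span-per-document assumption, countDocumentSpans(base) should equal base.size() for any implementation.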


documentSpan

Span documentSpan(java.lang.String documentId)
Looks up the document Span for the given documentId. Returns the Span, or null if no document with documentId exists in this TextBase.
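
Because documentSpan can return null, callers need to guard against a missing document. One possible approach, sketched below, wraps the lookup in java.util.Optional. The Span and TextBase types are stand-ins that copy only the documentSpan signature from this page, and MapTextBase, DocumentSpanLookupSketch, and findDocumentSpan are hypothetical names invented for the example. None of this is part of MinorThird.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Stand-in types: NOT the MinorThird classes. They mirror only the
// documentSpan signature listed on this page.
interface Span {}

interface TextBase {
    Span documentSpan(String documentId);
}

// Hypothetical map-backed implementation, used only for this illustration.
class MapTextBase implements TextBase {
    private final Map<String, Span> spansById = new HashMap<>();

    void put(String documentId, Span span) {
        spansById.put(documentId, span);
    }

    @Override
    public Span documentSpan(String documentId) {
        // Map.get returns null for an unknown key, matching the documented contract.
        return spansById.get(documentId);
    }
}

class DocumentSpanLookupSketch {
    // Converts the "null if not found" result into an Optional for the caller.
    static Optional<Span> findDocumentSpan(TextBase base, String documentId) {
        return Optional.ofNullable(base.documentSpan(documentId));
    }
}
```

A caller could then write findDocumentSpan(base, id).ifPresent(...) instead of an explicit null check. Whether that reads better than a plain null test is a matter of taste.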