Class BasicTextBase

  extended by edu.cmu.minorthird.text.AbstractTextBase
      extended by edu.cmu.minorthird.text.MutableTextBase
          extended by edu.cmu.minorthird.text.BasicTextBase
All Implemented Interfaces:

public class BasicTextBase
extends MutableTextBase

Maintains information about the contents of a set of documents. Specifically, it holds the character sequences (TextTokens) drawn from those documents, typically produced by tokenization.

Author:
William Cohen, Cameron Williams, Quinten Mercer
Field Summary
Constructor Summary
BasicTextBase()
          Default constructor creates a new TextBase with the default Tokenizer.
BasicTextBase(Tokenizer t)
          Constructor that specifies a custom Tokenizer to be used with this TextBase.
Method Summary
 Span documentSpan(java.lang.String documentId)
          Returns a Span instance that encloses all of the tokens in the document specified by documentId.
 java.util.Iterator<Span> documentSpanIterator()
          Returns an iterator that yields a document span for every document in this TextBase.
 Document getDocument(java.lang.String documentId)
          Returns the Document instance that corresponds to the specified documentId or null if no document exists with the specified documentId.
 void loadDocument(java.lang.String documentId, java.lang.String documentString)
          Adds a document to this TextBase with documentId as its identifier and with text specified by documentString.
 void loadDocument(java.lang.String documentId, java.lang.String documentString, int charOffset)
          Adds a document to this TextBase with documentId as its identifier and with text specified by documentString.
static void main(java.lang.String[] args)
 void setDocumentGroupId(java.lang.String documentId, java.lang.String documentGroupId)
          Sets the document group id for the specified documentId to the specified document group id.
 int size()
          Returns the number of documents currently in this TextBase.
Methods inherited from class edu.cmu.minorthird.text.AbstractTextBase
Methods inherited from class java.lang.Object
Constructor Detail


public BasicTextBase()
Default constructor creates a new TextBase with the default Tokenizer.


public BasicTextBase(Tokenizer t)
Constructor that specifies a custom Tokenizer to be used with this TextBase.

Method Detail


public void loadDocument(java.lang.String documentId,
                         java.lang.String documentString)
Adds a document to this TextBase with documentId as its identifier and with text specified by documentString.

public void loadDocument(java.lang.String documentId,
                         java.lang.String documentString,
                         int charOffset)
Adds a document to this TextBase with documentId as its identifier and with text specified by documentString. This overload also sets the offset of the new Document to the specified charOffset.
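
The role of charOffset can be pictured with a small stand-alone sketch. It is not Minorthird code: OffsetDocument and toSourceIndex are hypothetical names, and the sketch assumes the offset records where the document's text begins inside some larger source string, which the description above does not state explicitly.

```java
// Hypothetical stand-in, not Minorthird code: models a document that
// remembers where its text started inside a larger source string.
final class OffsetDocument {
    final String id;
    final String text;
    final int charOffset; // assumed meaning: index of text's first char in the source

    OffsetDocument(String id, String text, int charOffset) {
        this.id = id;
        this.text = text;
        this.charOffset = charOffset;
    }

    // Translates an index into text to the matching index in the source string.
    int toSourceIndex(int localIndex) {
        if (localIndex < 0 || localIndex > text.length()) {
            throw new IndexOutOfBoundsException("localIndex: " + localIndex);
        }
        return charOffset + localIndex;
    }
}

final class OffsetDocumentDemo {
    public static void main(String[] args) {
        String source = "Subject: hi\n\nBody text here.";
        int start = source.indexOf("Body");
        // Load only the body, but keep its position within the full message.
        OffsetDocument doc = new OffsetDocument("msg-1.body", source.substring(start), start);
        int local = doc.text.indexOf("text");
        // The translated index points at the same character in the source.
        System.out.println(source.charAt(doc.toSourceIndex(local)) == doc.text.charAt(local));
    }
}
```

Under that assumption, storing the offset lets positions found in a document fragment be reported relative to the text it was cut from.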

public void setDocumentGroupId(java.lang.String documentId,
                               java.lang.String documentGroupId)
Sets the document group id for the specified documentId to the specified document group id.

public int size()
Returns the number of documents currently in this TextBase.

public Document getDocument(java.lang.String documentId)
Returns the Document instance that corresponds to the specified documentId or null if no document exists with the specified documentId.

public Span documentSpan(java.lang.String documentId)
Returns a Span instance that encloses all of the tokens in the document specified by documentId. Note that this Span does NOT include any whitespace that comes before the first token or after the last token.
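
The whitespace rule above can be illustrated with a minimal, self-contained sketch. DocumentSpanSketch, tokenBounds, and spanText are hypothetical helpers, not part of Minorthird, and the sketch assumes a token is any maximal run of non-whitespace characters; the actual Tokenizer may split text differently.

```java
// Hypothetical helper, not part of Minorthird: finds the character range that
// runs from the start of the first token to the end of the last token.
final class DocumentSpanSketch {

    // Returns {start, end} with end exclusive, or null if the text has no tokens.
    // Assumption: a token is a maximal run of non-whitespace characters.
    static int[] tokenBounds(String text) {
        int start = 0;
        while (start < text.length() && Character.isWhitespace(text.charAt(start))) {
            start++;
        }
        if (start == text.length()) {
            return null; // empty or whitespace-only text
        }
        int end = text.length();
        while (end > start && Character.isWhitespace(text.charAt(end - 1))) {
            end--;
        }
        return new int[] {start, end};
    }

    // Text covered by the bounds: outer whitespace dropped, inner whitespace kept.
    static String spanText(String text) {
        int[] bounds = tokenBounds(text);
        return bounds == null ? "" : text.substring(bounds[0], bounds[1]);
    }

    public static void main(String[] args) {
        System.out.println("[" + spanText("  \tHello,  world.\n") + "]");
    }
}
```

Only the leading and trailing whitespace falls outside the range; whitespace between the first and last token stays inside it.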

public java.util.Iterator<Span> documentSpanIterator()
Returns an iterator that yields a document span for every document in this TextBase.
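
The contracts of loadDocument, size, getDocument, and the per-document iterator can be sketched with a small in-memory stand-in. MiniTextBase and documentIdIterator are hypothetical and do not reproduce Minorthird's implementation: documents are stored as plain strings, the iterator yields document ids rather than Span objects, and both the insertion-order iteration and the overwrite-on-reload behavior are assumptions of this sketch.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in that mirrors the shape of the methods documented above.
final class MiniTextBase {
    // Assumption: documents are kept in the order they were loaded.
    private final Map<String, String> documents = new LinkedHashMap<String, String>();

    // Assumption: loading an existing id replaces the earlier text.
    void loadDocument(String documentId, String documentString) {
        documents.put(documentId, documentString);
    }

    int size() {
        return documents.size();
    }

    // Returns null for an unknown id, like the documented getDocument contract.
    String getDocument(String documentId) {
        return documents.get(documentId);
    }

    // Yields one entry per loaded document (ids here, not spans).
    Iterator<String> documentIdIterator() {
        return documents.keySet().iterator();
    }
}

final class MiniTextBaseDemo {
    public static void main(String[] args) {
        MiniTextBase base = new MiniTextBase();
        base.loadDocument("doc1", "The quick brown fox.");
        base.loadDocument("doc2", "Jumps over the lazy dog.");
        System.out.println(base.size());
        // Callers need a null check before using a looked-up document.
        System.out.println(base.getDocument("missing") == null);
        for (Iterator<String> it = base.documentIdIterator(); it.hasNext(); ) {
            System.out.println(it.next());
        }
    }
}
```

The null check in the demo reflects the documented behavior of getDocument for an unknown documentId.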

public static void main(java.lang.String[] args)