edu.cmu.minorthird.text
Class BasicTextBase

java.lang.Object
  extended by edu.cmu.minorthird.text.AbstractTextBase
      extended by edu.cmu.minorthird.text.MutableTextBase
          extended by edu.cmu.minorthird.text.BasicTextBase
All Implemented Interfaces:
TextBase, java.io.Serializable

public class BasicTextBase
extends MutableTextBase
implements java.io.Serializable

Maintains information about the contents of a set of documents. Specifically, it holds the character sequences (TextToken instances) found in those documents, typically by tokenization.

Author:
William Cohen, Cameron Williams, Quinten Mercer
See Also:
Serialized Form

Field Summary
 
Fields inherited from class edu.cmu.minorthird.text.AbstractTextBase
tokenizer
 
Constructor Summary
BasicTextBase()
          Default constructor creates a new TextBase with the default Tokenizer.
BasicTextBase(Tokenizer t)
          Constructor that specifies a custom Tokenizer to be used with this TextBase.
 
Method Summary
 Span documentSpan(java.lang.String documentId)
          Returns a Span instance that encloses all of the tokens in the document specified by documentId.
 java.util.Iterator<Span> documentSpanIterator()
          Returns an iterator that yields a document span for every document in this TextBase.
 Document getDocument(java.lang.String documentId)
          Returns the Document instance that corresponds to the specified documentId or null if no document exists with the specified documentId.
 void loadDocument(java.lang.String documentId, java.lang.String documentString)
          Adds a document to this TextBase with documentId as its identifier and with text specified by documentString.
 void loadDocument(java.lang.String documentId, java.lang.String documentString, int charOffset)
          Adds a document to this TextBase with documentId as its identifier and with text specified by documentString.
static void main(java.lang.String[] args)
           
 void setDocumentGroupId(java.lang.String documentId, java.lang.String documentGroupId)
          Sets the document group id for the specified documentId to the specified document group id.
 int size()
          Returns the number of documents currently in this TextBase.
 
Methods inherited from class edu.cmu.minorthird.text.AbstractTextBase
getTokenizer
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

BasicTextBase

public BasicTextBase()
Default constructor creates a new TextBase with the default Tokenizer.


BasicTextBase

public BasicTextBase(Tokenizer t)
Constructor that specifies a custom Tokenizer to be used with this TextBase.

Method Detail

loadDocument

public void loadDocument(java.lang.String documentId,
                         java.lang.String documentString)
Adds a document to this TextBase with documentId as its identifier and with text specified by documentString.

Specified by:
loadDocument in class MutableTextBase

loadDocument

public void loadDocument(java.lang.String documentId,
                         java.lang.String documentString,
                         int charOffset)
Adds a document to this TextBase with documentId as its identifier and with text specified by documentString. Also, this method sets the offset parameter in the new Document to the specified charOffset.

Specified by:
loadDocument in class MutableTextBase
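
The role of charOffset can be illustrated with a small stand-alone sketch. It assumes, without confirmation from this page, that charOffset records where documentString begins inside some larger source text, so that positions within the document can be translated back to positions in that source. CharOffsetSketch and its methods are hypothetical names and are not part of MinorThird.

```java
// Hypothetical stand-in, not the MinorThird Document class.
// Assumption: charOffset is the index in a larger source text at which
// this document's text begins.
final class CharOffsetSketch {
    private final String text;
    private final int charOffset;

    CharOffsetSketch(String text, int charOffset) {
        this.text = text;
        this.charOffset = charOffset;
    }

    String text() {
        return text;
    }

    // Translates a position inside this document into a position in the
    // larger source text the document was cut from.
    int toSourcePosition(int localPosition) {
        return charOffset + localPosition;
    }

    public static void main(String[] args) {
        String source = "Header line\nBody text here";
        int bodyStart = source.indexOf("Body");

        // Load only the body, remembering where it started in the source.
        CharOffsetSketch doc =
            new CharOffsetSketch(source.substring(bodyStart), bodyStart);

        int local = doc.text().indexOf("text");
        System.out.println("local=" + local
            + " source=" + doc.toSourcePosition(local));
    }
}
```

Under this reading, the two-argument loadDocument would correspond to an offset of 0, though that too is an assumption.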

setDocumentGroupId

public void setDocumentGroupId(java.lang.String documentId,
                               java.lang.String documentGroupId)
Sets the document group id for the specified documentId to the specified document group id.

Specified by:
setDocumentGroupId in class MutableTextBase

size

public int size()
Returns the number of documents currently in this TextBase.

Specified by:
size in interface TextBase
Specified by:
size in class MutableTextBase

getDocument

public Document getDocument(java.lang.String documentId)
Returns the Document instance that corresponds to the specified documentId or null if no document exists with the specified documentId.

Specified by:
getDocument in interface TextBase
Specified by:
getDocument in class MutableTextBase
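
Because a missing documentId yields null rather than an exception, callers may want to guard the result before using it. The following is a minimal sketch of that pattern using a hypothetical stand-in lookup (NullableLookupSketch, backed by a plain map of strings); it does not use or reproduce MinorThird classes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for a lookup that returns null on a miss,
// mirroring the contract documented for getDocument.
final class NullableLookupSketch {
    private final Map<String, String> docs = new HashMap<>();

    void load(String documentId, String documentString) {
        docs.put(documentId, documentString);
    }

    // Returns the stored text, or null if no document has this id.
    String get(String documentId) {
        return docs.get(documentId);
    }

    // Wraps the nullable result so callers must handle the missing case.
    Optional<String> find(String documentId) {
        return Optional.ofNullable(get(documentId));
    }

    public static void main(String[] args) {
        NullableLookupSketch base = new NullableLookupSketch();
        base.load("doc1", "Some text.");

        // Explicit null check before use.
        String missing = base.get("doc2");
        if (missing == null) {
            System.out.println("doc2 is not loaded");
        }

        // Equivalent handling through Optional.
        System.out.println(base.find("doc1").orElse("(none)"));
    }
}
```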

documentSpan

public Span documentSpan(java.lang.String documentId)
Returns a Span instance that encloses all of the tokens in the document specified by documentId. Note that this Span instance does NOT include any whitespace that precedes the first token or follows the last token.

Specified by:
documentSpan in interface TextBase
Specified by:
documentSpan in class MutableTextBase
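
The whitespace rule above can be illustrated with a short stand-alone sketch. It assumes a token is any run of non-whitespace characters, which is a simplification and not necessarily how the default Tokenizer behaves; DocumentSpanSketch and tokenBounds are hypothetical names, not MinorThird API.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in, not MinorThird code: shows which characters a span
// covers when it runs from the first token to the last token.
final class DocumentSpanSketch {
    // Assumption: a token is any maximal run of non-whitespace characters.
    private static final Pattern TOKEN = Pattern.compile("\\S+");

    // Returns {start, end} character offsets (end exclusive) running from the
    // first token's first character to the last token's last character,
    // or null if the text contains no tokens.
    static int[] tokenBounds(String text) {
        Matcher m = TOKEN.matcher(text);
        int start = -1;
        int end = -1;
        while (m.find()) {
            if (start < 0) {
                start = m.start();
            }
            end = m.end();
        }
        return start < 0 ? null : new int[] {start, end};
    }

    public static void main(String[] args) {
        String text = "  Hello, world.  \n";
        int[] bounds = tokenBounds(text);
        // Leading and trailing whitespace fall outside the bounds;
        // whitespace between tokens stays inside.
        System.out.println("[" + text.substring(bounds[0], bounds[1]) + "]");
    }
}
```

With the sample string, the bracketed output contains "Hello, world." without the surrounding spaces or the trailing newline.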

documentSpanIterator

public java.util.Iterator<Span> documentSpanIterator()
Returns an iterator that yields a document span for every document in this TextBase.

Specified by:
documentSpanIterator in interface TextBase
Specified by:
documentSpanIterator in class MutableTextBase

main

public static void main(java.lang.String[] args)