edu.cmu.minorthird.text
Class TextBaseManager

java.lang.Object
  extended by edu.cmu.minorthird.text.TextBaseManager

public class TextBaseManager
extends java.lang.Object

Manages the mappings between TextBases. This class maintains a mapping of names to instances of TextBase. All of the TextBases in the mapping are derived from the "root" level TextBase that was added first. Currently there are two ways to derive a new TextBase from an existing one: filter and retokenize.

Author:
Quinten Mercer

Constructor Summary
TextBaseManager(java.lang.String rootBaseName, TextBase rootBase)
          Creates a new TextBaseManager using the specified textbase as the root textbase, identified by the specified name instead of "root".
TextBaseManager(TextBase rootBase)
          Creates a new TextBaseManager using the specified textbase as the root textbase and "root" as the name to identify it.
 
Method Summary
 boolean containsLevel(java.lang.String levelName)
          Returns true if this manager has a level with the specified name.
 TextBase filter(java.lang.String parentLevelName, TextLabels parentLabels, java.lang.String newLevelName, java.lang.String spanType)
          Creates a new TextBase named newLevelName from an existing TextBase named parentLevelName.
 Span getMatchingSpan(Span span, java.lang.String srcName, java.lang.String dstName)
          Finds a mapping path from the source text base to the destination text base and translates the specified span through each successive mapping until the corresponding span in the destination text base is located.
 Span getMatchingSpan(java.lang.String srcName, java.lang.String srcDocId, int srcOffset, int length, java.lang.String dstName)
          Sometimes you may not have a source span, but rather only have a char offset in the source doc.
 TextBase getTextBase(java.lang.String name)
          Returns the textbase identified by name.
 MutableTextBase retokenize(Tokenizer newTokenizer, java.lang.String parentLevelName, java.lang.String newLevelName)
          Creates a new TextBase named newLevelName from an existing TextBase named parentLevelName.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

TextBaseManager

public TextBaseManager(TextBase rootBase)
Creates a new TextBaseManager using the specified textbase as the root textbase and "root" as the name to identify it.


TextBaseManager

public TextBaseManager(java.lang.String rootBaseName,
                       TextBase rootBase)
Creates a new TextBaseManager using the specified textbase as the root textbase, identified by the specified name instead of "root".

Method Detail

containsLevel

public boolean containsLevel(java.lang.String levelName)
Returns true if this manager has a level with the specified name.


getTextBase

public TextBase getTextBase(java.lang.String name)
Returns the textbase identified by name.


getMatchingSpan

public Span getMatchingSpan(java.lang.String srcName,
                            java.lang.String srcDocId,
                            int srcOffset,
                            int length,
                            java.lang.String dstName)
Sometimes you may not have a source span, but only a char offset in the source doc. There are two scenarios where this can happen. First, you may simply want to map some char offset of an existing document. In this case this method gets the documentSpan for the doc, uses Span.charIndexSubSpan to create a span to map, and then forwards the call to the getMatchingSpan method that takes a source Span instance. Second, you may need to map sequences of chars before the document is actually in a TextBase. For instance, FilterTokenizer needs to map char sequences in order to tokenize a document. This works because you can create maps between documents in two text bases even if the destination document doesn't yet exist in the TextBase. To make it happen, this method first maps the char offset to a span in its parent, then calls getMatchingSpan to propagate the mapping down to the destination textbase.


getMatchingSpan

public Span getMatchingSpan(Span span,
                            java.lang.String srcName,
                            java.lang.String dstName)
Finds a mapping path from the source text base to the destination text base and translates the specified span through each successive mapping until the corresponding span in the destination text base is located.
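
The idea of translating through successive mappings can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions, not MinorThird code: the class LevelPathSketch, its derive and translate methods, and the notion that each derived level sits at a fixed char offset inside its parent are all hypothetical simplifications. It climbs from the source level to the nearest ancestor shared with the destination, then descends to the destination, adjusting the offset at each step.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model, not MinorThird code: levels form a tree under one root. */
final class LevelPathSketch {
    // child level name -> parent level name (the root level has no entry)
    private final Map<String, String> parentOf = new HashMap<>();
    // child level name -> char offset of the child's text within its parent's text
    private final Map<String, Integer> startInParent = new HashMap<>();

    /** Registers a child level whose text begins at offsetInParent in its parent. */
    void derive(String parent, String child, int offsetInParent) {
        parentOf.put(child, parent);
        startInParent.put(child, offsetInParent);
    }

    /** Chain of level names from the given level up to the root, inclusive. */
    private List<String> ancestors(String level) {
        List<String> chain = new ArrayList<>();
        for (String cur = level; cur != null; cur = parentOf.get(cur)) {
            chain.add(cur);
        }
        return chain;
    }

    /**
     * Translates a char offset in level src to the matching offset in level dst.
     * Assumes both levels were derived from the same root.
     */
    int translate(int offset, String src, String dst) {
        List<String> up = ancestors(src);
        List<String> down = ancestors(dst);
        // Climb from src until reaching a level that is also an ancestor of dst.
        int i = 0;
        while (!down.contains(up.get(i))) {
            offset += startInParent.get(up.get(i));
            i++;
        }
        // Descend from the shared ancestor to dst, undoing each child's shift.
        for (int j = down.indexOf(up.get(i)) - 1; j >= 0; j--) {
            offset -= startInParent.get(down.get(j));
        }
        return offset;
    }
}
```

With levels "sentences" (offset 10 in "root"), "names" (offset 4 in "sentences"), and "retok" (offset 0 in "root"), translating offset 2 from "names" to "retok" passes through 6 in "sentences" and 16 in "root". A real span mapping would also have to handle offsets that fall outside the destination document, which this sketch ignores.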


retokenize

public MutableTextBase retokenize(Tokenizer newTokenizer,
                                  java.lang.String parentLevelName,
                                  java.lang.String newLevelName)
Creates a new TextBase named newLevelName from an existing TextBase named parentLevelName. This new TextBase has the exact same document set as the parent, but all the docs will be retokenized using the specified Tokenizer.
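
To make the "same documents, different tokens" relationship concrete, here is a minimal stand-alone sketch. It does not use the MinorThird Tokenizer or MutableTextBase types; the class RetokenizeSketch, its two toy tokenizers, and the map-based document store are assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical model of retokenization; not MinorThird code. */
final class RetokenizeSketch {
    private static final Pattern NON_SPACE = Pattern.compile("\\S+");
    private static final Pattern WORD_OR_PUNCT = Pattern.compile("\\w+|[^\\w\\s]");

    private static List<String> matches(Pattern pattern, String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    /** Toy tokenizer: each run of non-whitespace chars is one token. */
    static List<String> whitespaceTokens(String text) {
        return matches(NON_SPACE, text);
    }

    /** Toy tokenizer: runs of word chars, plus each punctuation char on its own. */
    static List<String> wordTokens(String text) {
        return matches(WORD_OR_PUNCT, text);
    }

    /** Builds a child level with the parent's doc ids and text, but new tokens. */
    static Map<String, List<String>> retokenize(
            Map<String, String> parentDocs,
            Function<String, List<String>> tokenizer) {
        Map<String, List<String>> child = new LinkedHashMap<>();
        for (Map.Entry<String, String> doc : parentDocs.entrySet()) {
            child.put(doc.getKey(), tokenizer.apply(doc.getValue()));
        }
        return child;
    }
}
```

For a document with the text "Hello, world!", the whitespace tokenizer yields two tokens ("Hello," and "world!") while the word tokenizer yields four ("Hello", ",", "world", "!"). The document ids and the underlying characters are unchanged in both cases, which is why char offsets can be carried between a parent level and its retokenized child.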


filter

public TextBase filter(java.lang.String parentLevelName,
                       TextLabels parentLabels,
                       java.lang.String newLevelName,
                       java.lang.String spanType)
Creates a new TextBase named newLevelName from an existing TextBase named parentLevelName. This new TextBase will contain a document for each instance of the provided spanType in the parent TextBase (specified by parentLabels). For example, if a document in the parent TextBase has 3 instances of the specified spanType, then the new TextBase will have 3 separate documents. All text that is not part of the specified spanType is filtered out and does not appear anywhere in the new TextBase.
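
The one-document-per-span behavior can be sketched without the library. The following is an illustrative model only: FilterSketch, Region, ChildDoc, and the child id scheme are all hypothetical and are not part of MinorThird. Each labeled region of a parent document becomes its own child document, and everything outside the regions is dropped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical model of span-based filtering; not MinorThird code. */
final class FilterSketch {
    /** A labeled region of a parent document: chars [start, end) of docId. */
    static final class Region {
        final String docId;
        final int start;
        final int end;

        Region(String docId, int start, int end) {
            this.docId = docId;
            this.start = start;
            this.end = end;
        }
    }

    /** A derived document that remembers where its text came from. */
    static final class ChildDoc {
        final String id;
        final String text;
        final String parentDocId;
        final int startInParent;

        ChildDoc(String id, String text, String parentDocId, int startInParent) {
            this.id = id;
            this.text = text;
            this.parentDocId = parentDocId;
            this.startInParent = startInParent;
        }
    }

    /** Emits one child document per region; text outside every region is dropped. */
    static List<ChildDoc> filter(Map<String, String> parentDocs, List<Region> regions) {
        List<ChildDoc> children = new ArrayList<>();
        for (Region r : regions) {
            String parentText = parentDocs.get(r.docId);
            // The child id scheme below is invented for this sketch.
            String childId = r.docId + "-" + children.size();
            String childText = parentText.substring(r.start, r.end);
            children.add(new ChildDoc(childId, childText, r.docId, r.start));
        }
        return children;
    }
}
```

Given the parent document "Alice met Bob and Carol." with three regions covering the names, the result is three child documents with the texts "Alice", "Bob", and "Carol". Recording the parent document id and start offset on each child is one plausible way to keep enough information to translate spans back to the parent level later.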