edu.cmu.minorthird.text
Class TextLabelsLoader

java.lang.Object
  extended by edu.cmu.minorthird.text.TextLabelsLoader

public class TextLabelsLoader
extends java.lang.Object

Loads the contents of a TextLabels from a file and saves them to a file. Labels can be loaded from a file of operations (see importOps) or from a serialized TextLabels object. Labels can be saved in serialized form, or their types can be saved as operations, XML, or plain lists.

Author:
William Cohen

Field Summary
static int CLOSE_ALL_TYPES
          Spans in labels are a complete list of all spans.
static int CLOSE_BY_OPERATION
           
static int CLOSE_TYPES_IN_LABELED_DOCS
          If a document has been labeled for a type, assume all spans of that type are there.
static java.lang.String[] CLOSURE_NAMES
           
static int DONT_CLOSE_TYPES
          Make no assumptions about closure.
 
Constructor Summary
TextLabelsLoader()
           
 
Method Summary
 void closeLabels(MutableTextLabels labels, int policy)
          Close the given labels according to the policy.
 java.lang.String createXMLmarkup(java.lang.String documentId, TextLabels labels)
          Save extracted data in an XML format.
 void importOps(MutableTextLabels labels, TextBase base, java.io.File file)
          Load lines modifying a TextLabels from a file.
 MutableTextLabels loadOps(TextBase base, java.io.File file)
          Create a new labeling by importing from a file with importOps.
 MutableTextLabels loadSerialized(java.io.File file, TextBase base)
          Read in a serialized TextLabels.
 java.lang.String markupDocumentSpan(java.lang.String documentId, TextLabels labels)
          Deprecated. Use createXMLmarkup(String documentId, TextLabels labels) instead. Save extracted data in an XML format. Converts to a string of the form <root>..<type>...</type>..</root>. Nested spans such as <a>A<b>B</b>C</a> are stored as <a>A<set v=a,b>B</set>C</a>, where single sets are simplified, so mismatches like [A (B C] D)E are stored as <a>a<set v=a,b>B C</set></a><b>D</b>E.
 java.lang.String printTypesAsOps(TextLabels labels)
          Save extracted data in a format readable with loadOps.
 void saveDocsWithEmbeddedTypes(TextLabels labels, java.io.File dir)
          Save documents to specified directory with extracted types embedded as xml.
 void saveSerialized(MutableTextLabels labels, java.io.File file)
          Serialize a TextLabels.
 void saveTypesAsOps(TextLabels labels, java.io.File file)
          Save extracted data in a format readable with loadOps.
 void saveTypesAsStrings(TextLabels labels, java.io.File file, boolean includeOffset)
          Save spans of given type into the file, one per line.
 java.lang.String saveTypesAsXML(TextLabels labels)
          Save extracted data in an XML format.
 void setClosurePolicy(int policy)
          Set the closure policy.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

CLOSE_ALL_TYPES

public static final int CLOSE_ALL_TYPES
Spans in labels are a complete list of all spans.

See Also:
Constant Field Values

CLOSE_TYPES_IN_LABELED_DOCS

public static final int CLOSE_TYPES_IN_LABELED_DOCS
If a document has been labeled for a type, assume all spans of that type are there.

See Also:
Constant Field Values

DONT_CLOSE_TYPES

public static final int DONT_CLOSE_TYPES
Make no assumptions about closure.

See Also:
Constant Field Values

CLOSE_BY_OPERATION

public static final int CLOSE_BY_OPERATION
See Also:
Constant Field Values

CLOSURE_NAMES

public static final java.lang.String[] CLOSURE_NAMES
Constructor Detail

TextLabelsLoader

public TextLabelsLoader()
Method Detail

setClosurePolicy

public void setClosurePolicy(int policy)
Set the closure policy.

Parameters:
policy - one of CLOSE_ALL_TYPES, CLOSE_TYPES_IN_LABELED_DOCS, DONT_CLOSE_TYPES

loadOps

public MutableTextLabels loadOps(TextBase base,
                                 java.io.File file)
                          throws java.io.IOException,
                                 java.io.FileNotFoundException
Create a new labeling by importing from a file with importOps.

Throws:
java.io.IOException
java.io.FileNotFoundException

importOps

public void importOps(MutableTextLabels labels,
                      TextBase base,
                      java.io.File file)
               throws java.io.IOException,
                      java.io.FileNotFoundException
Load lines modifying a TextLabels from a file. There are four allowed operations: addToType, closeType, closeAllTypes, and setClosure.

For addToType, lines must be of the form
    addToType ID LOW LENGTH TYPE
where ID is a documentID in the given TextBase, LOW is a character index into that document, and LENGTH is the length in characters of the span that will be created with the given type TYPE. If LENGTH==-1, the created span extends to the end of the document.

For closeType, lines must be of the form
    closeType ID TYPE
where ID is a documentID in the given TextBase and TYPE is the label type to close over that document.

For closeAllTypes, lines must be of the form
    closeAllTypes ID
where ID is a documentID in the given TextBase. The document will be closed for all types present in the TextLabels after all operations are performed.

For setClosure, lines must be of the form
    setClosure POLICY
where POLICY is one of the policy types defined in this class. It immediately changes the closure policy for the loader. This is best used at the beginning of the file to select one of the generic policies or the CLOSE_BY_OPERATION (default) policy.

Throws:
java.io.IOException
java.io.FileNotFoundException
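
The line formats accepted by importOps can be illustrated with a small standalone parser. This is only a sketch and not the Minorthird implementation: the class OpsLineSketch, its Op type, and the methods parseLine and endOffset are hypothetical names introduced here. It assumes that fields are separated by whitespace, that TYPE and ID contain no spaces, and that the third keyword is spelled closeAllTypes as in the list of operations.

```java
// Hypothetical helper, not part of Minorthird: parses the four
// operation-line formats described for importOps.
class OpsLineSketch {

    // One parsed operation. Fields that do not apply stay null or 0.
    static final class Op {
        final String name;   // addToType, closeType, closeAllTypes, or setClosure
        final String docId;  // document ID; null for setClosure
        final int low;       // start character index (addToType only)
        final int length;    // span length; -1 means "to end of document"
        final String arg;    // TYPE for addToType/closeType, POLICY for setClosure

        Op(String name, String docId, int low, int length, String arg) {
            this.name = name;
            this.docId = docId;
            this.low = low;
            this.length = length;
            this.arg = arg;
        }
    }

    // Assumption: fields are separated by runs of whitespace.
    static Op parseLine(String line) {
        String[] t = line.trim().split("\\s+");
        switch (t[0]) {
            case "addToType":      // addToType ID LOW LENGTH TYPE
                require(t, 5);
                return new Op(t[0], t[1], Integer.parseInt(t[2]), Integer.parseInt(t[3]), t[4]);
            case "closeType":      // closeType ID TYPE
                require(t, 3);
                return new Op(t[0], t[1], 0, 0, t[2]);
            case "closeAllTypes":  // closeAllTypes ID
                require(t, 2);
                return new Op(t[0], t[1], 0, 0, null);
            case "setClosure":     // setClosure POLICY
                require(t, 2);
                return new Op(t[0], null, 0, 0, t[1]);
            default:
                throw new IllegalArgumentException("unknown operation: " + t[0]);
        }
    }

    // End offset of an addToType span; LENGTH == -1 runs to the document end.
    static int endOffset(Op op, int docLength) {
        return op.length == -1 ? docLength : op.low + op.length;
    }

    private static void require(String[] tokens, int n) {
        if (tokens.length != n) {
            throw new IllegalArgumentException(
                "expected " + n + " fields but got " + tokens.length);
        }
    }

    public static void main(String[] args) {
        Op op = parseLine("addToType doc1 10 -1 name");
        System.out.println(op.name + " " + op.docId + " ends at " + endOffset(op, 50));
    }
}
```

The document ID doc1 and the type name are made-up values; a real operations file would use IDs from the TextBase being labeled.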

closeLabels

public void closeLabels(MutableTextLabels labels,
                        int policy)
Close the given labels according to the policy. This applies the same policy to all documents and types in the labels. For finer control of closure, use closeLabels(Set, MutableTextLabels, Span) or MutableTextLabels.closeTypeInside(...).

Parameters:
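labels - the labels to close
policy - the closure policy to apply, for example one of CLOSE_ALL_TYPES, CLOSE_TYPES_IN_LABELED_DOCS, DONT_CLOSE_TYPES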

loadSerialized

public MutableTextLabels loadSerialized(java.io.File file,
                                        TextBase base)
                                 throws java.io.IOException,
                                        java.io.FileNotFoundException
Read in a serialized TextLabels.

Throws:
java.io.IOException
java.io.FileNotFoundException

saveSerialized

public void saveSerialized(MutableTextLabels labels,
                           java.io.File file)
                    throws java.io.IOException
Serialize a TextLabels.

Throws:
java.io.IOException

printTypesAsOps

public java.lang.String printTypesAsOps(TextLabels labels)
Save extracted data in a format readable with loadOps.


saveTypesAsOps

public void saveTypesAsOps(TextLabels labels,
                           java.io.File file)
                    throws java.io.IOException
Save extracted data in a format readable with loadOps.

Throws:
java.io.IOException

saveTypesAsStrings

public void saveTypesAsStrings(TextLabels labels,
                               java.io.File file,
                               boolean includeOffset)
                        throws java.io.IOException
Save spans of given type into the file, one per line. Linefeeds in strings are replaced with spaces.

Throws:
java.io.IOException
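
The one-span-per-line convention of saveTypesAsStrings can be sketched as follows. This is a standalone illustration, not the library code: OnePerLineSketch, flatten, and toLines are hypothetical names, and the sketch deliberately does not model the real output format, such as what includeOffset adds to each line. It only shows why linefeeds inside a span's text are replaced with spaces: otherwise a single span would occupy more than one line.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Minorthird.
class OnePerLineSketch {

    // Replace linefeeds so that each span's text fits on a single line.
    static String flatten(String spanText) {
        return spanText.replace('\n', ' ');
    }

    // Join the flattened span texts, one per line.
    static String toLines(List<String> spanTexts) {
        StringBuilder sb = new StringBuilder();
        for (String s : spanTexts) {
            sb.append(flatten(s)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Made-up span texts; the first contains an embedded linefeed.
        System.out.print(toLines(Arrays.asList("New\nYork", "Pittsburgh")));
    }
}
```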

saveDocsWithEmbeddedTypes

public void saveDocsWithEmbeddedTypes(TextLabels labels,
                                      java.io.File dir)
                               throws java.io.IOException
Save documents to specified directory with extracted types embedded as xml.

Throws:
java.io.IOException

markupDocumentSpan

public java.lang.String markupDocumentSpan(java.lang.String documentId,
                                           TextLabels labels)
Deprecated. Use createXMLmarkup(String documentId, TextLabels labels) instead.

Save extracted data in an XML format. Converts to a string of the form <root>..<type>...</type>..</root>. Nested spans such as <a>A<b>B</b>C</a> are stored as <a>A<set v=a,b>B</set>C</a>, where single sets are simplified, so mismatches like [A (B C] D)E are stored as <a>a<set v=a,b>B C</set></a><b>D</b>E.


createXMLmarkup

public java.lang.String createXMLmarkup(java.lang.String documentId,
                                        TextLabels labels)
Save extracted data in an XML format. Convert to string <root>..<type>...</type>..</root>.

In the event that labels overlap, as in [A (B C] D)E, an IllegalArgumentException is thrown because a well-formed XML document cannot be created.
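
The overlap condition can be made concrete with a standalone check. The sketch below is not how createXMLmarkup is implemented; CrossingSpansSketch, crosses, and checkWellNested are hypothetical names, and spans are assumed to be half-open [lo, hi) intervals over token or character positions. Two spans cross when they overlap but neither contains the other, which is exactly the case that cannot be written as properly nested XML elements.

```java
// Hypothetical helper, not part of Minorthird.
class CrossingSpansSketch {

    // True if [lo1, hi1) and [lo2, hi2) overlap without one containing the other.
    static boolean crosses(int lo1, int hi1, int lo2, int hi2) {
        return (lo1 < lo2 && lo2 < hi1 && hi1 < hi2)
            || (lo2 < lo1 && lo1 < hi2 && hi2 < hi1);
    }

    // Throws if any pair of spans crosses. Each span is {lo, hi}.
    static void checkWellNested(int[][] spans) {
        for (int i = 0; i < spans.length; i++) {
            for (int j = i + 1; j < spans.length; j++) {
                if (crosses(spans[i][0], spans[i][1], spans[j][0], spans[j][1])) {
                    throw new IllegalArgumentException(
                        "spans " + i + " and " + j + " cross; cannot nest as XML");
                }
            }
        }
    }

    public static void main(String[] args) {
        // Tokens A B C D E at positions 0..4.
        // [A (B C] D)E: one span covers A B C, the other covers B C D.
        System.out.println(crosses(0, 3, 1, 4));
        // Nested case <a>A<b>B</b>C</a>: the inner span lies inside the outer one.
        System.out.println(crosses(0, 3, 1, 2));
    }
}
```

Nested and adjacent spans pass the check; only partially overlapping pairs are rejected.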


saveTypesAsXML

public java.lang.String saveTypesAsXML(TextLabels labels)
Save extracted data in an XML format.