java.lang.Object
  extended by edu.cmu.minorthird.text.TextBaseLoader
public class TextBaseLoader
Configurable Text Loader.
Usage: Configure a loader object using one of the constructors, then call load(File) with a File pointing to your data (either a single file or a directory). load(File) returns the text base (a MutableTextBase) containing the loaded documents.
Default:
TextBaseLoader tbl = new TextBaseLoader(); // loads one document per file and uses embedded labels
------------------------------------------------------
Specify the document style:
TextBaseLoader tbl = new TextBaseLoader(TextBaseLoader.DOC_PER_LINE); // loads one document per line
TextBaseLoader tbl = new TextBaseLoader(TextBaseLoader.DOC_PER_FILE); // loads one document per file
------------------------------------------------------
Specify the document style and whether to use embedded labels:
// e.g. load one document per line and ignore embedded labels
TextBaseLoader tbl = new TextBaseLoader(TextBaseLoader.DOC_PER_LINE, false);
------------------------------------------------------
Specify the document style, whether to use embedded labels, and whether to recurse into directories:
// e.g. load one document per file, use embedded labels, and recurse into directories
TextBaseLoader tbl = new TextBaseLoader(TextBaseLoader.DOC_PER_FILE, true, true);
------------------------------------------------------
In all cases, then call:
tbl.load(file); // file is the java.io.File for your data
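To make the difference between the two document styles concrete, here is a small stand-alone Java sketch. It is not MinorThird code and does not call TextBaseLoader: the class DocStyleSketch, its method toDocuments, and the constant values are hypothetical stand-ins, and skipping empty lines in per-line mode is an assumption rather than documented behaviour.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical illustration (not MinorThird code) of how the two document
 * styles partition one file's contents into documents.
 */
final class DocStyleSketch {
    // Stand-ins for TextBaseLoader.DOC_PER_LINE / DOC_PER_FILE; the real
    // constant values are not given on this page.
    static final int DOC_PER_LINE = 0;
    static final int DOC_PER_FILE = 1;

    /** Splits one file's text into documents according to the given style. */
    static List<String> toDocuments(String fileText, int documentStyle) {
        List<String> docs = new ArrayList<>();
        if (documentStyle == DOC_PER_FILE) {
            docs.add(fileText);                          // the whole file is one document
        } else if (documentStyle == DOC_PER_LINE) {
            for (String line : fileText.split("\\R")) {  // \R matches any line break
                if (!line.isEmpty()) {                   // assumption: blank lines are skipped
                    docs.add(line);                      // each line is its own document
                }
            }
        } else {
            throw new IllegalArgumentException("unknown document style: " + documentStyle);
        }
        return docs;
    }
}
```

For example, toDocuments("a\nb", DocStyleSketch.DOC_PER_LINE) yields two documents, while the same text with DOC_PER_FILE yields one.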
Field Summary | |
---|---|
static int | DIRECTORY_NAME |
static int | DOC_PER_FILE |
static int | DOC_PER_LINE |
static int | FILE_NAME |
static boolean | IGNORE_XML |
static int | IN_FILE |
static int | NONE |
static boolean | USE_XML |
Constructor Summary | |
---|---|
TextBaseLoader() | Default constructor. Loads one document per file and uses embedded labels. |
TextBaseLoader(int documentStyle) | Specifies the document style to use, but leaves all other properties at their defaults. |
TextBaseLoader(int documentStyle, boolean use_markup) | Specifies the document style and whether to use embedded labels. |
TextBaseLoader(int documentStyle, boolean use_markup, boolean recurseDirectories) | Specifies the document style, whether to use embedded labels, and whether to recurse into directories. |
TextBaseLoader(int documentStyle, int docID) | Deprecated. |
TextBaseLoader(int documentStyle, int docID, boolean use_markup) | Deprecated. |
TextBaseLoader(int documentStyle, int docID, int groupID, int categoryID) | Deprecated. |
TextBaseLoader(int documentStyle, int docID, int groupID, int categoryID, boolean labelsInFile, boolean recurseDirectories) | Deprecated. |
Method Summary | |
---|---|
MutableTextLabels | getLabels(): Gets the labeling generated by tags in the data file. |
protected java.lang.String | labelLine(java.lang.String line, java.lang.StringBuffer docBuffer, java.lang.String docId, java.util.List<edu.cmu.minorthird.text.TextBaseLoader.CharSpan> spanList): Takes a single line of text and parses its embedded labels. |
MutableTextBase | load(java.io.File dataLocation): Loads data from the given location according to the loader's configuration and to whether the location is a directory. Calling load a second time loads into the same text base, so the second call returns documents from both the first and second locations. |
MutableTextBase | load(java.io.File dataLocation, Tokenizer tok): Loads data from the given location according to the loader's configuration and to whether the location is a directory. Calling load a second time loads into the same text base, so the second call returns documents from both the first and second locations. |
static MutableTextLabels | loadDirOfTaggedFiles(java.io.File dir): Deprecated. |
static TextBase | loadDocPerLine(java.io.File file, boolean hasGroupID): Deprecated. |
void | loadTaggedFiles(TextBase base, java.io.File dir): Deprecated. |
MutableTextBase | loadWordPerLineFile(java.io.File file): Loads a document in which each word is on its own line, followed by three descriptor words. |
void | setDocumentStyle(int style): Sets the document style for loaded documents. |
void | setLabelsInFile(boolean b): Sets whether the loader should use or ignore XML markup in the files. |
void | setRecurseDirectories(boolean rec): Sets whether the loader should recurse into directories when loading documents. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
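The load(File) summary says that a second call loads into the same text base. The hypothetical class below sketches that accumulation behaviour in plain Java. It uses a List<String> in place of MutableTextBase and reads one document per non-empty line. It does not reproduce how TextBaseLoader actually stores documents.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical sketch (not MinorThird's implementation) of "load into the
 * same text base": one loader instance keeps a single growing collection,
 * so each load call adds to, rather than replaces, earlier results.
 */
final class AccumulatingLoaderSketch {
    // Stands in for the MutableTextBase that a TextBaseLoader fills.
    private final List<String> documents = new ArrayList<>();

    /** Adds one document per non-empty line of the file; returns everything loaded so far. */
    List<String> load(Path file) throws IOException {
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            if (!line.isEmpty()) {
                documents.add(line);
            }
        }
        // Read-only view over the shared collection, which keeps growing on later calls.
        return Collections.unmodifiableList(documents);
    }
}
```

Calling load on a file with one line and then on a file with two lines returns three documents from the second call. This mirrors the documented behaviour of reusing one TextBaseLoader instance.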
Field Detail |
---|
public static final int NONE
public static final int DIRECTORY_NAME
public static final int FILE_NAME
public static final int IN_FILE
public static final int DOC_PER_LINE
public static final int DOC_PER_FILE
public static final boolean USE_XML
public static final boolean IGNORE_XML
Constructor Detail |
---|
public TextBaseLoader()
public TextBaseLoader(int documentStyle)
public TextBaseLoader(int documentStyle, boolean use_markup)
public TextBaseLoader(int documentStyle, boolean use_markup, boolean recurseDirectories)
public TextBaseLoader(int documentStyle, int docID)
public TextBaseLoader(int documentStyle, int docID, boolean use_markup)
public TextBaseLoader(int documentStyle, int docID, int groupID, int categoryID)
public TextBaseLoader(int documentStyle, int docID, int groupID, int categoryID, boolean labelsInFile, boolean recurseDirectories)
Method Detail |
---|
public MutableTextBase load(java.io.File dataLocation) throws java.io.IOException, java.text.ParseException
dataLocation - File representation of the location (a single file or a directory)
java.io.IOException - if there is a problem reading the file
java.text.ParseException - if there is a problem with the XML of the internal tagging
public MutableTextBase load(java.io.File dataLocation, Tokenizer tok) throws java.io.IOException, java.text.ParseException
dataLocation - File representation of the location (a single file or a directory)
java.io.IOException - if there is a problem reading the file
java.text.ParseException - if there is a problem with the XML of the internal tagging
public MutableTextBase loadWordPerLineFile(java.io.File file) throws java.io.IOException, java.io.FileNotFoundException
java.io.IOException
java.io.FileNotFoundException
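loadWordPerLineFile is described only as reading a file in which each word is on its own line, followed by three descriptor words. Below is a minimal parsing sketch of that layout. It rests on two assumptions that this page does not state: fields are separated by whitespace, and blank lines are skipped. WordPerLineSketch and Entry are hypothetical names, not MinorThird classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch of parsing a word-per-line file: each line holds one
 * word followed by three descriptor words (whitespace-separated, by assumption).
 */
final class WordPerLineSketch {
    /** One word plus its three descriptors. */
    static final class Entry {
        final String word;
        final List<String> descriptors;

        Entry(String word, List<String> descriptors) {
            this.word = word;
            this.descriptors = descriptors;
        }
    }

    static List<Entry> parse(List<String> lines) {
        List<Entry> entries = new ArrayList<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue; // assumption: blank lines carry no word
            }
            String[] fields = line.split("\\s+");
            if (fields.length != 4) {
                throw new IllegalArgumentException("expected a word and three descriptors: " + raw);
            }
            entries.add(new Entry(fields[0], Arrays.asList(fields[1], fields[2], fields[3])));
        }
        return entries;
    }
}
```

In the test below, the descriptor values (part-of-speech-like tags) are made up for illustration. The page does not say what the three descriptors mean.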
public void setLabelsInFile(boolean b)
public void setDocumentStyle(int style)
public void setRecurseDirectories(boolean rec)
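setRecurseDirectories controls whether the loader descends into subdirectories. The sketch below shows one way to choose the files to read from a location with java.nio.file.Files.walk. The class RecurseSketch is hypothetical. It also assumes that non-recursive mode still reads the regular files directly inside the directory and that files are visited in sorted order. This page does not spell out either behaviour.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical sketch of selecting input files with or without directory recursion. */
final class RecurseSketch {
    /**
     * Returns the regular files that would be read from dataLocation.
     * A single file is returned as-is; for a directory, subdirectories are
     * descended into only when recurseDirectories is true.
     */
    static List<Path> filesToLoad(Path dataLocation, boolean recurseDirectories) throws IOException {
        if (Files.isRegularFile(dataLocation)) {
            return Collections.singletonList(dataLocation);
        }
        // Depth 1 visits the directory's direct children only.
        int maxDepth = recurseDirectories ? Integer.MAX_VALUE : 1;
        try (Stream<Path> paths = Files.walk(dataLocation, maxDepth)) {
            return paths.filter(Files::isRegularFile)
                        .sorted()
                        .collect(Collectors.toList());
        }
    }
}
```

With a directory containing top.txt and sub/nested.txt, non-recursive mode selects one file and recursive mode selects both.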
public MutableTextLabels getLabels()
public static MutableTextLabels loadDirOfTaggedFiles(java.io.File dir) throws java.text.ParseException, java.io.IOException
java.text.ParseException
java.io.IOException
public void loadTaggedFiles(TextBase base, java.io.File dir) throws java.io.IOException, java.io.FileNotFoundException
java.io.IOException
java.io.FileNotFoundException
public static TextBase loadDocPerLine(java.io.File file, boolean hasGroupID) throws java.text.ParseException, java.io.IOException
java.text.ParseException
java.io.IOException
protected java.lang.String labelLine(java.lang.String line, java.lang.StringBuffer docBuffer, java.lang.String docId, java.util.List<edu.cmu.minorthird.text.TextBaseLoader.CharSpan> spanList) throws java.text.ParseException
line - a single line of text whose labels are to be parsed
spanList - list of span labelings
java.text.ParseException - if the line contains improperly formatted XML
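labelLine parses embedded labels out of a single line, records them as spans, and throws ParseException on improper XML. The stand-alone sketch below shows the general technique for simple tags without attributes. It strips the tags, appends the plain text to a buffer, and records [start, end) character offsets. It is not the MinorThird implementation. The Span class stands in for TextBaseLoader.CharSpan, whose fields are not documented here, and the sketch's signature (no docId, void return) differs from labelLine's.

```java
import java.text.ParseException;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of extracting inline XML-style labels from one line. */
final class InlineLabelSketch {
    /** A labeled region [start, end) in the document buffer. */
    static final class Span {
        final String type;
        final int start;
        final int end;

        Span(String type, int start, int end) {
            this.type = type;
            this.start = start;
            this.end = end;
        }
    }

    // Matches <tag> or </tag>; attributes and self-closing tags are not handled.
    private static final Pattern TAG = Pattern.compile("<(/?)([A-Za-z_][\\w.-]*)>");

    static void labelLine(String line, StringBuffer docBuffer, List<Span> spans) throws ParseException {
        Deque<String> openTypes = new ArrayDeque<>();
        Deque<Integer> openStarts = new ArrayDeque<>();
        Matcher m = TAG.matcher(line);
        int last = 0;
        while (m.find()) {
            docBuffer.append(line, last, m.start());   // copy the text before this tag
            last = m.end();
            String type = m.group(2);
            if (m.group(1).isEmpty()) {                 // opening tag: remember where it starts
                openTypes.push(type);
                openStarts.push(docBuffer.length());
            } else {                                    // closing tag: must match the latest open tag
                if (openTypes.isEmpty() || !openTypes.peek().equals(type)) {
                    throw new ParseException("unmatched </" + type + ">", m.start());
                }
                openTypes.pop();
                spans.add(new Span(type, openStarts.pop(), docBuffer.length()));
            }
        }
        if (!openTypes.isEmpty()) {
            throw new ParseException("unclosed <" + openTypes.peek() + ">", line.length());
        }
        docBuffer.append(line.substring(last));        // copy any trailing text
    }
}
```

Given "<person>Ann Lee</person> visited <city>Pittsburgh</city>.", the buffer receives "Ann Lee visited Pittsburgh." and two spans are recorded. A mismatched pair such as "<a>oops</b>" raises ParseException, as the labelLine documentation describes.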