Class Summary | |
---|---|
Minorthird | A launch bar for Minorthird applications. |
Minorthird is a collection of methods for learning to extract entities and categorize text.
Some basic concepts: in Minorthird, a collection of documents
is stored in a TextBase. Annotations about these documents are
stored in a corresponding TextLabels object. Each annotation
asserts a category or property for a word, a document, or a
subsequence of words (a Span). TextLabels can store information
from many sources: they might hold annotations produced by human
labelers (perhaps using a GUI tool like the TextBaseEditor),
annotations produced by a hand-written program, or annotations
produced by a learned program. Multiple TextLabels can annotate a
single TextBase, if necessary.
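
The separation between text and annotations described above can be illustrated with a small, self-contained sketch. Every class here (ToyTextBase, ToySpan, ToyTextLabels) is a hypothetical stand-in written for this example, not part of Minorthird, and whitespace tokenization is an assumption made to keep it short. It shows only the idea that several independent label sets can refer to the same stored documents.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: hypothetical stand-ins, not Minorthird's own classes.
class ToyAnnotationDemo {

    /** A contiguous run of tokens inside one document. */
    static final class ToySpan {
        final String documentId;
        final int start;   // index of the first token
        final int length;  // number of tokens

        ToySpan(String documentId, int start, int length) {
            this.documentId = documentId;
            this.start = start;
            this.length = length;
        }
    }

    /** Holds tokenized documents, keyed by document id. */
    static final class ToyTextBase {
        private final Map<String, String[]> tokensById = new HashMap<>();

        void loadDocument(String documentId, String text) {
            // Assumption: split on whitespace; a real tokenizer would be richer.
            tokensById.put(documentId, text.trim().split("\\s+"));
        }

        String textOf(ToySpan span) {
            String[] tokens = tokensById.get(span.documentId);
            return String.join(" ",
                Arrays.copyOfRange(tokens, span.start, span.start + span.length));
        }
    }

    /** Holds "this span has this type" assertions, kept apart from the text. */
    static final class ToyTextLabels {
        private final Map<String, List<ToySpan>> spansByType = new HashMap<>();

        void addToType(ToySpan span, String type) {
            spansByType.computeIfAbsent(type, k -> new ArrayList<>()).add(span);
        }

        List<ToySpan> spansOfType(String type) {
            return spansByType.getOrDefault(type, Collections.emptyList());
        }
    }

    public static void main(String[] args) {
        ToyTextBase base = new ToyTextBase();
        base.loadDocument("doc1", "Dr. Jane Smith visited Pittsburgh");

        // Two independent label sets annotate the same text base.
        ToyTextLabels humanLabels = new ToyTextLabels();
        humanLabels.addToType(new ToySpan("doc1", 1, 2), "person");

        ToyTextLabels programLabels = new ToyTextLabels();
        programLabels.addToType(new ToySpan("doc1", 4, 1), "location");

        for (ToySpan span : humanLabels.spansOfType("person")) {
            System.out.println("person: " + base.textOf(span));
        }
        for (ToySpan span : programLabels.spansOfType("location")) {
            System.out.println("location: " + base.textOf(span));
        }
    }
}
```

Keeping the labels outside the text base is what lets human-produced and program-produced annotations coexist without modifying the documents themselves.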
More about text manipulation and processing can be found in the Javadocs for the minorthird.text and minorthird.text.mixup packages.
Annotated TextBases can be stored in many ways, so a
"repository" can be configured to hold a bunch of TextLabels and
their associated TextBases. TextLabels in the repository are
loaded with the FancyLoader. TextLabels and TextBases can also be
loaded directly with the TextBaseLoader and the TextBaseEditor.
Moderately complex annotation programs can be implemented with
Mixup, a special-purpose annotation language which is part of
Minorthird. Mixup can also be used to generate features for
learning algorithms. A sequence of Mixup commands can be combined
in a MixupProgram. The MixupDebugger is a GUI tool for testing a
MixupProgram.
Minorthird contains a number of methods for learning to extract
Spans from a document, or learning to classify Spans. Top-level
programs for conducting learning experiments and for training,
testing, and applying Annotators can be found in the
edu.cmu.minorthird.ui package. (The Help class is a main program
that, when invoked, lists the relevant main methods.)
Under the hood, learning is performed using classes from inside
the edu.cmu.minorthird.classify package. A ClassifierLearner
learns a Classifier from a set of labeled Examples, usually stored
in a Dataset. Several sequential classification algorithms are
also implemented in the package
edu.cmu.minorthird.classify.sequential. The classify package is
independent of the edu.cmu.minorthird.text package, but linked to
it by the routines in edu.cmu.minorthird.text.learn. Most
importantly, the SpanFE class implements what is essentially a
small feature extraction sub-language, embedded in Java, which
makes it possible to easily generate a wide variety of features of
a document, token, or Span. This language is even more powerful
because it can base features on annotations stored in the
TextLabels that are associated with the Span.
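
As a rough illustration of what extracting features from a Span can mean, the following self-contained sketch builds a bag of named features from the tokens inside a span, from the token to its left, and from annotation types already asserted for those tokens. It does not use the SpanFE API: the class ToySpanFeatures, the method extract, and the feature-name scheme are all hypothetical and chosen only for this example.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Toy sketch only: hypothetical names, not the SpanFE feature language.
class ToySpanFeatures {

    /**
     * Builds named, counted features for the tokens in [start, end).
     * labeledTypes maps a token index to an annotation type that some
     * earlier annotator (human or program) asserted for that token.
     */
    static Map<String, Double> extract(String[] tokens, int start, int end,
                                       Map<Integer, String> labeledTypes) {
        Map<String, Double> features = new TreeMap<>();
        for (int i = start; i < end; i++) {
            // Lexical feature: the lower-cased token inside the span.
            String word = tokens[i].toLowerCase(Locale.ROOT);
            features.merge("token.eq." + word, 1.0, Double::sum);

            // Annotation-based feature: a type previously assigned to this token.
            String type = labeledTypes.get(i);
            if (type != null) {
                features.merge("token.type." + type, 1.0, Double::sum);
            }
        }
        // Context feature: the token immediately to the left of the span.
        if (start > 0) {
            String left = tokens[start - 1].toLowerCase(Locale.ROOT);
            features.merge("left.eq." + left, 1.0, Double::sum);
        }
        return features;
    }

    public static void main(String[] args) {
        String[] tokens = {"Dr.", "Jane", "Smith", "visited", "Pittsburgh"};

        // Assumption: an earlier pass marked tokens 1 and 2 as "capitalized".
        Map<Integer, String> labeledTypes = new TreeMap<>();
        labeledTypes.put(1, "capitalized");
        labeledTypes.put(2, "capitalized");

        // Features for the candidate span covering "Jane Smith".
        System.out.println(extract(tokens, 1, 3, labeledTypes));
    }
}
```

A feature map of this kind is the sort of input a learner in a classification package would consume; the annotation-based features are what let earlier labeling passes inform later learning.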