Tutorials

MinorThird is a collection of Java classes for storing text, annotating text, and learning to extract entities and categorize text. It consists of four main packages:

  1. Classify - contains the machine learning algorithms for extraction and classification, as well as data structures for storing non-text data, classifiers, and evaluations of experiments. The classify package can stand on its own, so it should not depend on any of the other packages.
  2. Text - contains the classes needed to process text data such as emails. The text package also contains Mixup (which stands for My Information eXtraction and Understanding Program), a matching language for modifying TextLabels.
  3. UI - as the name implies, this package provides a user interface for running learning experiments on text data.
  4. Util - provides utilities such as the command-line processor and GUI framework.

To download and run MinorThird, take a look at the Getting Started tutorial.

As stated above, the classify package is where the learning is performed, and it can be used on its own to run experiments on non-text data. In that setting, each example consists of a classification label (such as POS or NEG) and a list of features (such as symptoms). How to use the classify package is documented in the Classify Package Tutorial.
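
To make the idea of a non-text example concrete, here is a minimal sketch in plain Java of a label paired with named features. The class LabeledExample and its methods are hypothetical names made up for this illustration; they are not part of MinorThird, whose own data structures are covered in the Classify Package Tutorial.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only: this is NOT a MinorThird class.
// It pairs a class label with a set of named, weighted features.
public class LabeledExample {
    private final String label; // class label, e.g. "POS" or "NEG"
    private final Map<String, Double> features = new LinkedHashMap<>();

    public LabeledExample(String label) {
        this.label = label;
    }

    // Adds (or overwrites) a feature and returns this object for chaining.
    public LabeledExample with(String name, double weight) {
        features.put(name, weight);
        return this;
    }

    public String getLabel() {
        return label;
    }

    // Returns the weight of a feature, or 0.0 if the feature is absent.
    public double weightOf(String name) {
        return features.getOrDefault(name, 0.0);
    }

    public int featureCount() {
        return features.size();
    }

    public static void main(String[] args) {
        // A positive example described by two symptoms.
        LabeledExample patient = new LabeledExample("POS")
            .with("fever", 1.0)
            .with("cough", 1.0);
        System.out.println(patient.getLabel() + " " + patient.featureCount());
    }
}
```

Treating an absent feature as weight 0.0 is an assumption of this sketch, chosen because it keeps sparse examples compact.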

The ui package contains several classes for viewing, editing, and running experiments on text data. To learn how to put data in a format that MinorThird can recognize, see the Labeling and Loading Data Tutorial.

Before looking at individual classes, here is some MinorThird terminology that is helpful to know: