The only third-party software required to run MinorThird is Java (version 1.5.0 or later). If you only plan to conduct experiments using the provided tools, installing the JRE is sufficient. However, if you plan to compile MinorThird yourself, extend its APIs, or use them in your own software, you will need the full Java SDK as well as the Ant Java build utility (version 1.6.5 or later). Follow the installation instructions provided with Java and Ant to install and configure them.
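To confirm that an installed Java runtime meets the 1.5.0 minimum, a small check along the following lines may help. This is a minimal sketch, not part of MinorThird: the class and method names are hypothetical, and it relies only on the standard `java.specification.version` system property.

```java
// Hypothetical helper (not part of MinorThird): checks the running JVM
// against the minimum Java version stated above.
class JavaVersionCheck {

    /** Converts a specification version such as "1.5", "1.8" or "17" to its major number. */
    static int majorVersion(String specVersion) {
        String[] parts = specVersion.trim().split("\\.");
        int first = Integer.parseInt(parts[0]);
        // Releases before Java 9 report "1.x"; later releases report "9", "11", "17", ...
        if (first == 1 && parts.length > 1) {
            return Integer.parseInt(parts[1]);
        }
        return first;
    }

    /** Returns true when the given specification version is at least the required major version. */
    static boolean meetsMinimum(String specVersion, int requiredMajor) {
        return majorVersion(specVersion) >= requiredMajor;
    }

    public static void main(String[] args) {
        String spec = System.getProperty("java.specification.version");
        System.out.println("Java specification version: " + spec);
        System.out.println(meetsMinimum(spec, 5)
                ? "This runtime meets the Java 1.5 minimum."
                : "This runtime is older than Java 1.5.");
    }
}
```

Running `java -version` from a command shell gives the same information without any code; the sketch is only useful when the check needs to happen inside a program.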
Getting the MinorThird Distribution
There are two ways to obtain the MinorThird distribution:
- Download one of the latest package releases from the MinorThird project page on SourceForge.
- The minorthird package is the full release of the system. It contains source code, applications, and documentation.
- The minorthird-jar package is a single jar file containing the pre-compiled MinorThird packages as well as all libraries required for its use.
- The minorthird-classify package is a single jar file containing only the pre-compiled MinorThird classification packages and required libraries.
- The Documentation package is a zip file of static versions of this entire web site.
- Use our anonymous CVS server to check out the most up-to-date (unreleased) changes to the code base.
- You will need a version of the CVS client that supports SSH.
- Open a command shell and set the following two environment variables:
- Execute the following command:
- The minorthird source tree should now appear in your current directory.
- Once you have initially checked out the source tree you can use the following command from the minorthird directory to get the latest updates:
Compiling The Source
If you have elected to download or check out the MinorThird source and compile it yourself, execute the following steps from a command shell:
- cd to the directory where you checked out or unzipped the source tree.
- cd into the minorthird directory.
- Run the setup script for your operating system to set up the CLASSPATH environment variable:
- Using Windows command prompt execute: 'script\setup'
- Using Cygwin execute: 'source script/setup.sh'
- Using Linux execute: 'source script/setup.linux'
- To compile the code run the following command from the minorthird directory:
To generate the javadocs for the API, run the following command from the minorthird directory:
To test that everything was compiled successfully, run the following command from the minorthird directory:
Now that you have MinorThird installed, you can begin using it to conduct classification and extraction experiments. The basic steps in conducting an experiment are: train an annotator (classifier or extractor) on sample data, run this annotator on test data and analyze its performance, change the settings and repeat to find the best configuration, and finally apply the best annotator to the "real" data of interest to answer the original question. There are two ways to execute these steps in MinorThird:
- Use the provided MinorThird UI tools to directly conduct an experiment. MinorThird provides many tools for executing one or more of the steps in the experiment process, as well as utilities that combine some of the steps to make the process a little easier. All MinorThird tools are invoked via the command line, but can be used in one of two ways: graphically or from the command line.
A window should appear. This window is the main experiment control window for all GUI apps in Minorthird. The top section (labeled Parameter modification) shows which program is being executed. Pressing the Edit button allows you to adjust the parameters of the program. The middle section contains the buttons that control the experiment. Once you have set all the options in the top section, press Start Task to execute the program you have chosen to run. Any output that the program generates is printed to the bottom section, labeled Error messages and output. Finally, once execution is complete, the View Results button is enabled. Clicking this button pops up a window that shows the results of your experiment. These controls are the same for virtually every program in the Minorthird suite.
- Other arguments may be provided on the command line in addition to -gui. The invoked program will read the provided values and pre-populate the fields in the GUI. Execute the following command to see this in action. Once the window appears, click on the Edit button and notice that the field named labelsFilename in the baseParameters section is populated with the value supplied to the -labels argument on the command line.
java edu.cmu.minorthird.ui.TrainExtractor -labels sample1.train -gui
- To see the list of all arguments that a tool accepts, provide the -help argument.
java edu.cmu.minorthird.ui.TrainExtractor -help
- Once you are comfortable setting up experiments in the GUI, you will probably want to supply all the parameters on the command line and skip the GUI altogether. Keep in mind that each tool has a minimal set of parameters required for proper execution. Arguments listed by -help that are enclosed in square brackets ('[' and ']') are OPTIONAL; the rest are required. Running from the command line is best demonstrated with the following examples:
java edu.cmu.minorthird.ui.TrainExtractor -labels sample1.train -spanType trueName -saveAs sample1.ann
java edu.cmu.minorthird.ui.TestExtractor -labels sample1.test -spanType trueName -loadFrom sample1.ann
The first command trains an annotator on the sample1.train dataset (this is a built-in dataset) and saves it in the current directory as sample1.ann. The -spanType argument tells the program to train the annotator to label spans of tokens that it thinks correspond to instances of trueName. The sample1.train dataset contains examples of these instances that are used to train the annotator. The second command tests this trained annotator (specified using the -loadFrom argument) against the sample1.test dataset (also a built-in dataset) and prints the performance to the screen. In this command the -spanType argument tells the program which labels to compare the annotator's predictions to. In this case the testing dataset names its examples the same way as the training dataset, but this is NOT required.
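The same train-then-test sequence can also be driven from a small Java program, which may be convenient when scripting repeated runs. The following is a minimal sketch under stated assumptions: the tool class names and flags are copied from the two commands above, while the `ExtractorExperiment` class and its helper methods are hypothetical. It also assumes that `java` is on the PATH, that the CLASSPATH already includes MinorThird, and that the tools return a zero exit code on success (not confirmed by this guide).

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical driver: builds and runs the two command lines shown above.
class ExtractorExperiment {

    /** Builds the TrainExtractor command line from the flags used in the example above. */
    static List<String> trainCommand(String labels, String spanType, String saveAs) {
        return Arrays.asList("java", "edu.cmu.minorthird.ui.TrainExtractor",
                "-labels", labels, "-spanType", spanType, "-saveAs", saveAs);
    }

    /** Builds the TestExtractor command line from the flags used in the example above. */
    static List<String> testCommand(String labels, String spanType, String loadFrom) {
        return Arrays.asList("java", "edu.cmu.minorthird.ui.TestExtractor",
                "-labels", labels, "-spanType", spanType, "-loadFrom", loadFrom);
    }

    /** Runs a command, forwarding its output to this console, and returns its exit code. */
    static int run(List<String> command) throws IOException, InterruptedException {
        return new ProcessBuilder(command).inheritIO().start().waitFor();
    }

    public static void main(String[] args) throws Exception {
        String annotator = "sample1.ann";
        int trainExit = run(trainCommand("sample1.train", "trueName", annotator));
        // Assumption: a zero exit code means the annotator was trained and saved.
        if (trainExit == 0) {
            run(testCommand("sample1.test", "trueName", annotator));
        } else {
            System.err.println("Training exited with code " + trainExit);
        }
    }
}
```

Keeping the command construction separate from execution makes it easy to print or log the exact command line before running it.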
- A complete list of all the programs available in the Minorthird application suite can be found here.
- Step by step instructions on how to use each of these tools is available in the tutorials section.
- Use the MinorThird libraries inside a custom Java application to conduct experiments and analyze the results. The most powerful way to utilize the capabilities of Minorthird is to create, run, and evaluate experiments inside your own custom application.
- Some of the specific advantages of using Minorthird in this way are:
- It allows you to present the Minorthird tools to a user as an integrated part of your application, with an interface that makes sense in the context of your application.
- It allows you to automatically run multiple experiments concurrently or in succession, using the results of a previous experiment to derive the parameters for the next.
- You can automate the experiment process, eliminating the need for human intervention.
- You can store the results of experiments (statistics or annotations) in any form you choose (e.g., a custom file format or a relational database) instead of just the supported Minorthird formats.
- The Minorthird API is broken up into 4 main packages:
- edu.cmu.minorthird.classify.*
- edu.cmu.minorthird.text.*
- edu.cmu.minorthird.ui.*
- edu.cmu.minorthird.util.*
- See the javadocs for a detailed description of the complete Minorthird API specification.
- The basic steps to performing an experiment using the Minorthird API are:
- Load your data into a TextBase (extractor) or a DataSet (classifier).
- Instantiate an AnnotatorTeacher (extractor) or ClassifierTeacher (classifier).
- Configure the teacher.
- Instantiate an AnnotatorLearner (extractor) or ClassifierLearner (classifier) that represents the desired learning algorithm.
- Configure the learner.
- Call teacher.train(learner) to create a trained extractor or classifier.
- See the UserGuide for a more detailed explanation of how to use the Minorthird API.
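The teacher/learner steps listed above can be illustrated with a self-contained sketch. Note that every type below (`ToyExample`, `ToyLearner`, `ToyTeacher`, `WordVoteLearner`) is a simplified, hypothetical stand-in written for this example; none of them is the MinorThird API, and the real `TextBase`, `DataSet`, teacher, and learner signatures should be taken from the javadocs. The sketch only shows the shape of the workflow: load data, configure a teacher, configure a learner, and call `teacher.train(learner)`.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-ins only; these are NOT the MinorThird classes.
class TeacherLearnerSketch {

    static final class ToyExample {
        final String text;
        final String label;
        ToyExample(String text, String label) { this.text = text; this.label = label; }
    }

    interface ToyClassifier { String classify(String text); }

    interface ToyLearner {
        void addExample(ToyExample example);
        ToyClassifier getClassifier();
    }

    /** Learner in which each known word votes for the labels it was seen with. */
    static final class WordVoteLearner implements ToyLearner {
        private final Map<String, Map<String, Integer>> counts = new HashMap<>();
        private boolean ignoreCase = false;

        void setIgnoreCase(boolean ignoreCase) { this.ignoreCase = ignoreCase; }

        private String[] tokens(String text) {
            return (ignoreCase ? text.toLowerCase() : text).trim().split("\\s+");
        }

        public void addExample(ToyExample example) {
            for (String word : tokens(example.text)) {
                counts.computeIfAbsent(word, k -> new HashMap<>()).merge(example.label, 1, Integer::sum);
            }
        }

        public ToyClassifier getClassifier() {
            return text -> {
                Map<String, Integer> votes = new TreeMap<>(); // sorted, so ties break alphabetically
                for (String word : tokens(text)) {
                    Map<String, Integer> byLabel = counts.get(word);
                    if (byLabel != null) {
                        byLabel.forEach((label, n) -> votes.merge(label, n, Integer::sum));
                    }
                }
                String best = null; // stays null when no word in the text is known
                int bestVotes = 0;
                for (Map.Entry<String, Integer> entry : votes.entrySet()) {
                    if (entry.getValue() > bestVotes) {
                        best = entry.getKey();
                        bestVotes = entry.getValue();
                    }
                }
                return best;
            };
        }
    }

    /** Teacher that feeds its labeled data to a learner and returns the trained classifier. */
    static final class ToyTeacher {
        private final List<ToyExample> data;
        private int maxExamples = Integer.MAX_VALUE;

        ToyTeacher(List<ToyExample> data) { this.data = data; }

        void setMaxExamples(int maxExamples) { this.maxExamples = maxExamples; }

        ToyClassifier train(ToyLearner learner) {
            int limit = Math.min(maxExamples, data.size());
            for (int i = 0; i < limit; i++) {
                learner.addExample(data.get(i));
            }
            return learner.getClassifier();
        }
    }

    public static void main(String[] args) {
        List<ToyExample> data = Arrays.asList(              // 1. load the data
                new ToyExample("cheap pills offer", "spam"),
                new ToyExample("meeting agenda attached", "ham"),
                new ToyExample("Cheap offer today", "spam"),
                new ToyExample("project meeting today", "ham"));
        ToyTeacher teacher = new ToyTeacher(data);          // 2. instantiate the teacher
        teacher.setMaxExamples(4);                          // 3. configure the teacher
        WordVoteLearner learner = new WordVoteLearner();    // 4. instantiate the learner
        learner.setIgnoreCase(true);                        // 5. configure the learner
        ToyClassifier classifier = teacher.train(learner);  // 6. train
        System.out.println(classifier.classify("cheap meeting offer"));
    }
}
```

Separating the teacher (which owns the data and the training protocol) from the learner (which owns the algorithm) is what lets the same data be reused with different learners, or the same learner with different data.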