How To Run a Classification Task

TestClassifier Tutorial

TestClassifier tests a classifier created by TrainClassifier to see how it performs on a test set. Remember that a classifier classifies an entire document, so this experiment simply outputs statistics about how many documents are labeled correctly. For this example, we will use the classifier obtained by running the example in the TrainTestClassifier Tutorial. For quick reference, here is the command line experiment that produces the classifier used in this example. (Note: the samples are built into the code, so no setup is required to make them work.)

% java -Xmx500M edu.cmu.minorthird.ui.TrainClassifier -labels sample3.train -spanType fun -saveAs sample3.ann

The saved classifier, sample3.ann, is used for this experiment; it classifies which documents are fun.

To see how to label and load your own data for this task, look at the Labeling and Loading Data Tutorial.

To run this type of task, start with:

% java -Xmx500M edu.cmu.minorthird.ui.TestClassifier

Editing Parameters:

Like all ui tasks, the parameters for TestClassifier may be specified either in the gui or on the command line. To use the gui, simply type -gui on the command line. It is also possible to mix and match where the parameters are specified; for example, one can specify two parameters on the command line and use the gui to select the rest. For this reason, the step by step process for this experiment first explains how to select a parameter value in the gui and then how to set the same parameter on the command line.
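As a rough sketch of mixing the two, the script below sets two parameters on the command line and passes -gui so the rest can be chosen in the Property Editor. It assumes MinorThird is on the Java classpath and reuses the sample3 files from this tutorial. MIX_CMD is only an illustrative variable name, and the script prints the command instead of running it so it can be checked first.

```shell
# Two parameters given on the command line; -gui opens the editor for the rest.
# MIX_CMD is an illustrative variable name, not part of MinorThird.
MIX_CMD="java -Xmx500M edu.cmu.minorthird.ui.TestClassifier -gui"
MIX_CMD="$MIX_CMD -labels sample3.test"     # documents to test on
MIX_CMD="$MIX_CMD -loadFrom sample3.ann"    # classifier saved by TrainClassifier

# Print the assembled command for inspection; run it with: eval "$MIX_CMD"
echo "$MIX_CMD"
```

The remaining required parameter (spanType or spanProp) would then be selected in the gui.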

To view a list of parameters and their functions, run:

% java -Xmx500M edu.cmu.minorthird.ui.TestClassifier -help

Alternatively, launch the gui:

% java -Xmx500M edu.cmu.minorthird.ui.TestClassifier -gui

Then click on the “Parameters” button next to Help, or click on the “?” button next to each field in the Property Editor to see what it is used for.

If using the gui, click the Edit button next to TestClassifier in the window that appears. A Property Editor window will open, where the parameters can be edited.

Editing Parameters:

1) There are four groups of parameters that may be altered. A collection of documents (labelsFilename), an annotator (loadFrom), and a spanType or spanProp are required. All other fields are optional. For more information about any of the fields, click on the ? (Help) button next to the field.

1. additionalParameters: contains one parameter for specifying the annotator to load.

a) GUI: Enter sample3.ann (or the filename you chose for your annotator) in the loadFrom text field.

b) Command Line: -loadFrom sample3.ann (or the filename you chose for your annotator)

2. baseParameters: contains the options for loading the collection of documents.

a) GUI: Enter sample3.test in the labelsFilename text field. sample3.test contains labeled documents, which makes it useful for comparing trueLabels to classified labels.

b) Command Line: use the -labels option followed by the repositoryKey or the directory of files to load. In this case, specify -labels sample3.test

3. saveParameters: contains one parameter for specifying a file to save the result to. Saving is optional, but useful for reusing the result in other experiments or for reference. It is useful to save in the format labelsFilename.labels, where labelsFilename is the directory entered in the labelsFilename text field. This way minorthird can automatically load the labels produced by this experiment in another minorthird task.

a) GUI: Type sample3.labels in the saveAs textField

b) Command Line: -saveAs sample3.labels

4. signalParameters: Either spanType or spanProp must be specified as the type to learn. For this experiment we will test spanType fun.

a) GUI: click the “Edit” button next to signalParameters. Select “fun” from the pull down menu next to spanType.

b) Command Line: specify -spanType fun

2) Feel free to try changing any of the other parameters including the ones in advanced options.

a. GUI: Click on the help buttons to get a feeling for what each parameter does and how changing it may affect your results. Once all the parameters are set, click the “OK” button on the PropertyEditor.

b. Command Line: Add other parameters to the command line (use the -help option to see other parameter options). If there is an option that can be set in the gui, but there is no specific parameter for setting it in the help parameter definition, the -other option may be used. To see how to use this option, look at the Command Line Other Option Tutorial.

3) GUI: Once finished editing parameters, save your parameter modifications by clicking the “OK” button on the Property Editor.
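For the command-line route, the options from the steps above can be combined into a single invocation. The following is a minimal sketch, assuming MinorThird is on the Java classpath and that sample3.ann was saved by the TrainClassifier command shown earlier. TEST_CMD is only an illustrative variable name, and the script prints the command rather than executing it.

```shell
# All four parameter groups from the steps above, set on the command line.
# TEST_CMD is an illustrative variable name, not part of MinorThird.
TEST_CMD="java -Xmx500M edu.cmu.minorthird.ui.TestClassifier"
TEST_CMD="$TEST_CMD -loadFrom sample3.ann"    # additionalParameters: classifier to load
TEST_CMD="$TEST_CMD -labels sample3.test"     # baseParameters: documents to test on
TEST_CMD="$TEST_CMD -saveAs sample3.labels"   # saveParameters: optional output file
TEST_CMD="$TEST_CMD -spanType fun"            # signalParameters: span type to test

# Print the assembled command for inspection; run it with: eval "$TEST_CMD"
echo "$TEST_CMD"
```

The -saveAs line can be dropped if the result does not need to be reused, since saving is optional.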

Show Labeled Data:

a) GUI: Press the Show Labels button if you would like to view the input data for the classification task.

b) Command Line: add -showLabels to the command line

Getting and Interpreting Results:

1) Command Line: specify -showResult to see the graphical result. If this option is not set, only the output statistics of the task will be shown.

2) Press Start Task under Execution Controls in the gui, or press Enter on the command line, to run the experiment. The time the task takes depends on the size of the data set and the classifier. When the task is finished, the error rates will appear in the Error messages and output text area along with the total time taken to run the experiment.

3) Once the experiment is completed, click the View Results button in the Execution Controls section to see detailed results in the gui; the window appears automatically if the -showResult option was given on the command line. The Details tab shows the testing examples in the top left, the classifier in the top right, the selected test example’s features, source, and subpopulation in the bottom left, and the explanation for the classification of the selected test example in the bottom right (expand the tree to see the details of the explanation).

4) Click on the Evaluation tab at the top and the Evaluation tab below that to view your results. The Summary tab shows the results that were printed in the output window when you ran the experiment (numbers such as error rate and F1). The Precision/Recall tab shows the graph of recall vs. precision for this experiment. The Confusion Matrix tab shows how many examples the classifier predicted as positive that are actually positive, how many it predicted as positive that are actually negative, and vice versa.

5) Press the “Clear Window” button to clear all output from the output and error messages window.
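To view the input labels and the graphical result without opening the gui, the two viewing options described above can be added to the command line. This is a sketch under the same assumptions as before (MinorThird on the Java classpath, the sample3 files from this tutorial). VIEW_CMD is an illustrative variable name, and the command is printed rather than run.

```shell
# Test run that also shows the labeled input data and the result window.
# VIEW_CMD is an illustrative variable name, not part of MinorThird.
VIEW_CMD="java -Xmx500M edu.cmu.minorthird.ui.TestClassifier"
VIEW_CMD="$VIEW_CMD -labels sample3.test -loadFrom sample3.ann -spanType fun"
VIEW_CMD="$VIEW_CMD -showLabels"    # view the input data for the task
VIEW_CMD="$VIEW_CMD -showResult"    # open the graphical result when finished

# Print the assembled command for inspection; run it with: eval "$VIEW_CMD"
echo "$VIEW_CMD"
```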