Classify Package Tutorial

 

The classify package lets you use minorthird without text.  This is useful when you want to classify examples that each have a class and a list of features.  For example, consider this data for deciding whether or not to play tennis:

 

The example above is a sample of simple (non-sequential) data.  The b at the start of each line indicates that the class is binary (i.e. it can only be POS or NEG).  For multi-class data, put a k at the start of each line.
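As a rough illustration of the line prefix, the following Python sketch formats one example as a dataset line.  The field order (prefix, group name, class label, then features) is an assumption read off the sample lines shown in the sequential example in this tutorial, and format_example is a hypothetical helper, not part of minorthird.

```python
def format_example(group, label, features, multiclass=False):
    """Return one dataset line: prefix, group, class label, then features.

    Assumed layout, inferred from the tutorial's sample lines:
    'b' marks a binary example (POS or NEG), 'k' a multi-class one.
    """
    prefix = "k" if multiclass else "b"
    if not multiclass and label not in ("POS", "NEG"):
        raise ValueError("binary examples must be labeled POS or NEG")
    return " ".join([prefix, group, label] + list(features))


print(format_example("week1", "NEG", ["sunny", "humid", "temp=85"]))
```

Passing multiclass=True switches the prefix to k and lifts the POS/NEG restriction on the label.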

 

The classify package can also evaluate sequential data.  For example, to make the dataset above sequential, you might take into account whether or not you played tennis the day before.  In this case, list the dataset as sequences, with a line containing only a * between consecutive sequences.  For example:

b week1 NEG sunny humid temp=85

b week1 POS sunny humid temp=90

*

b week2 POS sunny dry temp=76

b week2 NEG sunny humid temp=80
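To make the sequence layout concrete, here is a minimal Python sketch that splits such a file into sequences.  It is not minorthird's own loader: parse_sequences is a hypothetical helper, and it assumes that fields are separated by whitespace and that a feature and its value are written without spaces (temp=80).

```python
def parse_sequences(text):
    """Split dataset text into sequences of (prefix, group, label, features)."""
    sequences, current = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line == "*":
            # A lone '*' closes the current sequence.
            if current:
                sequences.append(current)
            current = []
            continue
        prefix, group, label, *features = line.split()
        current.append((prefix, group, label, features))
    if current:
        sequences.append(current)
    return sequences


sample = """\
b week1 NEG sunny humid temp=85
b week1 POS sunny humid temp=90
*
b week2 POS sunny dry temp=76
b week2 NEG sunny humid temp=80
"""

for seq in parse_sequences(sample):
    print([(group, label) for _, group, label, _ in seq])
```

Each printed list corresponds to one sequence, so the sample above yields two lists of two examples each.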

 

IMPORTANT: To load a sequential dataset, you must specify the -seq option before the -data option on the command line.
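If you are unsure whether a file needs the sequential option, a quick check for separator lines can help.  The Python sketch below is only a heuristic under the format described above, and looks_sequential is a hypothetical helper: a sequential file that holds a single sequence contains no * line and would not be detected.

```python
def looks_sequential(text):
    """Return True if any line consists solely of the '*' sequence separator."""
    return any(line.strip() == "*" for line in text.splitlines())


simple = "b week1 NEG sunny humid temp=85\nb week1 POS sunny humid temp=90\n"
sequential = simple + "*\nb week2 POS sunny dry temp=76\n"

print(looks_sequential(simple))
print(looks_sequential(sequential))
```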

 

Train:

Train lets you train a classifier on given data.

% java edu.cmu.minorthird.classify.Train -data simpleClassifyData.train -saveAs simpleClassify.ann

 

Note:  To see what other options are available, use the -help option.

 

The output should look like this:

 

Notice that there are no results, since this is only a training run.

 

Test:

% java edu.cmu.minorthird.classify.UI -seq -op test -data sub_sig.test -classifierFile sub_sig_class.eval

 

 

Using the gui:

Train:

To run an experiment in the gui, first type the command:

% java -Xmx500M edu.cmu.minorthird.classify.UI -gui

 

When the window appears, click the Edit button under Parameter Modification.  This will make the Property Editor window appear.  The _operation: field in the Property Editor should be set to trainTest, so you will want to change it to train.

 

Once again, if your data is sequential, make sure that you check the sequentialMode checkbox before specifying your dataset.  Remember, the data will not load if you do not check this box before specifying your data.  Next, specify the datasetFilename by clicking the Browse button next to the datasetFilename field and selecting your data file.

 

These next fields have defaults, but can be changed:

 

Since this is a training experiment, you will want to save the learned classifier so that you can use it on future test data.  To save your classifier, type the name you would like to give the file, with a .eval extension, in the saveAsFilename text field.  For example, you can save your classifier in a file named myClassifier.eval.  You do NOT need to specify the testDatasetFilename, since this is only a training experiment.

 

 

Click OK to save these parameters, then click the Start Task button to start the experiment.  When the experiment finishes, you will notice that there are no results, since this is only a training experiment.  Your learned classifier will be saved in your current directory.

 

 

Test:

If a gui window is not currently open, first type the command:

% java -Xmx500M edu.cmu.minorthird.classify.UI -gui

 

When the window appears, click the Edit button under Parameter Modification.  This will make the Property Editor window appear.  The _operation: field in the Property Editor should be set to trainTest or train, so you will want to change it to test.

 

Next, you need to specify your classifier filename.  To do this, click the Browse button next to the classifierFilename text field and find the .eval file in which you saved your classifier.

 

Once again, if your data is sequential, make sure that you check the sequentialMode checkbox before specifying your dataset.  Remember, the data will not load if you do not check this box before specifying your data.  Next, specify the datasetFilename by clicking the Browse button next to the datasetFilename field and selecting your data file.

 

Click the OK button to save these parameters, then click the Start Task button to run the experiment.

 

Training and Testing using one Dataset

 

Using the command line:

 

 

To learn what options are available from the command line, type:

% java -Xmx500M edu.cmu.minorthird.classify.UI -help

 

Let's first try running a classification experiment on a sequential dataset.  When using a sequential dataset, it is important to use the -seq option so that the program can process the data properly.  Make sure the -seq option is given before the dataset.

 

To run an experiment, type this command:

% java -Xmx500M edu.cmu.minorthird.classify.UI -op trainTest -seq -data sub_sig.data

 

Note:  If you get a java.lang.reflect.InvocationTargetException, it probably means that you did not specify the -seq option before the dataset.
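One way to avoid getting the option order wrong is to build the command from a script.  The Python sketch below assembles the argument list so that -seq always precedes -data.  It uses only the class name and options that appear in this tutorial, written with plain ASCII hyphens; build_ui_args is a hypothetical helper, and the sketch only prints the command rather than running it.

```python
def build_ui_args(op, data_file, sequential=False, heap="500M"):
    """Assemble a java command line for the classify UI class.

    When sequential is True, -seq is placed before -data, as the
    tutorial requires for sequential datasets.
    """
    args = ["java", "-Xmx" + heap, "edu.cmu.minorthird.classify.UI", "-op", op]
    if sequential:
        args.append("-seq")
    args += ["-data", data_file]
    return args


args = build_ui_args("trainTest", "sub_sig.data", sequential=True)
print(" ".join(args))
```

The resulting list could be passed to subprocess.run(args) on a machine where minorthird is on the Java classpath.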

 


The output of the program should look like this:

 

Using the gui:

To run an experiment in the gui, first type the command:

% java -Xmx500M edu.cmu.minorthird.classify.UI -gui

 

When the window appears, click the Edit button under Parameter Modification.  This will make the Property Editor window appear.  The _operation: field in the Property Editor should already be set to trainTest, which is what you want for this experiment.  The next, very important, step is to check the sequentialMode checkbox towards the bottom of the window.  If this box is not checked, you will not be able to load your data properly.

 

Note:  If the data is taking a long time to load, the sequentialMode checkbox is most likely inappropriately checked.  Try closing the window and starting again.

 

Next specify the datasetFilename by clicking the Browse button next to the text field.  Find the directory where you saved the file, select the file, and click the Open button. 

 

These next fields have defaults, but can be changed:

 

Click the OK button to close the Property Editor, then click the Start Task button to begin the experiment.

 

 

 

 

 

 

 

 
