OnlineTextLearning Tutorial

 

OnlineTextClassifierLearner lets you add documents to a learner by passing in a document string rather than a document span.  Its getTextClassifier() method returns a TextClassifier, which scores a document string rather than a document span.

 

OnlineTextClassifierLearner API:

public interface OnlineTextClassifierLearner

{

    /** Adds a document string with the given label to the learner */

    public void addDocument(String label, String text);

 

    /** Returns the TextClassifier */

    public TextClassifier getTextClassifier();

 

    /** Returns the Classifier */

    public Classifier getClassifier();

 

    /** Tells the learner that no more examples are coming */

    public void completeTraining();

 

    /** Erases all previous data from the learner */

    public void reset();

 

    /** Returns an array of spanTypes that can be added to the learner */

    public String[] getTypes();

 

    /** Returns an annotated copy of TextLabels */

    public TextLabels annotatedCopy(TextLabels labels);

}

 

Currently, OnlineBinaryTextClassifierLearner is the only implementation of OnlineTextClassifierLearner.  Its constructors are:

 

/** Accepts an OnlineLearner and a SpanType, with no previously labeled data */

public OnlineBinaryTextClassifierLearner(OnlineClassifierLearner learner, String spanType) 

 

/** Accepts an OnlineLearner, a SpanType, and labeledData to add to the learner */

public OnlineBinaryTextClassifierLearner(OnlineClassifierLearner learner, String spanType, TextLabels labeledData)

 

 

 

/** Accepts an OnlineLearner, a SpanType, labeledData to add to the learner, and a SpanFeatureExtractor */

public OnlineBinaryTextClassifierLearner(OnlineClassifierLearner learner, String spanType, TextLabels labeledData, SpanFeatureExtractor fe)

 

TextClassifier API:

public interface TextClassifier

{

    /** Returns the weight for a String being in the positive class */

    public double score(String text);

 

}
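
To see how the two interfaces fit together, here is a minimal, self-contained sketch.  It does not use Minorthird itself: the nested TextClassifier and OnlineTextClassifierLearner interfaces are simplified stand-ins that keep only the methods needing no other Minorthird types, and ToyWordVoteLearner is a hypothetical toy learner invented for illustration (each word votes +1 when seen under the positive label and -1 under any other label).  With the real library you would construct an OnlineBinaryTextClassifierLearner instead.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-alone sketch: nothing here comes from the Minorthird sources.
class OnlineTextLearningSketch {

    /** Simplified stand-in for TextClassifier: scores a raw document string. */
    interface TextClassifier {
        double score(String text);
    }

    /** Reduced stand-in for OnlineTextClassifierLearner (assumption: other methods omitted). */
    interface OnlineTextClassifierLearner {
        void addDocument(String label, String text);
        TextClassifier getTextClassifier();
        void reset();
    }

    /** Hypothetical toy learner: every training word casts a +1 or -1 vote. */
    static class ToyWordVoteLearner implements OnlineTextClassifierLearner {
        private final String positiveLabel;
        private final Map<String, Integer> votes = new HashMap<>();

        ToyWordVoteLearner(String spanType) {
            this.positiveLabel = spanType;
        }

        @Override
        public void addDocument(String label, String text) {
            // Any label other than the span type (e.g. "NOTfun") counts as negative.
            int delta = positiveLabel.equals(label) ? 1 : -1;
            for (String word : tokenize(text)) {
                votes.merge(word, delta, Integer::sum);
            }
        }

        @Override
        public TextClassifier getTextClassifier() {
            // Toy design choice: copy the votes so the classifier reflects
            // the documents added up to this call.
            Map<String, Integer> snapshot = new HashMap<>(votes);
            return text -> {
                double total = 0;
                for (String word : tokenize(text)) {
                    total += snapshot.getOrDefault(word, 0);
                }
                return total;
            };
        }

        @Override
        public void reset() {
            votes.clear();
        }

        private static String[] tokenize(String text) {
            return text.toLowerCase().trim().split("\\s+");
        }
    }

    public static void main(String[] args) {
        OnlineTextClassifierLearner learner = new ToyWordVoteLearner("fun");
        learner.addDocument("fun", "the party was great fun");
        learner.addDocument("NOTfun", "the tax form was tedious");

        TextClassifier classifier = learner.getTextClassifier();
        System.out.println(classifier.score("a great party"));
        System.out.println(classifier.score("a tedious form"));
    }
}
```

Running main prints 2.0 for the first string and -2.0 for the second: a positive score leans toward the positive class (here fun), a negative score away from it.  Words seen equally often under both labels, such as "the", cancel out.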

 

Minorthird has a built-in test class for these OnlineText classes in ui/OnlineLearner.  This class accepts most of the same variables as TrainClassifier.  To see all options, type:

% java -Xmx500M edu.cmu.minorthird.ui.OnlineLearner -help

 

The required variables are:

-labels REPOSITORY_KEY    Contains the data you would like to label and add to the learner

-spanType                 The spanType with which to label the new data.  Note: this appears in a pull-down list in the GUI only if you specify labeledData with the same spanType; otherwise you must specify this variable on the command line!

 

You must specify one of these:

-learner                  Specify the OnlineLearner you would like to use

-loadFrom                 Load a previously saved textLearner

 

Optional Variables:

-labeledData              Previously labeled data that you would like to add to the learner before labeling more data

-fe                       The feature extractor you would like to use
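
The option rules above can be written down as a small check.  This is only a sketch of the lists as stated here, not Minorthird's own argument parser: OnlineLearnerArgsCheck and validate are hypothetical names, and the check treats -spanType as always required, ignoring the case where it is chosen in the GUI.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Minorthird.
class OnlineLearnerArgsCheck {

    /** Returns null if the arguments satisfy the rule, otherwise a message naming the problem. */
    static String validate(String[] args) {
        List<String> given = Arrays.asList(args);

        // Options listed as required.
        for (String required : new String[] {"-labels", "-spanType"}) {
            if (!given.contains(required)) {
                return "missing required option " + required;
            }
        }

        // One of -learner or -loadFrom must also be present.
        if (!given.contains("-learner") && !given.contains("-loadFrom")) {
            return "specify one of -learner or -loadFrom";
        }
        return null;
    }

    public static void main(String[] args) {
        String[] complete = {"-labels", "sample3.train", "-spanType", "fun",
                             "-learner", "new NaiveBayes()"};
        String[] noLearner = {"-labels", "sample3.train", "-spanType", "fun"};

        System.out.println(validate(complete));
        System.out.println(validate(noLearner));
    }
}
```

Optional variables such as -labeledData and -fe are not checked, since omitting them is always allowed.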

 

 

Let's try an example:

% java -Xmx500M edu.cmu.minorthird.ui.OnlineLearner -unlabeledData sample3.unlabeled -labels sample3.train -spanType fun -learner "new NaiveBayes()" -gui

 

A window much like the other ui windows should appear.  From there you can edit any other parameters you would like to change.  Once you are satisfied with your options, press “Start Task”.
A labeling window should then appear.

Note: you will have to expand the window to see the labels on all the buttons.

Note: you will be able to highlight Minorthird’s prediction as soon as the window pops up, but you will only be able to compare it to the actual label (in this case fun) once you have labeled one of the documents.

 

You will notice that -choose label- is a pull-down menu where you can select either the spanType you trained on or NOT followed by that spanType.  In this case the menu has the items fun and NOTfun.

 

Here is a summary of how to use each of the buttons on the bottom of the window:

Up                   Scroll up one document

Down                 Scroll down one document

-choose label-       Label the current document as one of the items in the pull-down menu

Add Doc(s)           Add all labeled documents to the classifier

                     Note: once documents are added, their labels cannot be changed

Show Classifier      Pops up a window showing the classifier for all trained data

Save TextLearner     Saves the textLearner, including all new data added

Reset                Erase all previous examples and reset the classifier

Complete Training    Let the classifier know there will be no new examples

Save                 Save the labels you have added to DIRECTORY_NAME.labels
