OnlineTextLearning Tutorial
The OnlineTextClassifierLearner lets you add documents to a learner by passing in a document string rather than a document span. Its getTextClassifier() method returns a TextClassifier, which likewise scores a document string rather than a document span.
OnlineTextClassifierLearner API:
public interface OnlineTextClassifierLearner
{
    /** Provide document string with a label and add to the learner */
    public void addDocument(String label, String text);
    /** Returns the TextClassifier */
    public TextClassifier getTextClassifier();
    /** Returns the Classifier */
    public Classifier getClassifier();
    /** Tells the learner that no more examples are coming */
    public void completeTraining();
    /** Erases all previous data from the learner */
    public void reset();
    /** Returns an array of spanTypes that can be added to the learner */
    public String[] getTypes();
    /** Returns an annotated copy of TextLabels */
    public TextLabels annotatedCopy(TextLabels labels);
}
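The call sequence this interface implies (add labeled strings, optionally signal that training is complete, then score new text) can be sketched as follows. This is only an illustration under stated assumptions: so that it runs without Minorthird on the classpath, ToyOnlineLearner is a hypothetical stand-in that mirrors just addDocument, getTextClassifier, completeTraining, and reset, and the nested TextClassifier is a local copy of the single score method. The word-count scoring and the choice to reject documents after completeTraining() belong to the sketch, not to Minorthird.

```java
import java.util.HashMap;
import java.util.Map;

public class OnlineTextLearningSketch {

    /** Local stand-in for the one-method TextClassifier interface. */
    public interface TextClassifier {
        double score(String text);
    }

    /** Hypothetical toy learner for illustration; NOT Minorthird's implementation. */
    public static class ToyOnlineLearner {
        private final String positiveLabel;
        private final Map<String, Integer> weights = new HashMap<>();
        private boolean trainingComplete = false;

        public ToyOnlineLearner(String positiveLabel) {
            this.positiveLabel = positiveLabel;
        }

        /** Adds one labeled document string: +1 per word if positive, -1 otherwise. */
        public void addDocument(String label, String text) {
            if (trainingComplete) {
                // Assumption of this sketch: no examples are accepted after completion.
                throw new IllegalStateException("completeTraining() was already called");
            }
            int delta = label.equals(positiveLabel) ? 1 : -1;
            for (String word : text.toLowerCase().split("\\s+")) {
                if (!word.isEmpty()) {
                    weights.merge(word, delta, Integer::sum);
                }
            }
        }

        /** Records that no more examples are coming. */
        public void completeTraining() {
            trainingComplete = true;
        }

        /** Erases all previously added data. */
        public void reset() {
            weights.clear();
            trainingComplete = false;
        }

        /** Returns a classifier that sums the learned weights of the words in a string. */
        public TextClassifier getTextClassifier() {
            Map<String, Integer> snapshot = new HashMap<>(weights);
            return text -> {
                double sum = 0;
                for (String word : text.toLowerCase().split("\\s+")) {
                    sum += snapshot.getOrDefault(word, 0);
                }
                return sum;
            };
        }
    }

    public static void main(String[] args) {
        ToyOnlineLearner learner = new ToyOnlineLearner("fun");
        learner.addDocument("fun", "great party tonight");
        learner.addDocument("NOTfun", "tax forms due tonight");
        learner.completeTraining();

        TextClassifier classifier = learner.getTextClassifier();
        System.out.println(classifier.score("party tonight")); // prints 1.0
        System.out.println(classifier.score("tax forms"));     // prints -2.0
    }
}
```

With the real library, the same sequence would be driven through an OnlineTextClassifierLearner instance instead of the toy class.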
Currently OnlineBinaryTextClassifierLearner is the only implementation of the OnlineTextClassifierLearner. OnlineBinaryTextClassifierLearner constructors:
/** Accepts an OnlineLearner and a SpanType, with no previously labeled data */
public OnlineBinaryTextClassifierLearner(OnlineClassifierLearner learner, String spanType)
/** Accepts an OnlineLearner, a SpanType, and labeledData to add to the learner */
public OnlineBinaryTextClassifierLearner(OnlineClassifierLearner learner, String spanType, TextLabels labeledData)
/** Accepts an OnlineLearner, a SpanType, labeledData to add to the learner, and a SpanFeatureExtractor */
public OnlineBinaryTextClassifierLearner(OnlineClassifierLearner learner, String spanType, TextLabels labeledData, SpanFeatureExtractor fe)
TextClassifier API:
public interface TextClassifier
{
/** Returns the weight for a String being in the positive class */
public double score(String text);
}
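Since score returns a weight for the positive class, a caller typically has to turn that number into a label. The sketch below shows one way this might be done, assuming a threshold of zero, which this tutorial does not specify. The predict helper and the lambda scorer are hypothetical, and the nested TextClassifier is a local copy of the interface above so the example compiles on its own.

```java
public class ScoreToLabel {

    /** Local copy of the one-method TextClassifier interface. */
    public interface TextClassifier {
        double score(String text);
    }

    /**
     * Hypothetical helper: maps a score to spanType or NOT+spanType.
     * Assumption: scores above zero count as the positive class.
     */
    public static String predict(TextClassifier classifier, String spanType, String text) {
        return classifier.score(text) > 0 ? spanType : "NOT" + spanType;
    }

    public static void main(String[] args) {
        // Made-up scorer standing in for a trained TextClassifier.
        TextClassifier scorer = text -> text.contains("party") ? 1.5 : -0.5;

        System.out.println(predict(scorer, "fun", "party tonight")); // prints fun
        System.out.println(predict(scorer, "fun", "tax forms"));     // prints NOTfun
    }
}
```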
Minorthird has a built-in test class for these OnlineText classes in ui/OnlineLearner. This class accepts most of the same variables as TrainClassifier. To see all options, type:
% java -Xmx500M edu.cmu.minorthird.ui.OnlineLearner -help
The three required variables are:
-labels REPOSITORY_KEY The data you would like to label and add to the learner
-spanType The spanType with which to label the new data. Note: this appears in a pull-down list in the GUI only if you specify labeledData with the same spanType; otherwise you must specify this variable on the command line!
You must specify one of these:
-learner Specify the OnlineLearner you would like to use
-loadFrom Load a previously saved
Optional Variables:
-labeledData Previously labeled data that you would like to add to the learner before labeling more data
-fe The feature extractor you would like to use
Let's try an example:
% java -Xmx500M edu.cmu.minorthird.ui.OnlineLearner -unlabeledData sample3.unlabeled -labels sample3.train -spanType fun -learner "new NaiveBayes()" -gui
A window much like the other ui windows should appear. From there you can edit any other parameters you would like to change. Once you are satisfied with your options, press “Start Task”.
A window that looks like this should appear:
Note: you will have to expand the window to see the labels on all the buttons
Note: you will be able to highlight Minorthird’s prediction as soon as the window pops up, but you will only be able to compare to the actual
You will notice that -choose label- is a pull-down menu where you can select either the spanType you trained on or NOT followed by that spanType. In this case the menu has the items fun and NOTfun.
Here is a summary of how to use each of the buttons on the bottom of the window:
Up: Scroll up one document
Down: Scroll down one document
-choose label-: Label the current document as one of the items in the pull-down menu
Add Doc(s): Add all labeled documents to the classifier
Note: once documents are added, their labels cannot be changed
Show Classifier: Pop up a window showing the classifier for all trained data
Save TextLearner: Save the textLearner, including all new data added
Reset: Erase all previous examples and reset the classifier
Complete Training: Let the classifier know there will be no new examples
Save: Save the labels you have added to DIRECTORY_NAME.labels