edu.cmu.minorthird.ui
Class Recommended.VPSMMLearner2

java.lang.Object
  extended by edu.cmu.minorthird.text.learn.AnnotatorLearner
      extended by edu.cmu.minorthird.text.learn.SegmentAnnotatorLearner
          extended by edu.cmu.minorthird.ui.Recommended.VPSMMLearner2
Enclosing class:
Recommended

public static class Recommended.VPSMMLearner2
extends SegmentAnnotatorLearner

Uses the voted perceptron algorithm to learn the parameters for a hidden semi-Markov model (SMM).

This is a somewhat more expensive version of the VPHMMLearner that allows features to describe properties of multi-token spans, rather than only properties of single tokens. It implements the training algorithm described in the final draft of Cohen & Sarawagi's KDD paper. This implementation is more memory-intensive than the VPSMMLearner2 package below, but faster, since the feature-extraction step is only performed once.
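
To illustrate what span-level features make possible, here is a minimal, self-contained sketch of semi-Markov decoding. It is not Minorthird code and does not show voted-perceptron training. The class SemiMarkovSketch, the methods scoreSpan and decode, and the hand-set dictionary scores are all hypothetical names and values chosen for illustration. The one span feature used here, "the whole span matches a dictionary entry", cannot be expressed as a property of a single token.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only: hypothetical names, fixed hand-set scores, no learning.
class SemiMarkovSketch {

    // Hypothetical span-level score: a span whose full text is in the
    // dictionary earns its length in tokens; any other entity span is penalized.
    static double scoreSpan(String[] tokens, int start, int end, Set<String> dictionary) {
        String text = String.join(" ", Arrays.copyOfRange(tokens, start, end));
        return dictionary.contains(text) ? (end - start) : -1.0;
    }

    // Returns the entity spans [start, end) of the highest-scoring segmentation,
    // considering entity spans of at most maxLen tokens.
    static List<int[]> decode(String[] tokens, int maxLen, Set<String> dictionary) {
        int n = tokens.length;
        double[] best = new double[n + 1];          // best[i]: best score for tokens[0..i)
        int[] backStart = new int[n + 1];           // start of the last segment ending at i
        boolean[] backEntity = new boolean[n + 1];  // whether that last segment is an entity

        for (int end = 1; end <= n; end++) {
            // Option 1: the token at end-1 lies outside any entity (score 0).
            best[end] = best[end - 1];
            backStart[end] = end - 1;
            backEntity[end] = false;
            // Option 2: an entity span of 1..maxLen tokens ends at this position.
            for (int len = 1; len <= maxLen && len <= end; len++) {
                int start = end - len;
                double candidate = best[start] + scoreSpan(tokens, start, end, dictionary);
                if (candidate > best[end]) {
                    best[end] = candidate;
                    backStart[end] = start;
                    backEntity[end] = true;
                }
            }
        }

        // Follow the back-pointers from the end to recover the entity spans.
        List<int[]> spans = new ArrayList<>();
        int pos = n;
        while (pos > 0) {
            if (backEntity[pos]) {
                spans.add(0, new int[] {backStart[pos], pos});
            }
            pos = backStart[pos];
        }
        return spans;
    }

    public static void main(String[] args) {
        String[] tokens = {"I", "visited", "New", "York", "City", "yesterday"};
        Set<String> dictionary = new HashSet<>(Arrays.asList("New York", "New York City"));
        for (int[] span : decode(tokens, 4, dictionary)) {
            System.out.println(String.join(" ", Arrays.copyOfRange(tokens, span[0], span[1])));
        }
    }
}
```

With a maximum span length of 4, the sketch selects "New York City" as a single entity. With a maximum length of 2, the three-token span is never considered, so only "New York" is found. The inner loop over span lengths is also what makes this kind of model more expensive than a token-level one.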

I generally prefer this method to the (older) VPHMMLearner.

Reference: William W. Cohen and Sunita Sarawagi, Exploiting Dictionaries in Named Entity Extraction: Combining Semi-Markov Extraction Processes and Data Integration Methods, Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2004).


Nested Class Summary
 
Nested classes/interfaces inherited from class edu.cmu.minorthird.text.learn.SegmentAnnotatorLearner
SegmentAnnotatorLearner.SegmentAnnotator
 
Field Summary
 
Fields inherited from class edu.cmu.minorthird.text.learn.SegmentAnnotatorLearner
annotationType, dataset, fe, learner, maxWindowSize
 
Constructor Summary
Recommended.VPSMMLearner2()
          Extracted entities must be of length 4 or less.
Recommended.VPSMMLearner2(int epochs, int maxLen)
           
 
Method Summary
 
Methods inherited from class edu.cmu.minorthird.text.learn.SegmentAnnotatorLearner
getAnnotationType, getAnnotator, getCompressDataset, getCompressDatasetHelp, getDisplayDatasetBeforeLearning, getDisplayDatasetBeforeLearningHelp, getHistorySize, getSemiMarkovLearner, getSemiMarkovLearnerHelp, getSpanFeatureExtractor, hasNextQuery, nextQuery, reset, setAnnotationType, setAnswer, setCompressDataset, setDisplayDatasetBeforeLearning, setDocumentPool, setSemiMarkovLearner, setSpanFeatureExtractor
 
Methods inherited from class edu.cmu.minorthird.text.learn.AnnotatorLearner
getAnnotationTypeHelp, getSpanFeatureExtractorHelp
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

Recommended.VPSMMLearner2

public Recommended.VPSMMLearner2()
Extracted entities must be of length 4 or less.
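
As a rough illustration of what a length limit implies, the sketch below counts the candidate spans in a document when spans may cover at most a given number of tokens. It assumes that "length" is measured in tokens, which this page does not state explicitly. The class SpanCount and the method countCandidateSpans are hypothetical and are not part of Minorthird.

```java
// Illustrative sketch only: hypothetical helper, not part of Minorthird.
class SpanCount {

    // Counts the contiguous spans of 1..maxLen tokens in a document of numTokens tokens.
    // Assumption: span length is measured in tokens.
    static int countCandidateSpans(int numTokens, int maxLen) {
        int count = 0;
        for (int len = 1; len <= maxLen && len <= numTokens; len++) {
            count += numTokens - len + 1;  // number of start positions for this length
        }
        return count;
    }

    public static void main(String[] args) {
        // Compare a limit of 4 tokens with single-token spans for a 10-token document.
        System.out.println(countCandidateSpans(10, 4));
        System.out.println(countCandidateSpans(10, 1));
    }
}
```

For a 10-token document, a limit of 4 gives 10 + 9 + 8 + 7 = 34 candidate spans, compared with 10 when only single tokens are considered.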


Recommended.VPSMMLearner2

public Recommended.VPSMMLearner2(int epochs,
                                 int maxLen)