edu.cmu.minorthird.text.learn
Class SpanFE

java.lang.Object
  extended by edu.cmu.minorthird.text.learn.SpanFE
All Implemented Interfaces:
MixupCompatible, SpanFeatureExtractor, java.io.Serializable
Direct Known Subclasses:
FeatureBuffer, Recommended.TokenPropUsingFE, SampleFE.AnnotatedSpanFE

public abstract class SpanFE
extends java.lang.Object
implements SpanFeatureExtractor, MixupCompatible, java.io.Serializable

A Feature Extractor which converts a Span to an Instance.

Typical use of this would be something like the following:

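 import edu.cmu.minorthird.classify.Instance;
 import edu.cmu.minorthird.text.Span;
 import edu.cmu.minorthird.text.TextLabels;
 import edu.cmu.minorthird.text.learn.SpanFE;
 
 // labels (a TextLabels) and span (a Span) are assumed to be in scope.
 SpanFE fe=new SpanFE(){
 
        // Implements the abstract two-argument extractFeatures method.
        public void extractFeatures(TextLabels labels, Span span){
                from(span).tokens().emit();
                from(span).left().subSpan(-2,2).emit();
                from(span).right().subSpan(0,2).emit();
                from(span).right().contains("obj").emit();
        }
 };
 
 Instance inst=fe.extractInstance(labels, span);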
 
Generally, to use this class, one subclasses it and implements the abstract extractFeatures(TextLabels labels, Span span) method, using chains of feature-extracting actions that start with 'from' and end with 'emit'.
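The following standalone sketch mimics the 'from ... emit' style so that it can be run without MinorThird. MiniFE, Step, tokens(), lower() and extract() are hypothetical names invented for illustration, a "span" is modeled as a plain list of token strings, and naming each feature after the pipeline steps that produced it is an assumption, not a description of the feature names SpanFE actually generates.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical miniature of the "from ... emit" style; NOT the MinorThird API.
// A "span" is modeled as a plain list of token strings.
abstract class MiniFE {
    // Features collected for the span currently being processed.
    private Map<String, Integer> features = new LinkedHashMap<>();

    // Subclasses describe their pipelines here.
    protected abstract void extractFeatures(List<String> span);

    // Runs the subclass's pipelines on a fresh feature map and returns it.
    public final Map<String, Integer> extract(List<String> span) {
        features = new LinkedHashMap<>();
        extractFeatures(span);
        return features;
    }

    // Starts a pipeline over the given span.
    protected final Step from(List<String> span) {
        return new Step("", span);
    }

    // An intermediate result: a feature-name prefix plus the strings it holds.
    protected final class Step {
        private final String name;
        private final List<String> values;

        private Step(String name, List<String> values) {
            this.name = name;
            this.values = values;
        }

        // Treats the values as tokens; the name records the step.
        public Step tokens() {
            return new Step(name + "tokens", values);
        }

        // Lower-cases every value.
        public Step lower() {
            List<String> out = new ArrayList<>();
            for (String v : values) {
                out.add(v.toLowerCase(Locale.ROOT));
            }
            return new Step(name + ".lower", out);
        }

        // Ends the pipeline: each value becomes a counted feature "name.value".
        public void emit() {
            for (String v : values) {
                features.merge(name + "." + v, 1, Integer::sum);
            }
        }
    }
}
```

With an anonymous subclass whose extractFeatures calls from(span).tokens().lower().emit(), extracting ["The", "cat", "the"] yields tokens.lower.the with count 2 and tokens.lower.cat with count 1.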

The methods tokens(), subSpan(), and so on are defined in subclasses of SpanFE.Result, which are listed in the Nested Class Summary below.

Author:
William Cohen
See Also:
Serialized Form

Nested Class Summary
static class SpanFE.Filter
          An abstract class that can be used to filter SpanSetResults.
static class SpanFE.Function
          An abstract class that can be used to change SpanSets.
static class SpanFE.Result
          Encodes an intermediate result of the SpanFE process.
static class SpanFE.SetResult<T>
          An intermediate result of a SpanFE process where the object being operated on is a set of elements of type T.
static class SpanFE.SpanResult
          An intermediate result of a SpanFE process where a span is being processed.
static class SpanFE.SpanSetResult
          An intermediate result of a SpanFE process where the object being operated on is a set of spans.
static class SpanFE.StringBagResult
          An intermediate result of a SpanFE process where the object being operated on is a set of strings.
static class SpanFE.TokenSetResult
          An intermediate result of a SpanFE process where the object being operated on is a set of tokens.
 
Field Summary
protected  AnnotatorLoader annotatorLoader
           
protected  MutableInstance instance
           
protected  java.lang.String requiredAnnotation
           
protected  java.lang.String requiredAnnotationFileToLoad
           
static int STORE_AS_BINARY
          Store features as binary whenever possible, even if occurrence counts are ignored.
static int STORE_AS_COUNTS
          Store features as numeric counts whenever possible.
static int STORE_COMPACTLY
          Store features as binary or counts, trying to reduce storage while maintaining information.
 
Constructor Summary
SpanFE()
          Create a feature extractor
 
Method Summary
 void emit(SpanFE.SpanResult result)
          Called by some SpanFE.Result subclass when a 'pipeline' of extraction steps is ended with a SpanResult.
 void emit(SpanFE.SpanSetResult result)
          Called by some SpanFE.Result subclass when a 'pipeline' of extraction steps is ended with a SpanSetResult.
 void emit(SpanFE.StringBagResult result)
          Called by some SpanFE.Result subclass when a 'pipeline' of extraction steps is ended with a StringBagResult.
 void emit(SpanFE.TokenSetResult result)
          Called by some SpanFE.Result subclass when a 'pipeline' of extraction steps is ended with a TokenSetResult.
 void extractFeatures(Span span)
          Implement this with a specific set of SpanFE 'pipelines'.
abstract  void extractFeatures(TextLabels labels, Span span)
          Implement this with a specific set of SpanFE 'pipelines'.
 Instance extractInstance(Span span)
          Deprecated. Use extractInstance(TextLabels labels, Span span) instead.
 Instance extractInstance(TextLabels labels, Span span)
          Extract an Instance from a span.
 SpanFE.SpanResult from(Span s)
          Starts a 'pipeline' of extraction steps, and adds the resulting features to the instance being built.
static SpanFE.SpanResult from(Span s, FeatureBuffer buffer)
          Starts a 'pipeline' of extraction steps, and adds the resulting features to the given FeatureBuffer.
 java.lang.String getAnnotationProvider()
           
 java.lang.String getRequiredAnnotation()
          Retrieve the annotation required by this SpanFeatureExtractor.
 void requireMyAnnotation(TextLabels labels)
          Make sure the required annotation is present.
 void setAnnotationProvider(java.lang.String classNameOrMixupFileName)
          Specify a mixup file or Java class to use to provide the annotation.
 void setAnnotatorLoader(AnnotatorLoader newLoader)
          Attach an AnnotatorLoader to the SpanFeatureExtractor, which is used to find the required Annotation (and any other Annotations that it might recursively require).
 void setFeatureStoragePolicy(int p)
          Set the policy for creating features.
 void setRequiredAnnotation(java.lang.String requiredAnnotation)
          Specify an annotator to run before feature generation.
 void setRequiredAnnotation(java.lang.String requiredAnnotation, java.lang.String annotationProvider)
          Simultaneously specify an annotator to run before feature generation and a mixup file or class that generates it.
 void trace(SpanFE.Result result)
          Override this to change the tracing behavior.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

STORE_AS_BINARY

public static final int STORE_AS_BINARY
Store features as binary whenever possible, even if occurrence counts are ignored.

See Also:
Constant Field Values

STORE_AS_COUNTS

public static final int STORE_AS_COUNTS
Store features as numeric counts whenever possible.

See Also:
Constant Field Values

STORE_COMPACTLY

public static final int STORE_COMPACTLY
Store features as binary or counts, trying to reduce storage while maintaining information.

See Also:
Constant Field Values

instance

protected transient MutableInstance instance

requiredAnnotation

protected java.lang.String requiredAnnotation

requiredAnnotationFileToLoad

protected java.lang.String requiredAnnotationFileToLoad

annotatorLoader

protected AnnotatorLoader annotatorLoader

Constructor Detail

SpanFE

public SpanFE()
Create a feature extractor

Method Detail

setFeatureStoragePolicy

public void setFeatureStoragePolicy(int p)
Set the policy for creating features.

Parameters:
p - should be one of SpanFE.STORE_AS_BINARY, SpanFE.STORE_AS_COUNTS, SpanFE.STORE_COMPACTLY
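To make the difference between the three policies concrete, here is a hypothetical sketch. The constant values are arbitrary (SpanFE's real values are listed under Constant Field Values), and the rule used for STORE_COMPACTLY, storing a feature as binary only when its count is 1, is an assumption consistent with "reduce storage while maintaining information", not a description of SpanFE's implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of the three storage policies; NOT MinorThird code.
// The constant values are arbitrary; see "Constant Field Values" for SpanFE's.
final class StoragePolicy {
    static final int STORE_AS_BINARY = 0;
    static final int STORE_AS_COUNTS = 1;
    static final int STORE_COMPACTLY = 2;

    private StoragePolicy() {
    }

    // Decides whether a feature with the given count is stored as binary.
    static boolean storeAsBinary(int count, int policy) {
        switch (policy) {
            case STORE_AS_BINARY:
                return true;
            case STORE_AS_COUNTS:
                return false;
            case STORE_COMPACTLY:
                // Assumption: a count of 1 says nothing more than "present",
                // so binary storage loses no information in that case.
                return count == 1;
            default:
                throw new IllegalArgumentException("unknown policy: " + policy);
        }
    }

    // Maps each feature to its stored value: 1.0 if binary, otherwise its count.
    static Map<String, Double> store(Map<String, Integer> counts, int policy) {
        Map<String, Double> out = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            int c = e.getValue();
            out.put(e.getKey(), storeAsBinary(c, policy) ? 1.0 : (double) c);
        }
        return out;
    }
}
```

With counts {a=3, b=1}, STORE_AS_BINARY stores a as 1.0, STORE_AS_COUNTS stores it as 3.0, and STORE_COMPACTLY keeps the count for a but stores b as binary.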

setRequiredAnnotation

public void setRequiredAnnotation(java.lang.String requiredAnnotation,
                                  java.lang.String annotationProvider)
Simultaneously specify an annotator to run before feature generation and a mixup file or class that generates it.


setRequiredAnnotation

public void setRequiredAnnotation(java.lang.String requiredAnnotation)
Specify an annotator to run before feature generation.

Specified by:
setRequiredAnnotation in interface MixupCompatible

getRequiredAnnotation

public java.lang.String getRequiredAnnotation()
Description copied from interface: MixupCompatible
Retrieve the annotation required by this SpanFeatureExtractor.

Specified by:
getRequiredAnnotation in interface MixupCompatible

setAnnotationProvider

public void setAnnotationProvider(java.lang.String classNameOrMixupFileName)
Specify a mixup file or Java class to use to provide the annotation.


getAnnotationProvider

public java.lang.String getAnnotationProvider()

setAnnotatorLoader

public void setAnnotatorLoader(AnnotatorLoader newLoader)
Description copied from interface: MixupCompatible
Attach an AnnotatorLoader to the SpanFeatureExtractor, which is used to find the required Annotation (and any other Annotations that it might recursively require).

Specified by:
setAnnotatorLoader in interface MixupCompatible
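Finding "any other Annotations that it might recursively require" amounts to resolving a dependency graph. The sketch below shows one way to compute a prerequisites-first order with a depth-first traversal; AnnotationOrder and the requires map are hypothetical and are not part of the AnnotatorLoader API.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of resolving an annotation and the annotations it
// recursively requires; NOT the AnnotatorLoader API.
final class AnnotationOrder {
    private AnnotationOrder() {
    }

    // Returns the annotations in an order where prerequisites come first.
    static List<String> resolve(String target, Map<String, List<String>> requires) {
        List<String> order = new ArrayList<>();
        visit(target, requires, new HashSet<>(), new HashSet<>(), order);
        return order;
    }

    // Depth-first post-order traversal; inProgress detects cyclic requirements.
    private static void visit(String a, Map<String, List<String>> requires,
                              Set<String> done, Set<String> inProgress, List<String> order) {
        if (done.contains(a)) {
            return;
        }
        if (!inProgress.add(a)) {
            throw new IllegalStateException("cyclic requirement involving " + a);
        }
        for (String dep : requires.getOrDefault(a, List.of())) {
            visit(dep, requires, done, inProgress, order);
        }
        inProgress.remove(a);
        done.add(a);
        order.add(a);
    }
}
```

For example, if name requires pos and token, and pos requires token, resolve("name", ...) returns [token, pos, name]; a cyclic requirement raises an IllegalStateException.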

requireMyAnnotation

public void requireMyAnnotation(TextLabels labels)
Make sure the required annotation is present.


extractInstance

public final Instance extractInstance(Span span)
Deprecated. Use extractInstance(TextLabels labels, Span span) instead.


extractInstance

public final Instance extractInstance(TextLabels labels,
                                      Span span)
Extract an Instance from a span.

Specified by:
extractInstance in interface SpanFeatureExtractor

from

public final SpanFE.SpanResult from(Span s)
Starts a 'pipeline' of extraction steps, and adds the resulting features to the instance being built.

For example, calling from(s).tokens().eq().emit() inside extractFeatures adds bag-of-words features.


from

public static final SpanFE.SpanResult from(Span s,
                                           FeatureBuffer buffer)
Starts a 'pipeline' of extraction steps, and adds the resulting features to the given FeatureBuffer.

This is intended as an alternative to subclassing SpanFE when building a Span2Instance converter, e.g.:


 Span2Instance fe=new Span2Instance(){
 
        public Instance extractInstance(Span s){
                FeatureBuffer buf=new FeatureBuffer(s);
                SpanFE.from(s,buf).tokens().emit();
                SpanFE.from(s,buf).left().subSpan(-2,2).emit();
                SpanFE.from(s,buf).right().subSpan(0,2).emit();
                return buf.getInstance();
        }
 };
 

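The buffer-based style replaces inheritance with an explicit accumulator that static pipelines write into. The runnable sketch below illustrates only that pattern; MiniBuffer, Step and last(k) are hypothetical and do not reproduce FeatureBuffer's API or the semantics of left() and subSpan().

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the buffer-based style; NOT MinorThird's FeatureBuffer.
final class MiniBuffer {
    private final Map<String, Integer> features = new LinkedHashMap<>();

    // Static entry point: features go into the caller's buffer, no subclass needed.
    static Step from(List<String> span, MiniBuffer buf) {
        return new Step("span", span, buf);
    }

    // Read-only view of everything emitted so far.
    Map<String, Integer> getInstance() {
        return Collections.unmodifiableMap(features);
    }

    static final class Step {
        private final String name;
        private final List<String> tokens;
        private final MiniBuffer buf;

        private Step(String name, List<String> tokens, MiniBuffer buf) {
            this.name = name;
            this.tokens = tokens;
            this.buf = buf;
        }

        // Keeps only the last k tokens (an illustrative stand-in for a context window).
        Step last(int k) {
            int start = Math.max(0, tokens.size() - k);
            return new Step(name + ".last" + k, tokens.subList(start, tokens.size()), buf);
        }

        // Ends the pipeline by adding one counted feature per token to the buffer.
        void emit() {
            for (String t : tokens) {
                buf.features.merge(name + "." + t, 1, Integer::sum);
            }
        }
    }
}
```

Because both from(..., buf) calls in a usage like from(List.of("New", "York"), buf).emit() and from(List.of("in", "downtown", "old"), buf).last(2).emit() share one buffer, getInstance() then holds span.New, span.York, span.last2.downtown and span.last2.old.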

emit

public void emit(SpanFE.StringBagResult result)
Called by some SpanFE.Result subclass when a 'pipeline' of extraction steps is ended with a StringBagResult.


emit

public void emit(SpanFE.TokenSetResult result)
Called by some SpanFE.Result subclass when a 'pipeline' of extraction steps is ended with a TokenSetResult.


emit

public void emit(SpanFE.SpanSetResult result)
Called by some SpanFE.Result subclass when a 'pipeline' of extraction steps is ended with a SpanSetResult.


emit

public void emit(SpanFE.SpanResult result)
Called by some SpanFE.Result subclass when a 'pipeline' of extraction steps is ended with a SpanResult.


extractFeatures

public void extractFeatures(Span span)
Implement this with a specific set of SpanFE 'pipelines'. Each pipeline will typically start with 'from(span)' and end with 'emit()'.


extractFeatures

public abstract void extractFeatures(TextLabels labels,
                                     Span span)
Implement this with a specific set of SpanFE 'pipelines'. Each pipeline will typically start with 'from(span)' and end with 'emit()'.


trace

public void trace(SpanFE.Result result)
Override this to change the tracing behavior.
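Overriding a no-op hook is a common way to customize tracing in this kind of design. The sketch below is hypothetical: TracingPipeline, RecordingPipeline and the trace(String, List) signature are invented for illustration, whereas SpanFE's own hook takes a SpanFE.Result.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of an overridable trace hook; NOT MinorThird code.
class TracingPipeline {
    // Called once per intermediate result; the default does nothing.
    protected void trace(String step, List<String> values) {
    }

    // Runs a fixed two-step pipeline, calling trace() after each step.
    final List<String> run(List<String> tokens) {
        trace("from", tokens);
        List<String> lower = new ArrayList<>();
        for (String t : tokens) {
            lower.add(t.toLowerCase(Locale.ROOT));
        }
        trace("lower", lower);
        return lower;
    }
}

// A subclass that records each step instead of ignoring it.
class RecordingPipeline extends TracingPipeline {
    final List<String> log = new ArrayList<>();

    @Override
    protected void trace(String step, List<String> values) {
        log.add(step + "=" + values);
    }
}
```

Running a RecordingPipeline on ["A", "b"] returns [a, b] and leaves the log entries "from=[A, b]" and "lower=[a, b]", while the base class runs the same steps silently.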