java.lang.Object edu.cmu.minorthird.text.mixup.Mixup
public class Mixup
A simple pattern-matching and information extraction language.
The name is an acronym for My Information eXtraction and Understanding Package.

EXAMPLE:
  ... in('begin') @number? [ any{2,5} in('end') ] ... && [!in('begin')*] && [!in('end')*]

BNF:
  simplePrim   -> [!] simplePrim1
  simplePrim1  -> id | a(DICT) | ai(DICT) | eq(CONST) | eqi(CONST) | re(REGEX) | any | ... | PROPERTY:VALUE | PROPERTY:a(foo)
  prim         -> < simplePrim [,simplePrim]* > | simplePrim
  repeatedPrim -> [L] prim [R] repeat | @type | @type?
  repeat       -> {int,int} | {,int} | {int,} | {int} | ? | * | +
  pattern      -> | repeatedPrim pattern
  basicExpr    -> pattern [ pattern ] pattern
  basicExpr    -> (expr)
  expr         -> basicExpr "||" expr
  expr         -> basicExpr "&&" expr

SEMANTICS:
  basicExpr is a pattern match: like a regex, but it returns all matches, not just the longest one.

  token-level tests:
    eq('foo')              checks that the token is exactly foo
    'foo'                  is short for eq('foo')
    eqi('foo')             checks that the lowercase version of the token is foo
    re('regex')            checks that the token matches the regex
    a(bar)                 checks that the token is in dictionary 'bar'
    ai(bar)                checks that the token is in dictionary 'bar', ignoring case
    color:red              checks that the token has property 'color' set to 'red'
    color:a(primaryColor)  checks that the token's property 'color' is in the dictionary 'primaryColor'
    !test                  is the negation of test
    < test [, test]* >     conjoins token-level tests
    any                    is true for any token

  token-sequences:
    test?       is 0 or 1 tokens matching test
    test+       is 1+ tokens matching test
    test*       is 0+ tokens matching test
    test{3,7}   is between 3 and 7 tokens matching test
    ...         is equal to any*
    @foo        matches a span of type foo
    @foo?       matches a span of type foo or the empty sequence
    L           means the sequence can't be extended to the left and still match
    R           means the sequence can't be extended to the right and still match

  expressions:
    expr || expr  is union
    expr && expr  is piping: generate with expr1, filter with expr2
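The token-sequence semantics described above can be illustrated without the Mixup library. The following is a minimal sketch, not the minorthird implementation: the class `RepeatMatchSketch` and its methods are hypothetical, tokens are plain strings, and the numeric test uses a full-token regex match as an assumption about how `re('regex')` behaves. It shows how a repeat such as `test{1,2}` yields every matching span rather than only the longest one, and how adding `L` and `R` to `test+` keeps only the runs that cannot be extended.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical illustration only; none of these names come from minorthird.
class RepeatMatchSketch {

    // Every [start, end) span of length min..max whose tokens all pass the test,
    // mimicking test{min,max} returning all matches. Empty spans are skipped
    // for simplicity, so min is effectively at least 1 here.
    static List<int[]> allMatches(String[] tokens, Predicate<String> test, int min, int max) {
        List<int[]> spans = new ArrayList<>();
        for (int start = 0; start < tokens.length; start++) {
            int len = 0;
            // Grow the span one token at a time while the test keeps passing.
            while (start + len < tokens.length && len < max && test.test(tokens[start + len])) {
                len++;
                if (len >= min) {
                    spans.add(new int[] {start, start + len});
                }
            }
        }
        return spans;
    }

    // Spans for "L test+ R": runs of passing tokens that cannot be extended
    // to the left or to the right and still match.
    static List<int[]> maximalRuns(String[] tokens, Predicate<String> test) {
        List<int[]> spans = new ArrayList<>();
        int i = 0;
        while (i < tokens.length) {
            if (!test.test(tokens[i])) {
                i++;
                continue;
            }
            int start = i;
            while (i < tokens.length && test.test(tokens[i])) {
                i++;
            }
            spans.add(new int[] {start, i});
        }
        return spans;
    }

    public static void main(String[] args) {
        String[] tokens = {"call", "555", "1234", "now"};
        // Assumed stand-in for a token-level test like re('[0-9]+').
        Predicate<String> isNumber = t -> t.matches("[0-9]+");

        System.out.println("all matches for test{1,2}:");
        for (int[] s : allMatches(tokens, isNumber, 1, 2)) {
            System.out.println("  tokens " + s[0] + " to " + (s[1] - 1));
        }
        System.out.println("maximal runs for L test+ R:");
        for (int[] s : maximalRuns(tokens, isNumber)) {
            System.out.println("  tokens " + s[0] + " to " + (s[1] - 1));
        }
    }
}
```

On the sample tokens, the unconstrained repeat produces three overlapping spans while the `L ... R` form produces one, which is why the two markers are useful for cutting down redundant matches.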
Nested Class Summary | |
---|---|
static class |
Mixup.MixupTokenizer
|
static class |
Mixup.ParseException
Signals an error in parsing a mixup document. |
Field Summary | |
---|---|
static int |
maxNumberOfMatches
Without constraints, the maximum number of times a mixup expression can extract something from a document of length N is O(N*N), since any token can be the begin or end of an extracted span. |
static int |
maxNumberOfMatchesPerToken
Without constraints, the maximum number of times a mixup expression can extract something from a document of length N is O(N*N), since any token can be the begin or end of an extracted span. |
static int |
minMatchesToApplyConstraints
Without constraints, the maximum number of times a mixup expression can extract something from a document of length N is O(N*N). |
static java.util.regex.Pattern |
tokenizerPattern
|
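The O(N*N) bound mentioned in the field descriptions follows from counting (begin, end) token pairs. As a rough, self-contained sketch (the class `SpanCountSketch` is hypothetical and not part of minorthird), the snippet below counts the non-empty contiguous spans in a document of N tokens, which comes to N*(N+1)/2.

```java
// Hypothetical illustration only; not part of the Mixup API.
class SpanCountSketch {

    // Counts spans by choosing a begin token and an end token at or after it.
    static long countSpans(int n) {
        long count = 0;
        for (int begin = 0; begin < n; begin++) {
            for (int end = begin; end < n; end++) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 100, 1000}) {
            System.out.println(n + " tokens -> " + countSpans(n) + " candidate spans");
        }
    }
}
```

The quadratic growth is the reason limits such as `maxNumberOfMatches` and `maxNumberOfMatchesPerToken` matter on long documents.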
Constructor Summary | |
---|---|
Mixup(Mixup.MixupTokenizer tok)
|
|
Mixup(java.lang.String pattern)
Create a new mixup query. |
Method Summary | |
---|---|
java.util.Iterator<Span> |
extract(TextLabels labels,
java.util.Iterator<Span> spanLooper)
Extract subspans from each generated span using the mixup expression. |
static void |
main(java.lang.String[] args)
|
java.lang.String |
toString()
|
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
Field Detail |
---|
public static int minMatchesToApplyConstraints
public static int maxNumberOfMatchesPerToken
public static int maxNumberOfMatches
public static final java.util.regex.Pattern tokenizerPattern
Constructor Detail |
---|
public Mixup(java.lang.String pattern) throws Mixup.ParseException
Throws: Mixup.ParseException
public Mixup(Mixup.MixupTokenizer tok) throws Mixup.ParseException
Throws: Mixup.ParseException
Method Detail |
---|
public java.util.Iterator<Span> extract(TextLabels labels, java.util.Iterator<Span> spanLooper)
public java.lang.String toString()
Overrides: toString in class java.lang.Object
public static void main(java.lang.String[] args)