Constraint Grammar ESSLLI


1 Constraint Grammar ESSLLI
Wednesday: Syntax

2 What is CG used for?
VISL grammar games
Machinese parsers
News feed and relevance filtering
Opinion mining in blogs
Science publication monitoring
QA
Machine translation
Spell- and grammar checking
Corpus annotation
Relational dictionaries: DeepDict
NER
Annotated corpora: CorpusEye

3 CG Input
remember: CG is modular!
output from one module becomes input to the next module
text -> preprocessor -> analyzer -> nomorph CG -> your CG -> output
cat file | preprocessor | analyzer | vislcg3 -g engcg.nomorph.cg | vislcg3 -g yourcgfile ...
echo "text" | preprocessor | analyzer | vislcg3 -g engcg.nomorph.cg | vislcg3 -g yourcgfile ...

4 CG input
it can be very simple
preprocessor:
perl -wnpe 's/ /\n/g; s/([^\w\n])/\n$1\n/g; s/\n\n+/\n/g;'
analyzer:
#!/usr/bin/perl -w
use utf8;
# read the lexicon: one "word,tags" entry per line
open(FH, "< /home/eckhard/parsers/eng/lex/baselex.eng");
while (<FH>) {
    s/\t/\n\t/g;                      # put each reading on its own tab-indented line
    ($word,$tags) = split /,/;
    $lex{$word} .= "\t" . $tags;
}
# read one token per line and print it as a cohort with its readings
while (<>) {
    $word = $_;
    chop $word;
    print "\"<$word>\"\n";
    if ($lex{$word}) {
        print $lex{$word};
    }
}
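The same two steps can be sketched in Python to make the data flow easier to follow. This is only an illustration: the lexicon entries, the tag names and the function names `tokenize` and `analyze` are assumptions and do not come from the slides.

```python
import re

# Hypothetical toy lexicon: word form -> readings (base form plus tags).
TOY_LEXICON = {
    "cat": ['"cat" N S'],
    "eats": ['"eat" V PR 3S'],
    "fish": ['"fish" N S', '"fish" V INF'],
}


def tokenize(text):
    """Split on whitespace and give every non-word character its own token."""
    return re.findall(r"\w+|[^\w\s]", text)


def analyze(tokens, lexicon):
    """Render tokens as cohorts: a "<wordform>" line, then one
    tab-indented line per reading found in the lexicon."""
    lines = []
    for token in tokens:
        lines.append('"<%s>"' % token)
        for reading in lexicon.get(token, []):
            lines.append("\t" + reading)
    return "\n".join(lines)


if __name__ == "__main__":
    print(analyze(tokenize("cat eats fish."), TOY_LEXICON))
```

Tokens missing from the lexicon are printed as a bare word form without readings, which mirrors the `if ($lex{$word})` check in the Perl analyzer.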

5 Syntactic annotation styles
Focus on syntactic form
    Phrase structure grammar (PSG) -> labelled brackets
    Dependency grammar (DG) -> labelled arcs
Focus on syntactic function
    Constraint grammar (CG) -> dependency pointers
Focus on semantic function
    Case roles (Fillmore)
    Lexical Functional Grammar (LFG)

6 Syntactic models
1. The flat classical model: word function, no form
My fat cat never eats fish
       S   A     V    O
word-based
psychologically easy to grasp
function markers attached to semantically heavy words
easy to turn into tags:
    My     determiner  PRE-N
    fat    adjective   PRE-N
    cat    noun        S
    never  adverb      A
    eats   verb        V
    fish   noun        O

7 2. Dependency grammar
strictly token based (e.g. Tesnière)
expresses syntactic form as asymmetrical relations ("arcs") between head tokens and dependent tokens
no zero tokens, no nonterminal nodes
each dependent is allowed 1 head (except for secondary arcs)
directed acyclic graphs
projective or non-projective (crossing branches / discontinuity)
Example: My fat cat never eats fish.

8 Dependency grammar annotation
Arc notation:
    The #1->3
    last #2->3
    satellites #3->9
    launched #4->3
    by #5->4
    the #6->7
    US #7->5
    never #8->9
    reached #9->0 ROOT
    orbit #10->9
Attribute notation:
    The <id=1> <head=3>
    last <id=2> <head=3>
    satellites <id=3> <head=9>
    launched <id=4> <head=3>
    by <id=5> <head=4>
    the <id=6> <head=7>
    US <id=7> <head=5>
    never <id=8> <head=9>
    reached <id=9> <head=0> ROOT
    orbit <id=10> <head=9>
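A short Python sketch shows how the `#id->head` notation can be read and checked. The helper names `read_arcs` and `path_to_root` are illustrative assumptions, not part of any CG tool.

```python
import re

ANNOTATED = """The #1->3
last #2->3
satellites #3->9
launched #4->3
by #5->4
the #6->7
US #7->5
never #8->9
reached #9->0
orbit #10->9"""

# word form, optional tags, then "#id->head"
ARC = re.compile(r"^(\S+)\s.*?#(\d+)->(\d+)")


def read_arcs(text):
    """Return two dicts keyed by token id: word forms and head ids."""
    words, heads = {}, {}
    for line in text.splitlines():
        match = ARC.match(line)
        if match:
            token_id = int(match.group(2))
            words[token_id] = match.group(1)
            heads[token_id] = int(match.group(3))
    return words, heads


def path_to_root(token_id, heads):
    """Follow head pointers up to the root (head 0); raise on a cycle."""
    path = [token_id]
    while heads[path[-1]] != 0:
        parent = heads[path[-1]]
        if parent in path:
            raise ValueError("cycle at token %d" % parent)
        path.append(parent)
    return path


if __name__ == "__main__":
    words, heads = read_arcs(ANNOTATED)
    print([words[i] for i in path_to_root(6, heads)])
```

Because every token stores exactly one head id, the single-head constraint holds by construction, and `path_to_root` checks that the arcs contain no cycles.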

9 Dependency grammar as trees
reached
    satellites
        The
        last
        launched
            by
                US
                    the
    never
    orbit

10 Dependency grammar with brackets
"à la PSG", e.g. TIGER
Penn-style:
    [V eats [N cat [DET my] [ADJ fat]] [ADV never] [N fish]]
Vertical:
    [V eats
        [N cat
            [DET my]
            [ADJ fat]
        ]
        [ADV never]
        [N fish]
    ]

11 3. Constituent Grammar
hierarchical word grouping with non-terminals (e.g. Chomsky)
syntactic form, no (or implicit) function
expressed by rewriting rules, where a non-terminal node is rewritten as a sequence of non-terminals and terminals (words or word classes)
    s -> np vp
    np -> art n
    vp -> v np
Pure Constituent Grammar: [tree diagram not preserved]
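The three rewriting rules can be tried out with a very small recognizer. This is a minimal sketch in Python, assuming that the input is already a sequence of word classes; the names `RULES`, `derive` and `is_sentence` are made up for the illustration.

```python
# The three rewriting rules from the slide; anything that is not a key
# in RULES is treated as a terminal (a word class).
RULES = {
    "s": [["np", "vp"]],
    "np": [["art", "n"]],
    "vp": [["v", "np"]],
}


def derive(symbol, classes, start):
    """Return every end position reachable by rewriting `symbol`
    over the word-class sequence `classes`, beginning at `start`."""
    if symbol not in RULES:
        if start < len(classes) and classes[start] == symbol:
            return [start + 1]
        return []
    ends = []
    for expansion in RULES[symbol]:
        positions = [start]
        for part in expansion:
            positions = [e for p in positions for e in derive(part, classes, p)]
        ends.extend(positions)
    return ends


def is_sentence(classes):
    """True if the whole sequence can be derived from the start symbol s."""
    return len(classes) in derive("s", classes, 0)


if __name__ == "__main__":
    print(is_sentence(["art", "n", "v", "art", "n"]))
```

The recursion terminates here because none of the rules is left-recursive; a grammar with left recursion would need a different strategy.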

12 Classical PSG with phrase labels
NP
    ART | The
    ADJ | last
    N | satellites
    PCL
        V-PCP | launched
        PP
            PRP | by
            NP
                ART | the
                PROP | US
(VP)
    ADV | never
    V | reached
    N | orbit

13 PSG annotation
Example sentence: A minha irmã não fala com as amigas ("My sister does not talk to her friends")
Penn Treebank bracketing: labeling opening brackets
    [NP A minha irmã] [VP não fala [PP com [NP as amigas]]]
SUSANNE Treebank bracketing: labeling all brackets (cf. EAGLES)
    [NP A minha irmã NP] [VP não fala [PP com [NP as amigas NP] PP] VP]
Vertical indented (here with part of speech on one line):
    [NP
        [Art A]
        [Det minha]
        [N irmã]
    NP]
    [VP
        [Adv não]
        [V fala]
        [PP
            [Prp com]
            [NP
                [Art as]
                [N amigas]
            NP]
        PP]
    VP]

14 Adding function: Dependency Grammar with function
adding function ("edge labels") to dependency arcs
reached
    S: satellites
        DN: The
        DN: last
        DN: launched
            PASS: by
                DP: US
                    DN: the
    fA: never
    Od: orbit

15 Constituent Grammar with function:
NEGRA, TIGER: category labels (mother) vs. edge labels (daughter)

16 Constituent Grammar with function:
VISL (function:form labels for each node)
Vertical notation:
    STA:cl
    S:np
    =DN:art O
    =H:n governo
    =DN:prop Cardoso
    P:v-fin
    A:pp
    =H:prp com
    =DP:np
    ==DN:art a
    ==H:n crise
Graphical notation: [tree diagram not preserved]
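The vertical notation is easy to process line by line. A minimal Python sketch, assuming that the number of leading `=` signs encodes the nesting level and that each label has the shape function:form; the name `parse_vertical` is hypothetical.

```python
VERTICAL = """STA:cl
S:np
=DN:art O
=H:n governo
=DN:prop Cardoso
P:v-fin
A:pp
=H:prp com
=DP:np
==DN:art a
==H:n crise"""


def parse_vertical(text):
    """Turn each line into (level, function, form, word).

    level is the number of leading '=' signs; word is None for
    lines that carry only a function:form label."""
    nodes = []
    for line in text.splitlines():
        stripped = line.lstrip("=")
        level = len(line) - len(stripped)
        label, _, word = stripped.partition(" ")
        function, _, form = label.partition(":")
        nodes.append((level, function, form, word or None))
    return nodes


if __name__ == "__main__":
    for node in parse_vertical(VERTICAL):
        print(node)
```

Note that the clause line (STA:cl) and its immediate constituents both carry zero `=` signs in this notation, so the level value alone does not separate the root from its daughters.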

19 4. Constraint Grammar (CG)
CG as a descriptive paradigm
function-first approach with token-based function tags
Classic CG: shallow dependency (attachment direction, head type)
depth and constituents only implicitly marked
    My @>N (pointer to head type: N)
    fat @>N
    cat @SUBJ> (direction pointer without head type)
    never @ADVL>
    eats @FMV (top node)
    fish @<ACC

20 Adding full numbered dependency
The <artd> ART #1->3
last #2->3
satellites N P #3->9
launched V PCP2 #4->3
by #5->4
the <artd> ART #6->7
US PROP @P< #7->5
never #8->9
reached V #9->0
orbit N S #10->9
$. #11->0

21 2. Interactive syntactic tree building

23 BEFORE / AFTER SECTIONS
BEFORE-SECTIONS, AFTER-SECTIONS
Run only once (cf. VISLCG's MAPPING / CORRECTIONS)
Especially for:
    adding ambiguity, e.g. syntactic, semantic roles etc.
    post-processing errors from previous modules

24 Mapping (MAP, ADD)
MAP (@SUBJ) TARGET (N) IF (NOT *-1 NON-PRE-N) ;
MAP (@SUBJ) (N) (NOT *-1 NON-PRE-N) ;
Usually as a special section (MAPPING or BEFORE-SECTIONS), but in cg3 allowed anywhere
Strictly ordered
Both MAP and ADD can be used to add tags, but:
    MAP "closes" a line for further mapping (but not for SUBSTITUTE!), even if the mapped tags do not contain the flagged prefix
    ADD maps, but allows further mapping
MAPed tags can be "seen" by later mapping rules, even in the same section
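The difference between MAP and ADD can be modelled with a single flag per reading. The following Python sketch is a toy model of that behaviour only, not of the vislcg3 implementation; the class `Reading` and the functions `apply_map` and `apply_add` are invented for the illustration, and contextual tests are left out.

```python
class Reading:
    """One reading line: a list of tags plus a 'closed for mapping' flag."""

    def __init__(self, tags):
        self.tags = list(tags)
        self.mapped = False  # set once a MAP rule has applied


def apply_map(reading, new_tags):
    """MAP: add tags unless the reading is already closed, then close it."""
    if reading.mapped:
        return False
    reading.tags.extend(new_tags)
    reading.mapped = True
    return True


def apply_add(reading, new_tags):
    """ADD: add tags unless the reading is closed, but leave it open."""
    if reading.mapped:
        return False
    reading.tags.extend(new_tags)
    return True


if __name__ == "__main__":
    noun = Reading(["N"])
    apply_add(noun, ["@SUBJ"])   # applies, reading stays open
    apply_map(noun, ["@<ACC"])   # applies, reading is now closed
    apply_map(noun, ["@<SC"])    # blocked
    print(noun.tags)
```

The flag is set regardless of which tags were mapped, which corresponds to the remark that MAP closes a line even if the mapped tags lack the flagged prefix.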

25 Substitutions
SUBSTITUTE (S/P) (P) TARGET (@<SC) IF + (P) BARRIER CLB)
Replaces a tag or tag chain with another, useful for:
    correcting input from other modules, e.g. stochastic taggers
    correcting lower-level CG once higher-level information is available
    spell or grammar checkers
Usually as a special section (CORRECTIONS or BEFORE-SECTIONS), but in cg3 allowed anywhere
'TARGET' and 'IF' are optional
Strictly ordered
SUBSTITUTE does not "close" a line for mapping
SUBSTITUTEd tags can be "seen" by later SUBSTITUTE or mapping rules, even in the same section

26 CG Contexts
Context condition: word form "<...>", base form "...", tags (A-Z, <[a-z]>, @[A-Z]), combinations ...
Direction: + (right), - (left)
Position marker:
    0 self
    local right: 1, 2, 3 ...
    local left: -1, -2, -3 ...
Globality:
    * continue until a match is found
    ** continue also across a context match to fulfil further (linked) conditions
    0* nearest neighbour: search in both directions
Careful: C, e.g. *1C (only unambiguous readings)

27 CG contexts 2
NOT: conditions can be negated (NOT *1 VFIN)
contexts can be LINKed (*1C xxx LINK 0 yyy LINK *1 zzz)
searches can have a BARRIER (*1 N BARRIER VFIN)
contexts can be ANDed: IF (0 xxx) (*1 yyy) (NOT *-1 zzz)
contexts can be negated as a whole (NEGATE *1 ART LINK 1 ADJ LINK 1 N)

28 CBARRIER
(**1 N CBARRIER VFIN) ;
like BARRIER, but it only blocks the search if the barrier match is unambiguous (all readings of the token match), i.e. it is less strict than BARRIER
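Positions, scanning (*), careful mode (C), BARRIER and CBARRIER can be illustrated with a toy model in Python. It assumes that a sentence is a list of cohorts, each cohort a non-empty list of readings, and each reading a set of tags. The function names and the decision to test the target before the barrier are assumptions of this sketch; linked tests, ** and 0* are not covered.

```python
def cohort_matches(cohort, tag, careful=False):
    """Plain mode: some reading has the tag. Careful mode: all readings do."""
    hits = [tag in reading for reading in cohort]
    return all(hits) if careful else any(hits)


def check_context(sentence, origin, offset, tag, scan=False,
                  barrier=None, careful=False, barrier_careful=False):
    """Test one contextual condition relative to the token at `origin`.

    offset          non-zero position, e.g. 1 or -2
    scan            True for a '*' search that continues until a match
    barrier         tag that stops a scan (BARRIER)
    barrier_careful True for CBARRIER: the barrier only blocks if it is
                    unambiguous
    """
    step = 1 if offset > 0 else -1
    position = origin + offset
    while 0 <= position < len(sentence):
        if cohort_matches(sentence[position], tag, careful):
            return True
        if not scan:
            return False
        if barrier is not None and cohort_matches(
                sentence[position], barrier, barrier_careful):
            return False
        position += step
    return False


if __name__ == "__main__":
    sentence = [
        [{"ART"}],
        [{"ADJ"}, {"N"}],          # ambiguous between ADJ and N
        [{"N"}, {"V", "VFIN"}],    # ambiguous between N and VFIN
        [{"ADV"}],
    ]
    print(check_context(sentence, 0, 1, "ADV", scan=True, barrier="VFIN"))
    print(check_context(sentence, 0, 1, "ADV", scan=True, barrier="VFIN",
                        barrier_careful=True))
```

In the example, the ambiguous third token stops the search under BARRIER but not under CBARRIER, which is the sense in which CBARRIER is less strict. A NOT condition corresponds to negating the return value of `check_context`.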

29 NEGATE
(NEGATE *1 (AUX) LINK 1 (@AUX<)) ;
(NEGATE *-1 N LINK -1 DEF) ;
implements aspects of the TEMPLATE idea (being able to refer to, and to negate, chunks of internally linked tokens)
NEGATE will invert the result of the entire LINK'ed chain that follows, whereas NOT will only invert the result of the immediately following test
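The contrast between NOT and NEGATE can be shown with a small Python model of a LINKed chain. It is restricted to local (non-scanning) positions, and the representation of a test as (offset, tag, inverted) as well as the names `chain_holds` and `negate` are assumptions made for this sketch.

```python
def chain_holds(sentence, origin, chain):
    """Evaluate a chain of linked local tests.

    Each test is (offset, tag, inverted); the offset is relative to the
    position reached by the previous test. inverted=True models NOT on
    that single test."""
    position = origin
    for offset, tag, inverted in chain:
        position += offset
        in_range = 0 <= position < len(sentence)
        found = in_range and any(tag in reading for reading in sentence[position])
        if found == inverted:
            return False
    return True


def negate(sentence, origin, chain):
    """NEGATE: invert the result of the whole chain."""
    return not chain_holds(sentence, origin, chain)


if __name__ == "__main__":
    sentence = [[{"V"}], [{"ART"}], [{"N"}]]
    # (NEGATE 1 ART LINK 1 ADJ): no ART+ADJ sequence follows, so this holds
    print(negate(sentence, 0, [(1, "ART", False), (1, "ADJ", False)]))
    # (NOT 1 ART LINK 1 ADJ): fails at once, because an ART does follow
    print(chain_holds(sentence, 0, [(1, "ART", True), (1, "ADJ", False)]))
```

The two conditions look alike but give opposite results on this input, since NOT only inverts the first test while NEGATE inverts the outcome of the chain as a whole.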

30 Unification
set names (LIST, SET) can be defined as variables by prefixing $$
in a CG rule, all occurrences of the $$ set will be unified to mean the same set member
    LIST ROLE = %AG %PAT %TH %LOC ;
    SELECT $$ROLE (-1 KC) (-2C $$ROLE) ; # (4-in-1 rule)
the $$ occurrence in the target position is the primary one (i.e. the one the others unify with)
if $$ is only used in contexts, add KEEPORDER to force a safe interpretation of the first occurrence as the primary one:
    REMOVE KEEPORDER (NEGATE 0 $$CASE LINK -1 N + $$CASE)
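The "4-in-1" remark can be made concrete by spelling out the unified rule once per set member. The Python sketch below does this by plain string substitution; it is meant as a reading aid for what unification expresses, and the function name `expand_unified` is hypothetical. It is not a description of how vislcg3 evaluates $$ sets internally.

```python
ROLE = ["%AG", "%PAT", "%TH", "%LOC"]


def expand_unified(rule, set_name, members):
    """Write a $$ rule out as one plain rule per set member, so that
    every occurrence of the set refers to the same member."""
    return [rule.replace("$$" + set_name, "(" + member + ")")
            for member in members]


if __name__ == "__main__":
    for plain in expand_unified("SELECT $$ROLE (-1 KC) (-2C $$ROLE) ;", "ROLE", ROLE):
        print(plain)
```

Each generated rule selects one role only if the same role appears unambiguously two positions to the left, across a KC token.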

