1
ESSLLI 2006 Treebank-Based Acquisition of LFG, HPSG and CCG Resources
Advanced Course: Treebank-Based Acquisition of LFG, HPSG and CCG Resources
Josef van Genabith, Dublin City University
Yusuke Miyao, University of Tokyo
Julia Hockenmaier, University of Pennsylvania and University of Edinburgh
ESSLLI 2006, 18th European Summer School in Logic, Language and Information, University of Malaga, July-August 2006
2
Lecturer Contact Information
Josef van Genabith, National Centre for Language Technology (NCLT), School of Computing, Dublin City University, Dublin 9, Ireland, josef@computing.dcu.ie
Julia Hockenmaier, juliahr@cis.upenn.edu
Yusuke Miyao, Department of Computer Science, The University of Tokyo, Hongo 7-3-1, Bunkyo-ku, Tokyo 113-0033, Japan, yusuke@is.s.u-tokyo.ac.jp
3
Motivation
What do grammars do?
– Grammars define languages as sets of strings
– Grammars define which strings are grammatical and which are not
– Grammars tell us about the syntactic structure of (associated with) strings
"Shallow" vs. "deep" grammars:
– Shallow grammars do all of the above
– Deep grammars in addition relate text to an information/meaning representation
– Information: predicate-argument-adjunct structure, deep dependency relations, logical forms, …
In natural languages, linguistic material is not always interpreted locally where you encounter it: long-distance dependencies (LDDs)
Resolution of LDDs is crucial to construct accurate and complete information/meaning representations.
Deep grammars := (text -> meaning) + (LDD resolution)
4
Motivation
Unification (constraint-based) grammar formalisms (FUG, GPSG, PATR-II, …):
– Lexical-Functional Grammar (LFG)
– Head-Driven Phrase Structure Grammar (HPSG)
– Combinatory Categorial Grammar (CCG)
– Tree-Adjoining Grammar (TAG)
Traditionally, deep constraint-based grammars are hand-crafted: LFG ParGram, HPSG LinGO ERG, Core Language Engine (CLE), Alvey Tools, RASP, ALPINO, …
Wide-coverage, deep unification (constraint-based) grammar development is knowledge-intensive and expensive!
Very hard to scale hand-crafted grammars to unrestricted text!
English XLE (Riezler et al. 2002); German XLE (Forst and Rohrer 2006); Japanese XLE (Masuichi and Okuma 2003); RASP (Carroll and Briscoe 2002); ALPINO (Bouma, van Noord and Malouf 2000)
5
Motivation
Instance of the "knowledge acquisition bottleneck" familiar from classical "rationalist" rule/knowledge-based AI/NLP
Alternative to classical "rationalist" rule/knowledge-based AI/NLP: the "empiricist" research paradigm (AI/NLP):
– Corpora, treebanks, …, machine-learning-based and statistical approaches
– Treebank-based grammar acquisition, probabilistic parsing
– Advantage: grammars can be induced (learned) automatically
– Very low development cost, wide coverage, robust, but …
Most treebank-based grammar induction/parsing technology produces "shallow" grammars
Shallow grammars do not resolve LDDs (but see Johnson 2002) and do not map strings to information/meaning representations
6
Motivation
This poses a research question: can we address the knowledge acquisition bottleneck for deep grammar development by combining insights from the rationalist and empiricist research paradigms?
Specifically:
– Can we automatically acquire wide-coverage, deep, probabilistic, constraint-based grammars from treebanks?
– How do we use them in parsing?
– Can we use them for generation?
– Can we acquire resources for different languages and treebank encodings?
– How do these resources compare with hand-crafted resources?
7
Course Overview
Monday: Motivation, Course Overview, Introductions to TAG, LFG, CCG, HPSG and the Penn-II Treebank, TAG Resources
Tuesday: Penn-II-Based Acquisition of LFG Resources
Wednesday: Penn-II-Based Acquisition of CCG Resources
Thursday: Penn-II-Based Acquisition of HPSG Resources
Friday: Multilingual Resources, Formal Semantics, Comparing LFG, CCG, HPSG and TAG-Based Approaches, Demos, Current and Future Work, Discussion
8
Course Overview
Tuesday/Wednesday/Thursday, Penn-II-Based Acquisition of XXG Resources:
– Treebank Preprocessing/Clean-Up
– Treebank Annotation/Conversion
– Grammar and Lexicon Extraction
– Parsing (Architectures, Probability Models, Evaluation)
– Generation (Architectures, Probability Models, Evaluation)
– Other (Semantics, Domain Variation, …)
9
Grammar Formalisms
10
Grammar formalisms and linguistic theories
Linguistics aims to explain natural language:
– What is universal grammar?
– What are language-specific constraints?
Formalisms are mathematical theories:
– They provide a language in which linguistic theories can be expressed (like calculus for physics)
– They define elementary objects (trees, strings, feature structures) and recursive operations which generate complex objects from simple objects
– They do impose linguistic constraints (e.g. on the kinds of dependencies they can capture)
11
Lexicalised Grammar Formalisms: TAG, CCG, LFG and HPSG
12
Lexicalised formalisms (TAG, CCG, LFG and HPSG)
The lexicon:
– pairs words with elementary objects
– specifies all language-specific information (number and location of arguments, control and binding theory)
The grammatical operations:
– are universal
– define (and impose constraints on) recursion
13
TAG, CCG, LFG and HPSG
They describe different kinds of linguistic objects:
– TAG is a theory of trees
– CCG is a theory of (syntactic and semantic) types
– LFG is a multi-level theory based on a projection architecture relating different types of linguistic objects (trees, AVMs, linear logic-based semantics)
– HPSG uses a single, uniform formalism (typed feature structures) to describe phonological, morphological, syntactic and semantic representations (signs)
They differ in details: treatment of wh-movement, coordination, etc.
14
TAG, CCG, LFG and HPSG
TAG and CCG are weakly equivalent. Both are mildly context-sensitive:
– they can capture Dutch crossing dependencies
– but are still efficiently parseable (in polynomial time)
LFG is context-sensitive.
15
Tree-Adjoining Grammar (TAG)
16
(Lexicalized) Tree-Adjoining Grammar
TAG is a tree-rewriting formalism:
– TAG defines operations (substitution and adjunction) on trees
– The elementary objects in TAG are trees (not strings)
TAG is lexicalized:
– Each elementary tree is anchored to a lexical item (word)
– "Extended domain of locality": the elementary tree contains all arguments of the anchor
– TAG requires a linguistic theory which specifies the shape of these elementary trees
TAG is mildly context-sensitive:
– can capture Dutch crossing dependencies
– but is still efficiently parseable
A.K. Joshi and Y. Schabes (1996) Tree Adjoining Grammars. In G. Rozenberg and A. Salomaa, eds., Handbook of Formal Languages
17
TAG substitution (arguments)
[Figure: a tree X with a substitution node Y combines with an elementary tree rooted in Y; derived tree and derivation tree shown]
18
TAG adjunction (modifiers)
[Figure: an auxiliary tree rooted in X with foot node X* adjoins at an X node; derived tree and derivation tree shown]
19
A small TAG lexicon
[Figure: an initial tree for "eats" (S over NP substitution node and VP over VBZ "eats" and NP substitution node), initial trees NP "John" and NP "tapas", and an auxiliary tree VP over RB "always" and foot node VP*]
20
A TAG derivation
[Figure: "tapas" substituted at the object NP of the "eats" tree]
21
A TAG derivation
[Figure: "John" substituted at the subject NP; the "always" auxiliary tree adjoins at VP]
22
A TAG derivation
[Figure: derived tree for "John always eats tapas"]
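The derivation above can be simulated with a toy implementation of substitution and adjunction. This is an illustrative sketch, not the formalism's standard notation: trees are (label, children) tuples, substitution nodes are encoded as ('sub', label), and the foot node of an auxiliary tree as ('foot', label).

```python
# Toy TAG operations over trees encoded as (label, children) tuples.
# ('sub', L) marks a substitution node, ('foot', L) a foot node; leaves
# inside children lists are word strings. Illustrative encoding only.

def substitute(tree, label, arg):
    """Plug arg in at the leftmost substitution node with this label."""
    def go(node, done):
        lab, kids = node
        out = []
        for k in kids:
            if not done and k == ('sub', label):
                out.append(arg)
                done = True
            elif isinstance(k, tuple) and isinstance(k[1], list):
                k2, done = go(k, done)
                out.append(k2)
            else:
                out.append(k)
        return (lab, out), done
    return go(tree, False)[0]

def adjoin(tree, label, aux):
    """Adjoin aux (root and foot labelled `label`) at the leftmost match:
    the matched subtree is reattached at the foot node of aux."""
    def plug_foot(node, sub):
        lab, kids = node
        return (lab, [sub if k == ('foot', label) else
                      (plug_foot(k, sub) if isinstance(k, tuple) and isinstance(k[1], list) else k)
                      for k in kids])
    def go(node, done):
        lab, kids = node
        if not done and lab == label:
            return plug_foot(aux, node), True
        out = []
        for k in kids:
            if isinstance(k, tuple) and isinstance(k[1], list):
                k2, done = go(k, done)
                out.append(k2)
            else:
                out.append(k)
        return (lab, out), done
    return go(tree, False)[0]

def words(tree):
    """Yield of a derived tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [w for k in tree[1] for w in words(k)]

# The small lexicon from the slides:
eats = ('S', [('sub', 'NP'), ('VP', [('VBZ', ['eats']), ('sub', 'NP')])])
john, tapas = ('NP', ['John']), ('NP', ['tapas'])
always = ('VP', [('RB', ['always']), ('foot', 'VP')])

t = substitute(eats, 'NP', john)    # subject
t = substitute(t, 'NP', tapas)      # object
t = adjoin(t, 'VP', always)         # modifier
print(' '.join(words(t)))           # John always eats tapas
```

Substituting the subject before the object relies on "leftmost" plugging; a real TAG implementation addresses nodes by position instead.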
23
Combinatory Categorial Grammar (CCG)
24
Combinatory Categorial Grammar
CCG is a lexicalized grammar formalism (the "rules" of the grammar are completely general; all language-specific information is given in the lexicon)
CCG is mildly context-sensitive (can capture Dutch crossing dependencies, but is still efficiently parseable)
CCG has a flexible constituent structure
CCG has a simple, unified treatment of extraction and coordination
CCG has a transparent syntax-semantics interface (every syntactic category and operation has a semantic counterpart)
CCG rules are monotonic (there is no movement and there are no traces)
CCG rules are type-driven, not structure-driven (this means e.g. that intransitive verbs and VPs are indistinguishable)
25
CCG: the machinery
Categories: specify subcat lists of words/constituents
Combinatory rules: specify how constituents can combine
The lexicon: specifies which categories a word can have
Derivations: spell out the process of combining constituents
26
CCG categories
Simple categories: NP, S, PP
Complex categories: functions which return a result when combined with an argument:
– VP or intransitive verb: S\NP
– Transitive verb: (S\NP)/NP
– Adverb: (S\NP)\(S\NP)
– PPs: ((S\NP)\(S\NP))/NP, (NP\NP)/NP
Every category has a semantic interpretation
27
Function application
Combines a function with its argument to yield a result:
(S\NP)/NP NP -> S\NP ("eats" + "tapas" -> "eats tapas")
NP S\NP -> S ("John" + "eats tapas" -> "John eats tapas")
Used in all variants of categorial grammar
28
A (C)CG derivation
29
Type-raising and function composition
Type-raising: turns an argument into a function. Corresponds to case:
NP -> S/(S\NP) (nominative)
NP -> (S\NP)\((S\NP)/NP) (accusative)
Function composition: composes two functions (complex categories):
(S\NP)/PP PP/NP -> (S\NP)/NP
S/(S\NP) (S\NP)/NP -> S/NP
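The combinatory rules above can be sketched directly over a string encoding of categories. This is a minimal illustration, assuming categories written as in the slides ('S\NP', '(S\NP)/NP'); the helper names are invented, not part of any CCG toolkit.

```python
# Sketch of CCG combinators over string-encoded categories.

def split(cat):
    """Split a complex category at its outermost slash into
    (result, slash, argument); return None for atomic categories."""
    depth = 0
    for i in range(len(cat) - 1, -1, -1):
        c = cat[i]
        if c == ')':
            depth += 1
        elif c == '(':
            depth -= 1
        elif c in '/\\' and depth == 0:
            unparen = lambda s: s[1:-1] if s.startswith('(') else s
            return unparen(cat[:i]), c, unparen(cat[i + 1:])
    return None

def wrap(cat):
    """Parenthesise a category if it is complex."""
    return '(' + cat + ')' if split(cat) else cat

def forward_apply(f, a):
    """X/Y Y -> X (forward application)."""
    p = split(f)
    return p[0] if p and p[1] == '/' and p[2] == a else None

def backward_apply(a, f):
    """Y X\\Y -> X (backward application)."""
    p = split(f)
    return p[0] if p and p[1] == '\\' and p[2] == a else None

def forward_compose(f, g):
    """X/Y Y/Z -> X/Z (forward composition)."""
    pf, pg = split(f), split(g)
    if pf and pg and pf[1] == pg[1] == '/' and pf[2] == pg[0]:
        return wrap(pf[0]) + '/' + wrap(pg[2])
    return None

def type_raise_nom(cat):
    """NP -> S/(S\\NP) (nominative type-raising)."""
    return 'S/' + wrap('S\\' + cat)

# "John eats tapas", and the type-raised variant used for extraction:
vp = forward_apply('(S\\NP)/NP', 'NP')                       # 'S\NP'
s = backward_apply('NP', vp)                                 # 'S'
s_np = forward_compose(type_raise_nom('NP'), '(S\\NP)/NP')   # 'S/NP'
```

The composed category S/NP is exactly what a wh-extraction or right-node-raising analysis needs: a sentence still missing its object to the right.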
30
Type-raising and composition
Wh-movement: [derivation figure]
Right-node raising: [derivation figure]
31
Another CCG derivation
We will only be concerned with canonical "normal-form" derivations, which use function composition and type-raising only when syntactically necessary.
32
CCG: semantics
Every syntactic category and rule has a semantic counterpart:
33
The CCG lexicon
Pairs words with their syntactic categories (and semantic interpretation):
eats: (S\NP)/NP, λx.λy.eats′ x y
eats: S\NP, λx.eats′ x
The lexicon is the main bottleneck for wide-coverage CCG parsing
34
Why use CCG for statistical parsing?
CCG derivations are binary trees: we can use standard chart parsing techniques.
CCG derivations represent long-range dependencies and complement-adjunct distinctions directly.
35
A comparison with Penn Treebank parsers
Standard Treebank parsers do not recover the null elements and function tags that are necessary for semantic interpretation.
36
Lexical-Functional Grammar (LFG)
37
Lexical-Functional Grammar (LFG)
LFG (Bresnan and Kaplan 1981, Bresnan 2001, Dalrymple 2001) is a unification- (or constraint-) based theory of grammar.
Two (basic) levels of representation:
– C-structure: represents surface grammatical configurations such as word order; annotated CFG data structures
– F-structure: represents abstract syntactic functions such as SUBJ(ect), OBJ(ect), OBL(ique), PRED(icate), COMP(lement), ADJ(unct), …; attribute-value matrices/structures (AVMs)
F-structure approximates basic predicate-argument structure, dependency representation, logical form (van Genabith and Crouch 1996, 1997)
38
Lexical-Functional Grammar (LFG)
39
Lexical-Functional Grammar (LFG)
40
Lexical-Functional Grammar (LFG)
41
LFG Grammar Rules and Lexical Entries
42
LFG Parse Tree (with Equations/Constraints)
43
LFG Constraint Resolution (1/3)
44
LFG Constraint Resolution (2/3)
45
LFG Constraint Resolution (3/3)
46
LFG Subcategorisation & Long-Distance Dependencies
Subcategorisation:
– Semantic forms (subcat frames): sign
– Completeness: all GFs in the semantic form are present at the local f-structure
– Coherence: only the GFs in the semantic form are present at the local f-structure
Long-Distance Dependencies (LDDs): resolved at f-structure with functional uncertainty equations (regular expressions specifying paths in f-structure).
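Completeness and Coherence can be illustrated with a toy check over f-structures encoded as dicts. The 'subcat' key stands in for the semantic form's GF list; the encoding and names are illustrative, not part of the annotation algorithm described later.

```python
# Toy LFG Completeness/Coherence check over dict-encoded f-structures.
GOVERNABLE = {'subj', 'obj', 'obj2', 'obl', 'comp', 'xcomp'}

def complete(f):
    """Completeness: every GF the PRED subcategorises for is locally present."""
    return set(f.get('subcat', [])) <= {g for g in f if g in GOVERNABLE}

def coherent(f):
    """Coherence: only subcategorised GFs are locally present."""
    return {g for g in f if g in GOVERNABLE} <= set(f.get('subcat', []))

f = {'pred': 'see', 'subcat': ['subj', 'obj'],
     'subj': {'pred': 'John'}, 'obj': {'pred': 'Mary'}}
print(complete(f), coherent(f))   # True True
del f['obj']                      # "*John saw" with transitive 'see'
print(complete(f), coherent(f))   # False True
```

Adding a spurious OBJ2 would conversely leave the structure complete but incoherent.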
47
LFG LDDs: Complement Relative Clause
48
LFG LDDs: Complement Relative Clause
49
LFG LDDs: Complement Relative Clause
50
Head-Driven Phrase Structure Grammar (HPSG)
51
Head-Driven Phrase Structure Grammar (HPSG)
HPSG (Pollard and Sag 1994, Sag et al. 2003) is a unification-/constraint-based theory of grammar
HPSG is a lexicalized grammar formalism
HPSG aims to explain generic regularities that underlie phrase structures, lexicons and semantics, as well as language-specific/-independent constraints
Syntactic/semantic constraints are uniformly denoted by signs, which are represented with feature structures
Two components of HPSG:
– Lexical entries represent word-specific constraints (corresponding to elementary objects)
– Principles express generic grammatical regularities (corresponding to grammatical operations)
52
Sign
A sign is a formal representation of the combination of phonological form with syntactic and semantic constraints:
sign
  PHON string                 (phonological form)
  SYNSEM synsem               (syntactic/semantic constraints)
    LOCAL local               (local constraints)
      CAT category            (syntactic category)
        HEAD head             (syntactic head)
          MOD synsem          (modifying constraints)
        VAL valence           (subcategorization frames)
          SPR list
          SUBJ list
          COMPS list
      CONT content            (semantic representations)
    NONLOCAL nonlocal         (non-local dependencies)
      QUE list
      REL list
      SLASH list
  DTRS dtrs                   (daughter structures)
53
Lexical entries
Lexical entries express word-specific constraints, e.g. for "loves":
PHON "loves"
HEAD verb
SUBJ ⟨NP⟩
COMPS ⟨NP⟩
We use simplified notations in this lecture
54
Principles
Principles describe generic regularities of grammar, not corresponding to individual construction rules:
– Head Feature Principle: the value of HEAD must be percolated from the head daughter
– Valence Principle: subcat requirements not consumed are percolated to the mother
– Immediate Dominance (ID) Principle: a mother and her immediate daughters must satisfy one of the ID schemas
Many other principles: percolation of NONLOCAL features, semantics construction, etc.
[Figure: HEAD value 1 shared between the mother and the head daughter]
55
ID schemas
ID schemas correspond to construction rules in CFGs and other grammar formalisms:
– subject-head constructions (e.g. "John runs")
– head-complement constructions (e.g. "loves Mary")
– filler-head constructions (e.g. "what he bought")
[Figures: each schema stated as a feature-structure constraint over the SUBJ, COMPS and SLASH lists of mother and daughters]
56
Example: HPSG parsing
Lexical entries determine the syntactic/semantic constraints of words:
John, Mary: HEAD noun, SUBJ ⟨⟩, COMPS ⟨⟩
saw: HEAD verb, SUBJ ⟨NP⟩, COMPS ⟨NP⟩
57
Example: HPSG parsing
Principles determine the generic constraints of the grammar; daughters are combined by unification.
[Figure: "saw" (HEAD verb, SUBJ ⟨NP⟩, COMPS ⟨NP⟩) unified with the object "Mary"]
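The unification step can be sketched with a tiny unifier over feature structures encoded as nested dicts. This is a simplification for illustration: no types, no lists, no structure sharing, and atomic values must match exactly.

```python
# Minimal feature-structure unification over nested dicts (no types,
# no reentrancies). Returns the unified structure, or None on a clash.
def unify(a, b):
    if isinstance(a, dict) and isinstance(b, dict):
        out = dict(a)
        for k, v in b.items():
            if k in out:
                u = unify(out[k], v)
                if u is None:
                    return None       # feature clash: unification fails
                out[k] = u
            else:
                out[k] = v            # feature only in b: just add it
        return out
    return a if a == b else None      # atomic values must be identical

# Head Feature Principle, toy version: the mother inherits HEAD from
# the head daughter and keeps her own valence requirements.
mother = unify({'SUBJ': 'NP'}, {'HEAD': 'verb'})
print(mother)                         # {'SUBJ': 'NP', 'HEAD': 'verb'}
print(unify({'HEAD': 'verb'}, {'HEAD': 'noun'}))   # None (clash)
```

A real HPSG engine unifies typed feature structures with structure sharing (the boxed indices on the slides), which this dict-based sketch deliberately omits.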
58
Example: HPSG parsing
Principle application produces phrasal signs:
"saw Mary": HEAD verb, SUBJ ⟨NP⟩, COMPS ⟨⟩
59
Example: HPSG parsing
Recursive applications of principles produce the syntactic/semantic structures of sentences:
"John saw Mary": HEAD verb, SUBJ ⟨⟩, COMPS ⟨⟩
60
Example: LDDs
NONLOCAL features (SLASH, REL, etc.) explain long-distance dependencies:
– wh-movement
– topicalization
– relative clauses
– etc.
61
Brief Intro to Penn Treebank
62
The Penn Treebank
The first large syntactically annotated corpus
Contains text from different domains:
– Wall Street Journal (50,000 sentences, 1 million words)
– Switchboard
– Brown corpus
– ATIS
The annotation:
– POS-tagged (Ratnaparkhi's MXPOST)
– Manually annotated with phrase-structure trees
– Traces and other null elements used to represent non-local dependencies (movement, PRO, etc.)
– Designed to facilitate the extraction of predicate-argument structure
63
A Treebank tree
Relatively flat structures:
– There is no noun level
– VP arguments and adjuncts appear at the same level
Co-indexed null elements indicate long-range dependencies
Function tags indicate the complement-adjunct distinction (?)
64
Penn-II Treebank
65
Penn-II Treebank
Until Congress acts, the government hasn't any authority to issue new debt obligations of any kind, the Treasury said.
66
Penn-II Treebank
67
Penn-II Treebank
68
Penn-II Treebank
69
Penn-II Treebank
70
Penn-II Treebank (Simple Transitive Verb)
71
Penn-II Treebank (Simple Coordination)
72
Penn-II Treebank (Passive)
73
Penn-II Treebank (Subject WH-Relative Clause)
74
Penn-II Treebank (WH-Less Complement Relative Cl.)
75
Penn-II Treebank (Control and WH-Compl. Rel. Cl.)
76
Penn-II Treebank (Adv. Relative Clause)
77
Penn-II Treebank (Coord. and Right Node Raising)
78
The Parseval measure
Standard evaluation metric for Treebank parsers. Two components:
– Precision: how many of the proposed nonterminals are correct?
– Recall: how many of the correct nonterminals are proposed?
Measures recovery of nonterminals (span + syntactic category)
Ignores function tags and null elements
Has biased research towards parsers that produce linguistically shallow output (Collins, Charniak)
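The two components can be computed over labelled constituents represented as (category, start, end) spans. This is a bare sketch of the metric, not the evalb implementation (which also handles tag equivalences, length cut-offs, etc.).

```python
# Labelled PARSEVAL precision/recall/F1 over (category, start, end) spans.
def parseval(gold, proposed):
    gold, proposed = set(gold), set(proposed)
    correct = len(gold & proposed)          # spans matching label AND extent
    precision = correct / len(proposed)     # proposed NTs that are correct
    recall = correct / len(gold)            # correct NTs that are proposed
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

gold = {('S', 0, 5), ('NP', 0, 2), ('VP', 2, 5), ('NP', 3, 5)}
prop = {('S', 0, 5), ('NP', 0, 2), ('VP', 2, 5), ('PP', 3, 5)}
p, r, f = parseval(gold, prop)
print(p, r, f)   # 0.75 0.75 0.75
```

Note what the metric never sees: the (PP, 3, 5) error is penalised, but a missing *T* trace or -SBJ tag would not be, which is exactly the bias discussed on this slide.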
79
Treebank-Based Acquisition of TAG resources
80
Extracting a TAG from the Treebank
Two different approaches:
– F. Xia. Automatic Grammar Generation From Two Different Perspectives. PhD thesis, University of Pennsylvania, 2001
– J. Chen, S. Bangalore, K. Vijay-Shanker. Automated Extraction of Tree-Adjoining Grammars from Treebanks. Natural Language Engineering (forthcoming)
This lecture: just the basic ideas!
81
Extracting a TAG from the Penn Treebank
Input: a Treebank tree (= the TAG derived tree)
Output: a set of elementary trees (= the TAG lexicon)
82
Extracting a TAG: the head
– Identify the head path (requires a head percolation table)
– Find the arguments of the head (requires an argument table)
– Ignore modifiers (requires an adjunct table)
– Merge unary productions (VP -> VP)
[Figure: head path S -> VP -> VP -> VBG "making", with NP-SBJ and NP arguments]
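The first step, head identification, can be sketched as a table lookup in the style of the Magerman/Collins head percolation tables. The table below is a small illustrative fragment, not the full table used by Xia or Chen et al.

```python
# Head-finding via a (toy) percolation table: for each parent category,
# scan the children in the given direction for the first priority match.
HEAD_TABLE = {
    'S':  ('left',  ['VP', 'S']),
    'VP': ('left',  ['VBD', 'VBZ', 'VBG', 'VB', 'VP']),
    'NP': ('right', ['NN', 'NNS', 'NNP', 'NP']),
}

def head_child(parent, children):
    """Return the label of the head daughter of `parent`."""
    direction, priorities = HEAD_TABLE[parent]
    seq = children if direction == 'left' else list(reversed(children))
    for label in priorities:          # priority order first,
        for c in seq:                 # then scan direction
            if c == label:
                return c
    return seq[0]                     # fallback: first child in scan order

print(head_child('S', ['NP-SBJ', 'VP']))      # VP
print(head_child('NP', ['DT', 'JJ', 'NN']))   # NN
```

Following `head_child` from the root downwards yields the head path (S -> VP -> … -> VBG in the slide's example); the sisters along that path are then classified as arguments or adjuncts by the other two tables.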
83
Extracting a TAG: the head
This is the elementary tree for the head:
84
Extracting a TAG: arguments
Arguments are combined via substitution
Recurse on the arguments:
85
Extracting a TAG: adjuncts
Adjuncts require auxiliary trees (combined with the head by adjunction)
Auxiliary trees require a foot node (with the same label as the root)
[Figure: auxiliary trees for "is" (VBZ), "officially" (ADVP-MNR) and "the" (DT)]
86
Extracting a TAG: adjuncts
Adjuncts require auxiliary trees (combined with the head by adjunction)
Auxiliary trees require a foot node (with the same label as the root)
87
Special cases
– Coordination
– Null elements (e.g. traces for wh-movement): the trace has to be part of the elementary tree of the main verb
– Punctuation marks
88
Wh-movement: relative clauses
(NP (NP a charge)
    (SBAR (WHNP-2 (-NONE- 0))
          (S (NP-SBJ Mr. Coleman)
             (VP (VBZ denies)
                 (NP (-NONE- *T*-2))))))
[Figure: the elementary tree for "denies" contains the co-indexed trace *T*-2]
89
Evaluating an extracted grammar/lexicon
Grammar/lexicon size?
– Depends on the head table, the argument/adjunct distinction, the treatment of null elements, the mapping of Treebank labels/POS tags to categories in the extracted grammar, etc.
– For TAGs: between 3,000 and 8,500 elementary tree types, and 100,000-130,000 lexical entries
Lexical coverage?
– For TAGs: around 92-93%
Distribution of tree types? Convergence?
Quality?
– Inspection, comparison with a manual grammar
90
References: TAG extraction
TAG:
A.K. Joshi and Y. Schabes (1996) Tree Adjoining Grammars. In G. Rozenberg and A. Salomaa, eds., Handbook of Formal Languages
TAG extraction:
F. Xia (2001) Automatic Grammar Generation From Two Different Perspectives. PhD thesis, University of Pennsylvania
J. Chen, S. Bangalore and K. Vijay-Shanker. Automated Extraction of Tree-Adjoining Grammars from Treebanks. Natural Language Engineering (forthcoming)
Also:
L. Shen and A.K. Joshi (2005) Building an LTAG Treebank. Technical Report MS-CIS-05-15, CIS Department, University of Pennsylvania
Parsing with extracted TAGs:
D. Chiang. Statistical Parsing with an Automatically Extracted Tree Adjoining Grammar. In Data Oriented Parsing, CSLI Publications, pages 299-316
L. Shen and A.K. Joshi (2005) Incremental LTAG Parsing. HLT/EMNLP 2005
91
Penn-II-Based Acquisition of LFG Resources
92
Penn-II-Based Acquisition of LFG Resources
– Introduction
– Treebank Preprocessing/Clean-Up
– Treebank Annotation/Conversion
– Grammar and Lexicon Extraction
– Parsing (Architectures, Probability Models, Evaluation)
– Generation (Architectures, Probability Models, Evaluation)
– Other (Semantics, Domain Variation, …)
93
Introduction: Penn-II & LFG
If we had an f-structure-annotated version of Penn-II, we could use (standard) machine learning methods to extract probabilistic, wide-coverage LFG resources
How do we get an f-structure-annotated Penn-II?
– Manually? No: 50,000 trees!
– Automatically? Yes: an f-structure annotation algorithm!
Penn-II is a 2nd-generation treebank: it contains lots of annotations to support the derivation of deep meaning representations (trees, Penn-II "functional" tags, traces and coindexation), and the f-structure annotation algorithm can exploit those.
94
Introduction: Penn-II & LFG
What is the task? Given a Penn-II tree, the f-structure annotation algorithm has to traverse the tree and associate all tree nodes with f-structure equations (including lexical equations at the leaves of the tree).
A simple example:
95
Introduction: Penn-II & LFG
Factory payrolls fell in September.
[Figure: the Penn-II tree S -> NP-SBJ (NN "Factory", NNS "payrolls") VP (VBD "fell", PP-TMP (IN "in", NP (NNP "September"))), with nodes annotated ↑=↓, ↑SUBJ=↓, ↓∈↑ADJUNCT and ↑OBJ=↓]
96
Introduction: Penn-II & LFG
The resulting f-structure:
pred : fall
tense : past
subj : [ pred : payroll, num : pl, pers : 3,
         adjunct : { [ pred : factory, num : sg, pers : 3 ] } ]
adjunct : { [ pred : in,
              obj : [ pred : september, num : sg, pers : 3 ] ] }
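The step from annotated tree to f-structure can be sketched with a toy solver. The node encoding is invented for illustration: each node is (annotation, lexical-equations, children), where 'eq' stands for ↑=↓, 'adjunct' for ↓∈↑ADJUNCT, and any other annotation for ↑GF=↓. Real LFG constraint solvers additionally handle reentrancies, set values and constraint types this sketch omits.

```python
# Toy solver turning an annotated tree into a nested-dict f-structure.
# Node encoding (invented): (annotation, lexical_equations, children).
def solve(node, fstr):
    ann, lexical, children = node
    for child in children:
        c_ann = child[0]
        if c_ann == 'eq':                    # up = down: share the f-structure
            target = fstr
        elif c_ann == 'adjunct':             # down is a member of up's ADJUNCT set
            target = {}
            fstr.setdefault('adjunct', []).append(target)
        else:                                # up GF = down
            target = fstr.setdefault(c_ann, {})
        solve(child, target)
    fstr.update(lexical)                     # lexical equations at the leaves
    return fstr

# "Factory payrolls fell in September." (features abridged):
factory   = ('adjunct', {'pred': 'factory'}, [])
payrolls  = ('eq', {'pred': 'payroll', 'num': 'pl'}, [])
fell      = ('eq', {'pred': 'fall', 'tense': 'past'}, [])
september = ('eq', {'pred': 'september'}, [])
in_       = ('eq', {'pred': 'in'}, [])
pp        = ('adjunct', {}, [in_, ('obj', {}, [september])])
s = ('eq', {}, [('subj', {}, [factory, payrolls]), ('eq', {}, [fell, pp])])

f = solve(s, {})
```

Running the solver yields the f-structure on this slide: subj with its factory adjunct, the main pred 'fall', and the 'in September' adjunct with its obj.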
97
Treebank Preprocessing/Clean-Up: Penn-II & LFG
Penn-II treebank: often flat analyses (coordination, NPs, …), a certain amount of noise: inconsistent annotations, errors, …
No treebank preprocessing or clean-up in the LFG approach:
– Take the Penn-II treebank as is, but
– Remove all trees with FRAG- or X-labelled constituents (FRAG = fragments, X = not known how to annotate)
– Total of 48,424 trees, as they are
98
Treebank Annotation: Penn-II & LFG
Annotation-based (rather than conversion-based):
Automatic annotation of nodes in Penn-II treebank trees with f-structure equations
The f-structure annotation algorithm exploits:
– Head information
– Categorial information
– Configurational information
– Penn-II functional tags
– Trace information
99
Treebank Annotation: Penn-II & LFG
Architecture of a modular algorithm to assign LFG f-structure equations to trees in the Penn-II treebank:
Head-Lexicalisation [Magerman 1994]
-> Left-Right Context Annotation Principles
-> Coordination Annotation Principles
-> Catch-All and Clean-Up (-> proto f-structures)
-> Traces (-> proper f-structures)
100
Treebank Annotation: Penn-II & LFG
Head Lexicalisation: modified rules based on (Magerman, 1994)
101
Treebank Annotation: Penn-II & LFG
Left-Right Context Annotation Principles: e.g. the head of an NP is likely to be the rightmost noun
Mother -> Left Context | Head | Right Context
102
Treebank Annotation: Penn-II & LFG
Left-Right Annotation Matrix for NP:
Left context:  DT: ↑spec:det=↓;  QP: ↑spec:quant=↓;  JJ, ADJP: ↓∈↑adjunct
Head:          NN, NNS: ↑=↓
Right context: NP: ↓∈↑app;  PP: ↓∈↑adjunct;  S, SBAR: ↓∈↑relmod
Example: NP -> DT ("a") ADJP (RB "very", JJ "politicized") NN ("deal"), annotated DT: ↑spec:det=↓, ADJP: ↓∈↑adjunct, NN: ↑=↓
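The annotation matrix amounts to a position-sensitive lookup: find the head, then annotate left and right sisters by category. This sketch hard-codes a fragment of the NP matrix from the slide; the function and its encoding are illustrative, not the implemented algorithm.

```python
# Toy left-right annotation for NP rules, following the matrix above.
LEFT = {'DT': 'up spec:det = down', 'QP': 'up spec:quant = down',
        'JJ': 'down IN up adjunct', 'ADJP': 'down IN up adjunct'}
RIGHT = {'NP': 'down IN up app', 'PP': 'down IN up adjunct',
         'S': 'down IN up relmod', 'SBAR': 'down IN up relmod'}

def annotate_np(daughters, head_index):
    """Return (category, annotation) pairs for one NP rule;
    the head daughter gets up = down."""
    out = []
    for i, cat in enumerate(daughters):
        if i == head_index:
            out.append((cat, 'up = down'))
        elif i < head_index:
            out.append((cat, LEFT.get(cat, 'unannotated')))
        else:
            out.append((cat, RIGHT.get(cat, 'unannotated')))
    return out

print(annotate_np(['DT', 'ADJP', 'NN'], 2))
# [('DT', 'up spec:det = down'), ('ADJP', 'down IN up adjunct'), ('NN', 'up = down')]
```

Because the lookup keys on categories relative to the head rather than on whole rules, it generalises to rule types never seen when the matrix was compiled, which is exactly why the matrix can be applied to unseen rules on the next slide.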
103
Treebank Annotation: Penn-II & LFG
104
Treebank Annotation: Penn-II & LFG
Construct an annotation matrix for each of the monadic categories (without -FUN tags) in Penn-II
Based on analysing the most frequent rule types for each category, such that the sum of the token frequencies of these rule types is greater than 85% of the total number of rule tokens for that category:
Category   Rule types (100%)   Rule types (85%)
NP         6595                102
VP         10239               307
S          2602                20
ADVP       234                 6
Apply the annotation matrix to all (i.e. also unseen) rules/sub-trees, including NP-LOC, NP-TMP, etc.
105
Treebank Annotation: Penn-II & LFG
Coordination Annotation Principles: the Penn-II analysis of coordination is often flat
[Figure: flat coordination with coordinated elements, an object and a modifier]
106
Treebank Annotation: Penn-II & LFG
Unlike-constituents coordination
[Figure: coordinated elements of different categories]
107
Treebank Annotation: Penn-II & LFG
Traces Module: Long-Distance Dependencies
– Topicalisation
– Wh- and wh-less questions
– Relative clauses
– Passivisation
– Control constructions
– ICH (interpret constituent here)
– RNR (right node raising)
– …
Translate Penn-II traces and coindexation into corresponding reentrancies in f-structure
108
ESSLLI 2006 Treebank-Based Acquisition of LFG, HPSG and CCG Resources108 Treebank Annotation: WH-Relative Clauses
109
ESSLLI 2006 Treebank-Based Acquisition of LFG, HPSG and CCG Resources109 Treebank Annotation: Wh-Less Relative Clauses
110
ESSLLI 2006 Treebank-Based Acquisition of LFG, HPSG and CCG Resources110 Treebank Annotation: Control & Wh-Rel. LDD
111
ESSLLI 2006 Treebank-Based Acquisition of LFG, HPSG and CCG Resources111 Treebank Annotation: Adv. Relative Clause
Treebank Annotation: Right Node Raising
Treebank Annotation: Penn-II & LFG

Catch-All and Clean-Up Module:
Penn-II functional tags are used to identify potential errors
– e.g. nodes with the tag -SBJ should be annotated as subjects
Correction of overgeneralisations
– e.g. change a second OBJ annotation to OBJ2
– e.g. change arguments of head nouns erroneously annotated as relative clauses to COMP arguments:
  … signs [that managers expect declines]_RELCL → … signs [that managers expect declines]_COMP
Unannotated nodes receive defaults.
Treebank Annotation: Penn-II & LFG

Annotation pipeline: over head-lexicalised trees (Magerman, 1995), the Left-Right Context Annotation Principles, Coordination Annotation Principles and Catch-All and Clean-Up modules feed the constraint solver to produce proto f-structures; adding the Traces module yields proper f-structures.
Treebank Annotation: Penn-II & LFG

The f-structure equations are collected from the annotated trees and sent to a constraint solver, which generates the f-structures. The f-structure annotation algorithm is implemented in Java, the constraint solver in Prolog. Annotating the approx. 50,000 Penn-II trees takes ~3 minutes; producing the approx. 50,000 f-structures takes ~5 minutes.
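As a rough illustration of equation collection and solving (not the authors' Java/Prolog implementation; reentrancies and set-valued attributes are not modelled), assuming f-structures as nested dicts and equations as attribute paths:

```python
# Toy constraint solver: build an f-structure (nested dict) from
# attribute-path equations; a clash on an atomic value means the
# annotated tree is inconsistent.
def solve(equations):
    fstr = {}
    for path, value in equations:
        node = fstr
        *attrs, last = path.split(':')
        for attr in attrs:
            node = node.setdefault(attr, {})
        if last in node and node[last] != value:
            raise ValueError(f'clash at {path}: {node[last]} vs {value}')
        node[last] = value
    return fstr

# Equations as might be collected for "The inquiry soon focused on the judge"
eqs = [('pred', 'focus'), ('tense', 'past'),
       ('subj:pred', 'inquiry'), ('subj:num', 'sg'),
       ('obl:pform', 'on'), ('obl:obj:pred', 'judge')]
fstr = solve(eqs)
print(fstr)
```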
Treebank Annotation: Penn-II & LFG

Evaluation (Quantitative), Burke (2006) — Coverage: over 99.8% of Penn-II sentences (without X and FRAG constituents) receive a single covering and connected f-structure:

0 f-structures       45    0.093%
1 f-structure     48329   99.804%
2 f-structures       50    0.103%
Treebank Annotation: Penn-II & LFG

Evaluation (Qualitative), Burke (2006): f-structure quality is evaluated against DCU 105, a manually annotated dependency gold standard of 105 sentences randomly extracted from WSJ Section 23. Triples of the form relation(predicate~0, argument~1) are extracted from the gold standard and from the automatically produced f-structures using the evaluation software of (Crouch et al. 2002) and (Riezler et al. 2002); results are reported as precision and recall.
Treebank Annotation: Penn-II & LFG

Precision and recall against DCU 105, for all annotations and for preds-only:

             All Annotations   Preds-Only
Precision        97.06%          94.28%
Recall           96.80%          94.28%
Treebank Annotation: Penn-II & LFG

Per-feature results on DCU 105:

Feature    Precision        Recall           F-Score
adjunct    892/968 = 92     892/950 = 94      93
app        16/16 = 100      16/19 = 84        91
comp       88/92 = 96       88/102 = 86       91
coord      153/184 = 83     153/167 = 92      87
obj        442/459 = 96     442/461 = 96      96
obl        50/52 = 96       50/61 = 82        88
oblag      12/12 = 100      12/12 = 100      100
passive    76/79 = 96       76/80 = 95        96
poss       74/79 = 94       74/81 = 91        92
quant      40/64 = 62       40/52 = 77        69
relmod     46/48 = 96       46/50 = 92        94
subj       396/412 = 96     396/414 = 96      96
topic      13/13 = 100      13/13 = 100      100
topicrel   46/49 = 94       46/52 = 88        91
xcomp      145/153 = 95     145/146 = 99      97
Treebank Annotation: Penn-II & LFG

Following (Kaplan et al. 2004), precision and recall against the PARC 700 Dependency Bank are calculated for all annotations, for PARC features, and for preds-only; a mapping is required (Burke 2006). PARC 700, PARC features: Precision 88.31%, Recall 86.38%.
Grammar and Lexicon Extraction: Penn-II & LFG

Lexical Resources:
– Lexical information is extremely important in modern lexicalised grammar formalisms: LFG, HPSG, CCG, TAG, …
– Lexicon development is time-consuming and extremely expensive, and lexicons are rarely if ever complete — the familiar knowledge-acquisition bottleneck
– Subcategorisation frame induction (LFG semantic forms) from the f-structure-annotated versions of Penn-II and Penn-III
– Evaluation against COMLEX
Grammar and Lexicon Extraction: Penn-II & LFG

Lexicon Construction: manual vs. automated. Our approach:
– F-structure annotation of Penn-II and Penn-III
– Frames not predefined
– Functional and categorial information
– Parameterised for prepositions and particles
– Active and passive
– Long-distance dependencies
– Conditional probabilities
Grammar and Lexicon Extraction: Penn-II & LFG

Extraction methodology: automatic f-structure annotation of Penn-II & III; lexical extraction algorithm; examples.
Evaluation: gold standards (COMLEX, OALD); experimental architecture; results.
Grammar and Lexicon Extraction: Penn-II & LFG ("sign" example, figure)
Grammar and Lexicon Extraction: Penn-II & LFG

Semantic Forms (PRED values):
– Governable grammatical functions (arguments): SUBJ, OBJ, OBJθ, OBL, OBLθ, COMP, XCOMP, PART, …
– Non-governable grammatical functions (adjuncts): ADJ, XADJ, APP, RELMOD, …
Grammar and Lexicon Extraction: Penn-II & LFG

Penn-II Treebank → Automatic F-Structure Annotation Algorithm → LFG F-Structures → Extraction Algorithm → Semantic Forms
Grammar and Lexicon Extraction: Penn-II & LFG

Extraction Algorithm:
For each f-structure F, for each level of embedding in F:
– determine the local predicate PRED
– collect all subcategorisable grammatical functions GF1, …, GFn
– return the semantic form PRED([GF1, …, GFn])
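A toy rendering of the extraction step over nested-dict f-structures (an assumed representation, not the authors' implementation); it also parameterises OBL for its preposition, as in the focus([subj,obl:on]) example on the next slide:

```python
# Toy frame extractor: collect a semantic form at every PRED-bearing
# level of embedding in an f-structure (nested dict).
GOVERNABLE = {'subj', 'obj', 'obj2', 'obl', 'comp', 'xcomp', 'part'}

def local_frame(fstr):
    """Semantic form for one level of embedding, e.g. focus([subj,obl:on])."""
    gfs = []
    for gf, val in fstr.items():
        if gf in GOVERNABLE:
            # parameterise OBL for its preposition
            if gf == 'obl' and isinstance(val, dict) and 'pform' in val:
                gf = f"obl:{val['pform']}"
            gfs.append(gf)
    return f"{fstr['pred']}([{','.join(gfs)}])"

def extract(fstr, frames=None):
    """Recursively collect frames for all levels of embedding."""
    if frames is None:
        frames = []
    if isinstance(fstr, dict):
        if 'pred' in fstr:
            frames.append(local_frame(fstr))
        for val in fstr.values():
            extract(val, frames)
    return frames

f = {'pred': 'focus', 'tense': 'past',
     'subj': {'pred': 'inquiry'},
     'obl': {'pform': 'on', 'obj': {'pred': 'judge'}}}
print(extract(f))  # ['focus([subj,obl:on])', 'inquiry([])', 'judge([])']
```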
Grammar and Lexicon Extraction: Penn-II & LFG

"The inquiry soon focused on the judge" (wsj_0267_72)

pred: 'focus', tense: past
subj: [spec: [det: [pred: 'the']], pred: 'inquiry', num: sg, pers: 3]
adjunct: { [pred: 'soon'] }
obl: [pform: 'on', obj: [spec: [det: [pred: 'the']], pred: 'judge', num: sg, pers: 3]]

Prepositions and OBLs: focus([subj,obl:on]), on([obj])
Grammar and Lexicon Extraction: Penn-II & LFG

LDDs: say([subj,comp])
"Until Congress acts, the government hasn't any authority to issue new debt obligations of any kind, the Treasury said." (wsj_0008_2)

The fronted clause is the TOPIC, reentrant ([1]) with the COMP of 'say':
topic: [1] [subj: [spec: [det: [pred: 'the']], pred: 'government', num: sing, pers: 3], pred: 'have', tense: pres, …]
pred: 'say', tense: past
subj: [spec: [det: [pred: 'the']], pred: 'treasury', num: sing, pers: 3]
comp: [1]
Grammar and Lexicon Extraction: Penn-II & LFG

Passive: consider([subj,obl:as],p)
"… to be considered as an additional risk for the investor …" (wsj_0018_14)

subj: [pred: 'pro', pron_form: 'it']
passive: +, to_inf: +, pred: 'be'
xcomp: [subj: [pred: 'pro', pron_form: 'it'], passive: +, pred: 'consider', tense: past,
        obl: [pform: 'as', obj: [spec: [det: [pred: 'a']], pred: 'risk', num: sg, pers: 3, …]]]
Grammar and Lexicon Extraction: Penn-II & LFG

CFG categories can be recorded alongside grammatical functions: focus(v,[subj,obl:on]) or focus(v,[subj(n),obl:on])
"The inquiry soon focused on the judge." (wsj_0267_72) — the same f-structure as before, with cat features (dt, nn, rb, vbd) on each sub-f-structure.
Grammar and Lexicon Extraction: Penn-II & LFG

Lexicon extracted from Penn-II (O'Donovan et al. 2005):
Grammar and Lexicon Extraction: Penn-II & LFG

Evaluation for all active verbs (2,992) extracted from Penn-II against COMLEX — the largest evaluation for an English subcat frame extraction system to date: Carroll and Rooth (1998) evaluated 200 verbs; Schulte im Walde (2000) over 3,000 German verbs.

Example COMLEX entries:
(VERB :ORTH "reimburse" :SUBC ((NP-NP) (NP-PP :PVAL ("for")) (NP)))
(vp-frame np-np :cs ((np 2) (np 3))
          :gs (:subject 1 :obj 2 :obj2 3)
          :ex "she asked him his name")
Grammar and Lexicon Extraction: Penn-II & LFG

Following Schulte im Walde (2000):
– Experiment 1: exclude prepositional phrases entirely (e.g. [subj,obl:on] becomes [subj])
– Experiment 2: include the prepositional phrase but not the specific preposition (e.g. [subj,obl]); 2a adds the particle value
– Experiment 3: include the specific preposition (e.g. [subj,obl:on]); 3a adds the particle value
Relative thresholds of 1% and 5%.
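A relative threshold filters frames by their conditional probability given the verb; a minimal sketch with invented counts:

```python
# Toy relative-threshold filter: keep a frame for a verb only if its
# conditional probability P(frame | verb) exceeds the threshold.
def filter_frames(frame_counts, threshold):
    total = sum(frame_counts.values())
    return {frame: count / total
            for frame, count in frame_counts.items()
            if count / total > threshold}

# invented counts for one verb
counts = {'[subj,obj]': 80, '[subj]': 15, '[subj,obl:on]': 5}
print(filter_frames(counts, 0.05))  # the 5% threshold drops [subj,obl:on]
```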
Grammar and Lexicon Extraction: Penn-II & LFG

Directional prepositions (about, across, along, around, behind, below, beneath, between, beyond, by, down, from, …) are included in COMLEX by "default" for verbs that take at least one p-dir:
(VERB :ORTH "cycle" :SUBC ((PP :PVAL ("p-dir"))))
Grammar and Lexicon Extraction: Penn-II & LFG

Penn-III = Penn-II + the parsed section of the Brown Corpus (about 300,000 of a total of 1 million words). The Brown Corpus is a balanced corpus (8 genres, e.g. humour, science fiction), allowing study of subcategorisation variation across domains. More data, more verbs; the -CLR (closely related) tag.
Grammar and Lexicon Extraction: Penn-II & LFG

Applications:
– Porting to other languages: German (TIGER), Spanish (Cast3LB), Chinese (CTB-I and II)
– LDD resolution in parsing new text (Cahill et al., 2004)
Grammar and Lexicon Extraction: Penn-II & LFG

Parsing-Based Subcat Frame Extraction (O'Donovan 2006): treebank-based vs. parsing-based subcat frame extraction. We parsed the British National Corpus (BNC, 100 million words) with our automatically induced LFGs: 19 days on a single machine, ~5 million words per day, yielding subcat frames for ~10,000 verb lemmas. Evaluated against COMLEX and OALD and against the Korhonen (2002) gold standard; our method is statistically significantly better.
Parsing: Penn-II and LFG

Overview:
– Parsing architectures: pipeline & integrated
– Long-distance dependency resolution at f-structure
– Evaluation
Parsing: Penn-II and LFG

A PCFG consists of CFG rules with associated probabilities. An A-PCFG (annotated PCFG) treats strings consisting of a CFG category followed by one or more functional annotations as monadic categories (e.g. NP[up-obj=down]). Probabilistic parsing technology (PCFGs, history-based and lexicalised parsers) produces trees without LDDs; exceptions: (Collins 1999) wh-relative clauses, (Johnson 2002) post-processing. In our (standard) architecture, new text is parsed into proto f-structures and LDDs are resolved at f-structure.
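To make the A-PCFG idea concrete, a toy maximum-likelihood estimate over invented annotated rules (the annotations are illustrative, not taken from the slides' data):

```python
# A-PCFG sketch: annotated categories such as 'NP[up-subj=down]' are
# treated as atomic symbols; rule probabilities are relative frequencies
# per left-hand side.
from collections import Counter

rules = [
    ('S', ('NP[up-subj=down]', 'VP[up=down]')),
    ('S', ('NP[up-subj=down]', 'VP[up=down]')),
    ('S', ('S[up-topic=down]', 'NP[up-subj=down]', 'VP[up=down]')),
]
rule_counts = Counter(rules)
lhs_counts = Counter(lhs for lhs, _ in rules)
apcfg = {rule: n / lhs_counts[rule[0]] for rule, n in rule_counts.items()}
for (lhs, rhs), p in sorted(apcfg.items(), key=lambda x: -x[1]):
    print(f"{lhs} -> {' '.join(rhs)}   {p:.2f}")
```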
Parsing: Penn-II and LFG

Penn-II tree with traces and co-indexation for LDDs: "U.N. signs treaty, the paper said"
(S (S-1 (NP (NNP U.N.)) (VP (VBZ signs) (NP (NN treaty))))
   (NP (DT the) (NN paper))
   (VP (VBD said) (S (-NONE- *T*-1))))
Parsing: Penn-II and LFG

The trace and coindexation in the tree are translated into a reentrancy at f-structure by the annotation algorithm: "U.N. signs treaty, the paper said"
Parsing: Penn-II and LFG

Parse tree from PCFG and history-based parsers, without traces: "U.N. signs treaty, the paper said"
(S (S (NP (NNP U.N.)) (VP (VBZ signs) (NP (NN treaty))))
   (NP (DT the) (NN paper))
   (VP (VBD said)))
Parsing: Penn-II and LFG

Such trees yield basic, but possibly incomplete, predicate-argument structures (proto f-structures): "U.N. signs treaty, the paper said"
Parsing: Penn-II and LFG

LDD resolution at f-structure requires:
– subcategorisation frames (O'Donovan et al., 2004, 2005; O'Donovan 2006)
– functional uncertainty equations
Previous example: say([subj,comp]); topic = comp*:comp (search along a path of 0 or more COMPs)
Parsing: Penn-II and LFG

Subcat frames: automatically acquired from the automatically f-structure-annotated Penn-II treebank following (O'Donovan et al. 2004, 2005; O'Donovan 2006); active and passive frames are distinguished; frames are associated with probabilities; O'Donovan et al. evaluate against COMLEX. Extracted from sections 02-21: 10,960 active lemma-frame types (semantic forms/subcat frames), 2,241 passive types.
Parsing: Penn-II and LFG

Functional uncertainty equations: we automatically acquire finite approximations of FU equations by extracting the paths between co-indexed material in the automatically generated f-structures for sections 02-21 of Penn-II: 26 TOPIC, 60 TOPICREL and 13 FOCUS path types, covering 99.69% of the paths in section 23. Each path type is associated with a probability.
Parsing: Penn-II and LFG

Sample TOPICREL paths with frequencies:
up-subj                 7894
up-obj                  1167
up-xcomp                 956
up-xcomp:obj             793
up-xcomp:xcomp           161
up-xcomp:xcomp:obj       135
up-comp:subj             119
up-xcomp:subj             92

Sample TOPIC paths with probabilities:
up-topic=up-comp          0.940
up-topic=up-xcomp:comp    0.006
up-topic=up-comp:comp     0.001
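Path frequencies become per-type probabilities by relative-frequency estimation; a sketch over the eight sample TOPICREL counts shown (the remaining path types are omitted, so these probabilities are only illustrative):

```python
# Normalise extracted path frequencies within one LDD type (TOPICREL).
topicrel_freqs = {
    'up-subj': 7894, 'up-obj': 1167, 'up-xcomp': 956,
    'up-xcomp:obj': 793, 'up-xcomp:xcomp': 161,
    'up-xcomp:xcomp:obj': 135, 'up-comp:subj': 119, 'up-xcomp:subj': 92,
}
total = sum(topicrel_freqs.values())
topicrel_probs = {path: n / total for path, n in topicrel_freqs.items()}
# probabilities sum to 1 within the TOPICREL type
```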
Parsing: Penn-II and LFG

LDD Resolution Algorithm: recursively traverse an f-structure and
– find a TOPIC:T attribute-value pair
– retrieve the TOPIC paths
– for each path p of the form GF1:…:GFn:GF, traverse the f-structure along GF1:…:GFn to the local sub-f-structure g
– at g, retrieve the local PRED:P
– add GF:T to g iff
  – GF is not present at g
  – g together with GF is locally complete and coherent with respect to a semantic form s for P
– multiply the path and semantic form probabilities involved to rank resolutions
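A toy version of the resolution step over nested-dict f-structures, using the frames and TOPIC paths from the surrounding slides; real f-structures, completeness and coherence checking are richer than this sketch:

```python
# Resolve a TOPIC LDD: insert the TOPIC value along the best-scoring
# functional-uncertainty path licensed by a subcat frame.
FRAMES = {'say': ({'subj', 'comp'}, 0.87)}            # say([subj,comp])
TOPIC_PATHS = [(['comp'], 0.940), (['xcomp', 'comp'], 0.006)]

def resolve_topic(fstr):
    topic = fstr.get('topic')
    if topic is None:
        return None
    candidates = []
    for path, p_path in TOPIC_PATHS:
        g = fstr
        for attr in path[:-1]:                  # traverse GF1:...:GFn
            g = g.get(attr) if isinstance(g, dict) else None
        gf = path[-1]
        if not isinstance(g, dict) or gf in g:  # GF must not be present
            continue
        frame = FRAMES.get(g.get('pred'))
        if frame is None:
            continue
        gfs, p_frame = frame
        present = {k for k in g if k in gfs} | {gf}
        if gf in gfs and present == gfs:        # locally complete & coherent
            candidates.append((p_path * p_frame, g, gf))
    if not candidates:
        return None
    score, g, gf = max(candidates, key=lambda c: c[0])
    g[gf] = topic                               # reentrancy with TOPIC
    return score

f = {'topic': {'pred': 'sign', 'subj': {'pred': 'U.N.'},
               'obj': {'pred': 'treaty'}},
     'pred': 'say', 'subj': {'pred': 'paper'}}
score = resolve_topic(f)                        # 0.940 * 0.87
```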
Parsing: Penn-II and LFG

Worked example ("U.N. signs treaty, the paper said"): the proto f-structure has topic: [pred: sign, subj: [pred: U.N.], obj: [pred: treaty]], pred: say, subj: [spec: the, pred: paper] — but no COMP.
Subcategorisation frames: say([subj]) 0.06, say([comp,subj]) 0.87, say([subj,xcomp]) 0.02, …
FU-path approximations: up-topic=up-comp 0.940, up-topic=up-xcomp:comp 0.006, up-topic=up-comp:comp 0.001, …
Resolution adds comp reentrant with topic at the f-structure of 'say', ranked by 0.940 × 0.87.
Parsing: Penn-II and LFG

How do treebank-based constraint grammars compare to deep hand-crafted grammars like XLE and RASP?
XLE (Riezler et al. 2002, Kaplan et al. 2004):
– hand-crafted, wide-coverage, deep, state-of-the-art English LFG and XLE parsing system with log-linear probability models for disambiguation
– PARC 700 Dependency Bank gold standard (King et al. 2003), based on Penn-II Section 23
RASP (Carroll and Briscoe 2002):
– hand-crafted, wide-coverage, deep, state-of-the-art English probabilistic unification grammar and parsing system (RASP: Rapid Accurate Statistical Parsing)
– CBS 500 Dependency Bank gold standard (Carroll, Briscoe and Sanfilippo 1999), based on Susanne
Parsing: Penn-II and LFG

Choosing the best treebank-based LFG system to compare with XLE/RASP. C-structure engines (state-of-the-art history-based, lexicalised parsers): (Collins 1999), (Charniak 2000), (Bikel 2002). (Bikel 2002) is retrained to retain Penn-II functional tags (-SBJ, -LOC, -TMP, -CLR, etc.).
Pipeline architecture: tagged text → retrained Bikel → f-structure annotation algorithm → LDD resolution → f-structures → automatic conversion → evaluation against the XLE/RASP gold standards, the PARC 700 and CBS 500 dependency banks.
Parsing: Penn-II and LFG

There are systematic differences between our f-structures and the PARC 700 and CBS 500 dependency representations, so our f-structures are automatically converted to PARC 700 / CBS 500-like structures (Burke et al. 2004, Burke 2006, Cahill et al. under review). We compare against the best available XLE and RASP resources, with better results than those reported in the literature to date, using the (Crouch et al. 2002) and (Carroll and Briscoe 2002) evaluation software and the (Noreen 1989) Approximate Randomisation Test for statistical significance.
Parsing: Penn-II and LFG

Resulting dependency f-scores:
– PARC 700, XLE vs. BKR-LFG: 80.55% XLE; 83.08% BKR-LFG (+2.53%)
– CBS 500, RASP vs. BKR-LFG: 76.57% RASP; 80.23% BKR-LFG (+3.66%)
Results are statistically significant at the 95% level ((Noreen 1989) Approximate Randomisation Test).
BKR-LFG = treebank-induced LFG resources with Bikel retrained (BKR) as the c-structure engine in the pipeline architecture.
Parsing: Penn-II and LFG
PARC 700 Evaluation:
Probability Models: Penn-II & LFG

Our approach does not constitute a proper probability model (Abney, 1996). Why? The probability model leaks: the highest-ranking parse tree may feature f-structure equations that cannot be resolved into an f-structure, and the probability associated with that parse tree is lost. This does not happen often in practice (coverage >99.5% on unseen data). Research on appropriate discriminative, log-linear or maximum entropy models is important (Miyao and Tsujii, 2002; Riezler et al. 2002).
Generation: Penn-II & LFG
Cahill and van Genabith, 2006
Generation: the Good, the Bad and the Ugly

Orig: Supporters of the legislation view the bill as an effort to add stability and certainty to the airline-acquisition process, and to preserve the safety and fitness of the industry.
Gen: Supporters of the legislation view the bill as an effort to add stability and certainty to the airline-acquisition process, and to preserve the safety and fitness of the industry.

Orig: The upshot of the downshoot is that the A 's go into San Francisco 's Candlestick Park tonight up two games to none in the best-of-seven fest.
Gen: The upshot of the downshoot is that the A 's tonight go into San Francisco 's Candlestick Park up two games to none in the best-of-seven fest.

Orig: By this time, it was 4:30 a.m. in New York, and Mr. Smith fielded a call from a New York customer wanting an opinion on the British stock market, which had been having troubles of its own even before Friday 's New York market break.
Gen: Mr. Smith fielded a call from New a customer York wanting an opinion on the market British stock which had been having troubles of its own even before Friday 's New York market break by this time and in New York, it was 4:30 a.m..

Orig: Only half the usual lunchtime crowd gathered at the tony Corney & Barrow wine bar on Old Broad Street nearby.
Gen: At wine tony Corney & Barrow the bar on Old Broad Street nearby gathered usual, lunchtime only half the crowd,.
Domain Variation, Multilingual LFG Resources, etc.

Domain variation: ATIS (Judge et al. 2005) and QuestionBank (Judge et al. 2006)
F-Str → (Q)LF quasi-logical forms (Cahill et al. 2003)
Multilingual treebank-based LFG acquisition:
– German: TIGER treebank (Cahill et al. 2003, 2005)
– Chinese: Chinese Penn Treebank (Burke et al. 2004)
– Spanish: Cast3LB (O'Donovan et al. 2005; Chrupala and van Genabith 2006)
GramLab project at DCU (2005-2008): Chinese, Japanese, Arabic, Spanish, French and German
Demo System
http://lfg-demo.computing.dcu.ie/lfgparser.html
Publications
A. Cahill and J. van Genabith. Robust PCFG-Based Generation using Automatically Acquired LFG-Approximations. COLING/ACL 2006, Sydney, Australia.
J. Judge, A. Cahill and J. van Genabith. QuestionBank: Creating a Corpus of Parse-Annotated Questions. COLING/ACL 2006, Sydney, Australia.
G. Chrupala and J. van Genabith. Using Machine-Learning to Assign Function Labels to Parser Output for Spanish. COLING/ACL 2006, Sydney, Australia.
M. Burke. Automatic Treebank Annotation for the Acquisition of LFG Resources. Ph.D. Thesis, School of Computing, Dublin City University, Dublin 9, Ireland, 2005.
R. O'Donovan. Automatic Extraction of Large-Scale Multilingual Lexical Resources. Ph.D. Thesis, School of Computing, Dublin City University, Dublin 9, Ireland, 2005.
R. O'Donovan, M. Burke, A. Cahill, J. van Genabith and A. Way. Large-Scale Induction and Evaluation of Lexical Resources from the Penn-II and Penn-III Treebanks. Computational Linguistics, 2005.
A. Cahill, M. Forst, M. Burke, M. McCarthy, R. O'Donovan, C. Rohrer, J. van Genabith and A. Way. Treebank-Based Acquisition of Multilingual Unification Grammar Resources. Journal of Research on Language and Computation, Special Issue on "Shared Representations in Multilingual Grammar Engineering", (eds.) E. Bender, D. Flickinger, F. Fouvry and M. Siegel, Kluwer Academic Press, 2005.
Publications
R. O'Donovan, A. Cahill, J. van Genabith and A. Way. Automatic Acquisition of Spanish LFG Resources from the CAST3LB Treebank. In Proceedings of the Tenth International Conference on LFG, Bergen, Norway, 2005.
J. Judge, M. Burke, A. Cahill, R. O'Donovan, J. van Genabith and A. Way. Strong Domain Variation and Treebank-Induced LFG Resources. In Proceedings of the Tenth International Conference on LFG, Bergen, Norway, 2005.
M. Burke, A. Cahill, J. van Genabith and A. Way. Evaluating Automatically Acquired F-Structures against PropBank. In Proceedings of the Tenth International Conference on LFG, Bergen, Norway, 2005.
M. Burke, A. Cahill, M. McCarthy, R. O'Donovan, J. van Genabith and A. Way. Evaluating Automatic F-Structure Annotation for the Penn-II Treebank. Journal of Language and Computation, Special Issue on "Treebanks and Linguistic Theories", (eds.) E. Hinrichs and K. Simov, Kluwer Academic Press, 2005, pages 523-547.
A. Cahill. Parsing with Automatically Acquired, Wide-Coverage, Robust, Probabilistic LFG Approximations. Ph.D. Thesis, School of Computing, Dublin City University, Dublin 9, Ireland, 2004.
M. Burke, O. Lam, A. Cahill, R. Chan, R. O'Donovan, A. Bodomo, J. van Genabith and A. Way. Treebank-Based Acquisition of a Chinese Lexical-Functional Grammar. Proceedings of the PACLIC-18 Conference, Waseda University, Tokyo, Japan, pages 161-172, 2004.
Publications
M. Burke, A. Cahill, R. O'Donovan, J. van Genabith and A. Way. The Evaluation of an Automatic Annotation Algorithm against the PARC 700 Dependency Bank. In Proceedings of the Ninth International Conference on LFG, Christchurch, New Zealand, pages 101-121, 2004.
A. Cahill, M. Burke, R. O'Donovan, J. van Genabith and A. Way. Long-Distance Dependency Resolution in Automatically Acquired Wide-Coverage PCFG-Based LFG Approximations. In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04), July 21-26 2004, pages 320-327, Barcelona, Spain, 2004.
R. O'Donovan, M. Burke, A. Cahill, J. van Genabith and A. Way. Large-Scale Induction and Evaluation of Lexical Resources from the Penn-II Treebank. In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04), July 21-26 2004, pages 368-375, Barcelona, Spain, 2004.
M. Burke, A. Cahill, R. O'Donovan, J. van Genabith and A. Way. Treebank-Based Acquisition of Wide-Coverage, Probabilistic LFG Resources: Project Overview, Results and Evaluation. The First International Joint Conference on Natural Language Processing (IJCNLP-04), Workshop "Beyond shallow analyses - Formalisms and statistical modeling for deep analyses", March 22-24, 2004, Sanya City, Hainan Island, China, 2004.
A. Cahill, M. Forst, M. McCarthy, R. O'Donovan, C. Rohrer, J. van Genabith and A. Way. Treebank-Based Multilingual Unification-Grammar Development. In Proceedings of the Workshop on Ideas and Strategies for Multilingual Grammar Development, at the 15th European Summer School in Logic, Language and Information, Vienna, Austria, 18th-29th August 2003.
Publications
A. Cahill, M. McCarthy, J. van Genabith and A. Way. Quasi-Logical Forms for the Penn Treebank. In (eds.) Harry Bunt, Ielka van der Sluis and Roser Morante, Proceedings of the Fifth International Workshop on Computational Semantics, IWCS-05, January 15-17, 2003, Tilburg, The Netherlands, ISBN 90-74029-24-8, pages 55-71, 2003.
A. Cahill, M. McCarthy, J. van Genabith and A. Way. Evaluating Automatic F-Structure Annotation for the Penn-II Treebank. In (eds.) E. Hinrichs and K. Simov, Proceedings of the First Workshop on Treebanks and Linguistic Theories (TLT 2002), 20th-21st September 2002, Sozopol, Bulgaria, pages 42-60, 2002.
A. Cahill, M. McCarthy, J. van Genabith and A. Way. Parsing with PCFGs and Automatic F-Structure Annotation. In M. Butt and T. Holloway-King (eds.), Proceedings of the Seventh International Conference on LFG, CSLI Publications, Stanford, CA, pages 76-95, 2002.
A. Cahill and J. van Genabith. TTS - A Treebank Tool. In LREC 2002, The Third International Conference on Language Resources and Evaluation, Las Palmas de Gran Canaria, Spain, May 27th-June 2nd, 2002, Proceedings of the Conference, Volume V, (eds.) M.G. Rodriguez and C.P. Suarez Arnajo, ISBN 2-9517408-0-8, pages 1712-1717, 2002.
A. Cahill, M. McCarthy, J. van Genabith and A. Way. Automatic Annotation of the Penn-Treebank with LFG F-Structure Information. LREC 2002 workshop on Linguistic Knowledge Acquisition and Representation - Bootstrapping Annotated Language Data, June 1st, 2002, proceedings of the workshop, (eds.) A. Lenci, S. Montemagni and V. Pirelli, ELRA - European Language Resources Association, Paris, France, pages 8-15, 2002.
Penn-II-Based Acquisition of CCG Resources
Combinatory Categorial Grammar
This lecture
– Recap: CCG
– Translating the Penn Treebank to CCG: the translation algorithm; CCGbank: the acquired grammar and lexicon
– Wide-coverage parsing with CCG
CCG: the machinery
– Categories: specify subcat lists of words/constituents
– Combinatory rules: specify how constituents can combine
– The lexicon: specifies which categories a word can have
– Derivations: spell out the process of combining constituents
CCG categories
Simple categories: NP, S, PP
Complex categories are functions which return a result when combined with an argument:
– VP or intransitive verb: S\NP
– Transitive verb: (S\NP)/NP
– Adverb: (S\NP)\(S\NP)
– PPs: ((S\NP)\(S\NP))/NP, (NP\NP)/NP
The combinatory rules
Function application (λx.f(x) applied to a gives f(a)):
  X/Y  Y   ⇒  X     (>)
  Y  X\Y   ⇒  X     (<)
Function composition (λx.f(x) composed with λy.g(y) gives λx.f(g(x))):
  X/Y  Y/Z  ⇒  X/Z   (>B)
  Y\Z  X\Y  ⇒  X\Z   (<B)
  X/Y  Y\Z  ⇒  X\Z   (>Bx)
  Y/Z  X\Y  ⇒  X/Z   (<Bx)
Type-raising (a becomes λf.f(a)):
  X  ⇒  T/(T\X)   (>T)
  X  ⇒  T\(T/X)   (<T)
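A rough executable rendering of application and forward composition, assuming a plain string encoding of categories (morphosyntactic features like [dcl] and semantics would need extra handling):

```python
# Toy CCG combinators over categories written as strings, e.g. '(S\\NP)/NP'.

def strip(cat):
    """Remove redundant outer parentheses: '(S\\NP)' -> 'S\\NP'."""
    while cat.startswith('('):
        depth = 0
        for i, c in enumerate(cat):
            depth += (c == '(') - (c == ')')
            if depth == 0:
                break
        if i == len(cat) - 1:
            cat = cat[1:-1]
        else:
            break
    return cat

def split_cat(cat):
    """Split at the outermost (rightmost top-level) slash; None if simple."""
    depth = 0
    for i in range(len(cat) - 1, -1, -1):
        c = cat[i]
        if c == ')':
            depth += 1
        elif c == '(':
            depth -= 1
        elif c in '/\\' and depth == 0:
            return strip(cat[:i]), c, strip(cat[i + 1:])
    return None

def wrap(cat):
    return f'({cat})' if split_cat(cat) else cat

def forward_apply(left, right):          # X/Y  Y  =>  X   (>)
    s = split_cat(left)
    if s and s[1] == '/' and s[2] == strip(right):
        return s[0]

def backward_apply(left, right):         # Y  X\Y  =>  X   (<)
    s = split_cat(right)
    if s and s[1] == '\\' and s[2] == strip(left):
        return s[0]

def forward_compose(left, right):        # X/Y  Y/Z  =>  X/Z   (>B)
    l, r = split_cat(left), split_cat(right)
    if l and r and l[1] == '/' and r[1] == '/' and l[2] == r[0]:
        return wrap(l[0]) + '/' + wrap(r[2])

print(forward_apply('(S\\NP)/NP', 'NP'))                 # S\NP
print(backward_apply('NP', 'S\\NP'))                     # S
print(forward_compose('(S\\NP)/(S\\NP)', '(S\\NP)/NP'))  # (S\NP)/NP
```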
CCG derivations
Canonical "normal-form" derivations (mostly function application) vs. alternative derivations (figures).
Type-raising and Composition
Wh-movement and right-node-raising derivations (figures).
CCG: semantics
Every syntactic category and rule has a semantic counterpart.
From the Penn Treebank to CCG
– The basic translation algorithm
– Dealing with null elements
– Type-changing rules in the grammar
– Preprocessing
– CCGbank: the extracted lexicon/grammar
Input: Penn Treebank tree
– Flat phrase-structure tree
– Traces/null elements and indices represent underlying dependencies
– Function tags
198
ESSLLI 2006 Treebank-Based Acquisition of LFG, HPSG and CCG Resources198 Output: CCG derivation Binary derivation tree with explicit “deep” dependency structures and subcategorization information. No null elements
199
I. Identify heads, arguments, adjuncts

II. Binarise the tree

III. Assign CCG categories
Morphosyntactic Features
Features on verbal categories: declarative, infinitival, past participle, present participle, passive
Sentential features: wh-questions, yes-no questions, embedded questions, embedded declaratives, fragments, etc.
CCGbank has no case or number distinction!

III. Assign CCG categories: adjuncts

III. Assign CCG categories: arguments
IV. Assign predicate-argument structure
We approximate predicate-argument structure by word-word dependencies.
These are defined by the argument slots of functor categories:
just (S\NP)/(S\NP) opened
opened (S[dcl]\NP)/NP doors
IV. Assign predicate-argument structure
Non-local dependencies arise through:
–Binding and control: "He may want you to listen"
–Extraction: "the tapas that he told us she ate"
Both are mediated by lexical categories:
–Control verbs, auxiliaries/modals
–Relative pronouns
We represent this via coindexation: (NP\NP_i)/(S[dcl]/NP_i)
In CCGbank: added automatically to certain category types

Lexical categories that mediate dependencies
Auxiliaries/modals, raising verbs: will, might, seem: (S[dcl]\NP_i)/(S[b]\NP_i)
Control verbs: persuade you to go: ((S[dcl]\NP)/(S[to]\NP_i))/NP_i
Relative pronouns: which, who, that: (NP\NP_i)/(S[dcl]/NP_i)
Many more (listed in the CCGbank manual)
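The effect of such coindexation can be sketched as follows — an illustrative toy (the function and its names are hypothetical, not CCGbank code): the two NP_i slots of an auxiliary category (S[dcl]\NP_i)/(S[b]\NP_i) identify the auxiliary's subject with the subject of its verbal complement, so the surface subject becomes an argument of both verbs.

```python
def fill_aux_subject(aux, vp_head, subject, deps):
    """Record the dependencies created when an auxiliary with category
    (S[dcl]\\NP_i)/(S[b]\\NP_i) combines with its subject and VP complement."""
    deps.append((aux, 'ARG1', subject))       # the auxiliary's own NP slot
    deps.append((vp_head, 'ARG1', subject))   # propagated via coindexation

# "He may want ..." : 'He' is an argument of both 'may' and 'want'
deps = []
fill_aux_subject('may', 'want', 'He', deps)
assert ('want', 'ARG1', 'He') in deps
```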
Summary: The basic algorithm
1. Identify heads, complements and adjuncts.
2. Binarize the tree.
3. Assign CCG categories.
4. Add co-indexation to lexical categories.
5. Create predicate-argument structure.
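Step 2 can be sketched in a few lines. This is an illustrative head-outward binarization of one flat constituent, not the actual CCGbank conversion code:

```python
def binarize(children, head_index):
    """Binarize a flat constituent head-outward: attach constituents to the
    right of the head first (innermost first), then those to the left."""
    node = children[head_index]
    for r in children[head_index + 1:]:
        node = (node, r)
    for l in reversed(children[:head_index]):
        node = (l, node)
    return node

# flat VP "opened its doors quickly", with the verb as head:
assert binarize(['opened', 'its doors', 'quickly'], 0) == \
    (('opened', 'its doors'), 'quickly')
```

Category assignment (step 3) then walks down this binary tree: each argument adds a slash to the head's category, and adjuncts receive X|X categories.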
Problems with the basic algorithm
Depends on Treebank markup:
–Complement/adjunct distinction
–The analyses don't always correspond to the CCG analysis
–Errors in Treebank annotation
Proliferation of categories.

The need for preprocessing
Eliminating (some of) the noise:
–POS-tagging errors
–Bracketing errors (coordination!)
Changing the Treebank analyses:
–Small clauses
Adding more structure:
–Insert a noun level into NPs
–Analyze QPs, fragments, parentheticals, multiword expressions

Compacting the grammar: type-changing rules
Type-changing rules for adjuncts capture syntactic regularities.
Null elements, traces, and coindexation
*-null elements: passive, PRO
*T*-traces: wh-movement, tough movement
*RNR*-traces: right-node raising
Other null elements:
–*EXP*: expletive
–*ICH* ("insert constituent here"): extraposition
–*U* (units): $ 500 *U*
–*PPA* (permanent predictable ambiguity)
=-coindexation: argument cluster coordination and gapping

* null elements
Used for passive or PRO (arbitrary or controlled).
Only the passive * matters for translation: (S with null subject = VP = S\NP)
Unbounded long-range dependencies
…arising through extraction (*T*):
–Wh-movement (relative clauses and wh-questions): the articles that (you believed he saw that…) I filed
–Tough-movement: Peter is easy to please
–Parasitic gaps: the articles that I filed without reading
…arising through coordination (*RNR* and =):
–Right-node raising: [[Mary ordered] and [John ate]] the tapas.
–Argument cluster coordination: Mary ordered [[tapas for herself] and [wine for John]].
–Sentential gapping: [[Mary ordered tapas] and [John beer]].

Dealing with extraction
Penn Treebank: *T* traces indicate extraction.

Dealing with extraction
Pass the extracted NP up to the relative clause.
The relative pronoun subcategorizes for an 'incomplete' sentence:
(NP\NP)/(S[dcl]\NP) for subject relatives
(NP\NP)/(S[dcl]/NP) for object relatives
The derivation uses type-raising and composition.
Right node raising in the Penn Treebank

Right node raising in CCGbank

Argument-cluster coordination
"Template gapping" annotation:
Co-indexation between constituents in conjuncts
The first conjunct contains the head

Argument-cluster coordination in CCGbank
The shared constituents are coordinated (via type-raising and composition):
X ⇒ T\(T/X)   (<T), e.g. NP ⇒ (S\NP)\((S\NP)/NP)
Sentential Gapping
In the Treebank: template annotation.
CCG uses decomposition to obtain the types (the interpretation is given extragrammatically).

Remaining problems: NP level
Lists and appositives are indistinguishable.
Compound nouns have no internal structure.

Remaining problems: other constructions
Complement-adjunct distinction.
Putting it all together…
Funds that are or soon will be listed in New York or London

The CCG derivation

The relative clause:
that: (NP_i\NP_i)/(S[dcl]\NP_i), linking funds to are, will

The right-node-raising VP
CCGbank
Coverage of the translation algorithm: 99.44% of all sentences in the Treebank (main problem: sentential gapping)
The lexicon (sections 02-21):
–74,669 entries for 44,210 word types
–1,286 lexical category types (439 appear once, 556 appear 5 times or more)
The grammar (sections 02-21):
–3,262 rule instantiations (1,146 appear once)
The most ambiguous words

Frequency distribution of categories

Lexical coverage
How well does our lexicon cover unseen data?
"Training" data: sections 02-21; test data: section 00
The lexicon contains the correct entries for 94.0% of the tokens in section 00.
3.8% of the tokens in section 00 do not appear in sections 02-21.
35% of the unknown tokens are N; 29% are N/N.
Statistical Parsing with CCG
The data: CCGbank
The algorithms: standard CKY chart parsing (and a supertagger)
The models:
–Generative: Hockenmaier and Steedman (2002)
–Conditional: Clark and Curran (2004)

Parsing algorithms for CCG
CCG derivations are binary trees, so standard chart parsing algorithms (e.g. CKY) can be used.
Complexity: O(n^6) (or O(n^3) if the category set is fixed)
Recovery of "deep" dependencies requires feature structures.
Supertagging: assign the most likely categories to words before parsing. Significantly speeds up parsing!
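Because CCG derivations are binary, the textbook CKY recursion carries over directly. The sketch below is illustrative only — it implements just function application over the tuple-encoded categories; a real CCG parser also needs composition, type-raising, and a supertagger to keep the cells small:

```python
def combine(x, y):
    """Try forward and backward application on two categories; categories
    are atomic strings or (result, slash, argument) tuples."""
    out = []
    if isinstance(x, tuple) and x[1] == '/' and x[2] == y:    # X/Y Y => X
        out.append(x[0])
    if isinstance(y, tuple) and y[1] == '\\' and y[2] == x:   # Y X\Y => X
        out.append(y[0])
    return out

def cky(words, lexicon):
    n = len(words)
    chart = {(i, i + 1): set(lexicon[w]) for i, w in enumerate(words)}
    for span in range(2, n + 1):              # widen spans bottom-up
        for i in range(n - span + 1):
            j = i + span
            cell = set()
            for k in range(i + 1, j):         # all split points
                for x in chart[(i, k)]:
                    for y in chart[(k, j)]:
                        cell.update(combine(x, y))
            chart[(i, j)] = cell
    return chart[(0, n)]

S_NP = ('S', '\\', 'NP')
lexicon = {'Mary': ['NP'], 'ate': [(S_NP, '/', 'NP')], 'tapas': ['NP']}
assert 'S' in cky(['Mary', 'ate', 'tapas'], lexicon)
```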
Parsing models
Generative models: P(s, τ)
Model the process which generates the derivation τ for the sentence s.
–Advantage: easy to guarantee consistency
–Disadvantage: requires good smoothing techniques; difficult to include complex features
A good baseline.
Conditional models: P(τ|s)
Given a sentence s, predict the most likely derivation.
–Advantage: more natural for parsing
–Disadvantage: large model size; difficult to estimate

Evaluation: recovery of dependency structures
                                                Labelled   Unlabelled
Generative (Hockenmaier and Steedman, 2002):      83.3       90.3
Conditional (Clark and Curran, 2004):             84.6       91.2
This includes long-range dependencies.
ccg2sem: from CCG to DRT
A Prolog package which translates CCGbank derivations into Discourse Representation Theory structures (Bos, 2005).

CCGbanks for other languages
German (Hockenmaier, 2006):
–Translation of the German TIGER corpus into CCG.
–Many crossing dependencies, etc.: context-free approximations are inappropriate.
–Current coverage: 92.4% of all graphs (excluding headlines, fragments, etc.)
Turkish (Cakici, 2005):
–Extracts a CCG lexicon from the METU Sabanci Treebank.
A few references
General CCG references:
M. Steedman (2000). The Syntactic Process, MIT Press.
M. Steedman (1996). Surface Structure and Interpretation, MIT Press.
CCGbank(s) and wide-coverage CCG parsing:
J. Hockenmaier and M. Steedman (2005). CCGbank: User's Manual, MS-CIS-05-09, Dept. of Computer and Information Science, University of Pennsylvania.
J. Hockenmaier and M. Steedman (2002). Acquiring Compact Lexicalized Grammars from a Cleaner Treebank, LREC, Las Palmas, Spain.
J. Hockenmaier (2003). Data and Models for Statistical Parsing with Combinatory Categorial Grammar. PhD thesis, Informatics, University of Edinburgh.
J. Hockenmaier and M. Steedman (2002). Generative Models for Statistical Parsing with Combinatory Categorial Grammar, ACL '02, Philadelphia, PA, USA.
S. Clark and J. R. Curran (2004). Parsing the WSJ using CCG and Log-Linear Models, ACL '04, Barcelona, Spain.
S. Clark and J. R. Curran (2004). The Importance of Supertagging for Wide-Coverage CCG Parsing, COLING '04, Geneva, Switzerland.
J. Bos (2005). Towards Wide-Coverage Semantic Interpretation, IWCS-6.
R. Cakici (2005). Automatic Induction of a CCG Grammar for Turkish, ACL Student Research Workshop, Ann Arbor, MI, USA.
J. Hockenmaier (2006). Creating a CCGbank and a Wide-Coverage CCG Lexicon for German, ACL/COLING '06, Sydney, Australia.

More references
The CCG website: http://groups.inf.ed.ac.uk/ccg, with lots of general references about CCG (as well as CCGbank, CCG parsing, etc.)
CCGbank is available from the Linguistic Data Consortium (LDC) at the University of Pennsylvania.
Penn-II-Based Acquisition of HPSG Resources
Head-Driven Phrase Structure Grammar

Penn-II-Based Acquisition of HPSG Resources
Introduction
Treebank conversion and HPSG annotation
Lexicon extraction
Probabilistic models
–Feature forest model
–Design of features
Parsing
Evaluation
Advanced topics
Introduction
If we had an HPSG version of Penn-II, we could obtain lexical entries and probabilistic models.
How do we get an HPSG-annotated Penn-II? By converting Penn-II into an HPSG-conformant treebank.
How do we verify conformity with the HPSG theory? Principles are exploited for the verification.
–Implementation of principles is relatively easy, while construction of the lexicon is extremely difficult.
–Principles are hand-coded, while lexical entries are acquired from the converted treebank.

Introduction
We develop a treebank rather than a lexicon.
A treebank provides more information than a lexicon:
–Verification of the consistency of the grammar
–Statistics

Methodology
Treebank → (treebank conversion) → HPSG treebank → (lexicon extraction) → lexicon (e.g. pretty/JJ, database/NN)
Principles are written by the grammar writer.

Comparison with conventional grammar development
Treebank-based development: treebank + lexicon extractor → lexicon; the grammar writer writes the principles.
Manual development: the grammar writer edits the principles and lexicon, and verifies them by parsing a corpus.
Treebank conversion and HPSG annotation
Convert Penn-style parse trees into HPSG-style parse trees:
–Correcting frequent errors in the Penn Treebank (e.g. confusion of VBD/VBN)
–Converting tree structures: small clauses, passives, NP structures, auxiliary/control verbs, LDDs, etc.
–Mapping into HPSG-style representations: head/argument/modifier distinction, schema name assignment, mapping into HPSG categories
–Applying HPSG principles/schemas: undetermined features are filled in; violations of feature constraints are detected

Overview
Example: "NL is officially making the offer" — error correction & tree conversion, then mapping into HPSG-style representations (head/arg/mod marking, schema names), then principle application (structure sharing of HEAD, SUBJ and COMPS values).
Tree conversion
Coordination, quotation, insertion, and apposition
Small clauses, "than" phrases, quantifier phrases, complementizers, etc.
Disambiguation of non-/pre-terminal symbols (TO, etc.)
HEAD features (CASE, INV, VFORM, etc.)
Noun phrase structures
Auxiliary/control verbs
Subject extraction
Long-distance dependencies
Relative clauses, reduced relatives

Pattern-based tree conversion
Example: the "predicative" rule flattens a small clause into its mother:
(S (NP He) (VP considered (S (NP himself) (ADJP-PRD superior))))
→ (S (NP He) (VP considered (NP himself) (ADJP superior)))

tree_transform_rule("predicative", $Input, $Output) :-
    tree_match(TREE_NODE\$Node &
               TREE_DTRS\[tree_any & ANY_TREES\$LeftTrees,
                          (TREE_NODE\SYM\"S" &
                           TREE_DTRS\($PRDTrees &
                                      [tree_any,
                                       tree & TREE_NODE\FUNC\"PRD",
                                       tree_any])),
                          tree_any & ANY_TREES\$RightTrees],
               $Input),
    append_list([$LeftTrees, $PRDTrees, $RightTrees], $Dtrs),
    $Output = TREE_NODE\$Node & TREE_DTRS\$Dtrs.
Passive
"be + VBN" constructions are assigned "VFORM passive".
Example: "the details have n't been worked out" — the VBN "worked" is marked VFORM passive.

Noun phrase structures
Determiners are raised (NP → DP + N').
Possessive structures are explicitly represented.
Example: "Monsanto 's director of plant sciences".

Auxiliary/control verbs
Auxiliary/control verbs are annotated as taking unsaturated constituents: their SUBJ value is structure-shared with the subject of the embedded VP.
Example: "they did n't have to choose this particular moment".
Subject extraction
HPSG does not allow subject extraction.
Relativizers are treated as ordinary subjects in relative clauses.
Example: "The company which has reported net losses".

Subject relative
Relativizers have a non-empty list in REL.
The element of REL is consumed in a head-relative construction and represents the relative-antecedent relation.

LDDs: Object relative
SLASH represents moved arguments.
REL represents relative-antecedent relations.
Example: "the energy and ambitions that reformers wanted to reward".
Mapping into HPSG-style representations
Convert nonterminal symbols into HPSG-style categories:
NN → [HEAD noun, AGR 3sg]
VBD → [HEAD verb, VFORM finite, TENSE past]
Assign schema names to internal nodes.

Category mapping & schema name assignment
Example: "NL is officially making the offer" — internal nodes are labelled subject-head, head-comp, and head-mod, and nonterminals become HEAD/SUBJ/COMPS feature structures.

Principle application
Example implementation of the subject-head schema (the subject's SYNSEM is structure-shared with the head daughter's SUBJ value):

inverse_schema_binary(subj_head_schema, $Mother, $Left, $Right) :-
    $Left = (SYNSEM\($LeftSynsem &
                     LOCAL\CAT\(HEAD\MOD\[] &
                                VAL\(SUBJ\[] & COMPS\[] & SPR\[])))),
    $Right = (SYNSEM\LOCAL\CAT\(HEAD\$Head &
                                VAL\(SUBJ\[$LeftSynsem] & COMPS\[] & SPR\[]))),
    $Mother = (SYNSEM\LOCAL\CAT\(HEAD\$Head &
                                 VAL\(SUBJ\[] & COMPS\[] & SPR\[]))).

Principle application
Applying the principles fills in the undetermined SUBJ/COMPS values throughout the tree for "NL is officially making the offer".

Complicated example
"the prices we were charged" — involves both a passive trace (*-2) and a wh-trace (*T*-1).
Lexicon extraction
Collecting leaf nodes of HPSG parse trees
Generalizing leaf nodes into lexical entry templates
Applying inverse lexical rules
Assigning predicate argument structures

Overview
Collection of leaf nodes & generalization → application of inverse lexical rules → assignment of predicate argument structures (e.g. making → make, with ARG1/ARG2 linked to SUBJ and COMPS).

Collecting leaf nodes
Leaf nodes of HPSG parse trees are instances of lexical entries.
Generalization into lexical entry templates
Unnecessary constraints are removed (restriction), e.g. POSTHEAD, AUX, and TENSE constraints on the subject:
[HEAD verb, SUBJ <[HEAD noun, POSTHEAD minus]>] (a leaf node of the HPSG treebank)
→ [HEAD verb, SUBJ <[HEAD noun]>] (lexical entry template)

lexical_entry_template($WordInfo, $Sign, $Template) :-
    copy($Sign, $Template),
    $Template = (SYNSEM\LOCAL\(CAT\HEAD\$Head &
                               VAL\(SUBJ\$Subj & COMPS\$Comps & SPR\$SPR))),
    ...
    restriction($SubjSynsem, [NONLOCAL\]),
    restriction($SubjSynsem, [LOCAL\, CAT\, HEAD\, POSTHEAD\]),
    restriction($SubjSynsem, [LOCAL\, CAT\, HEAD\, AUX\]),
    restriction($SubjSynsem, [LOCAL\, CAT\, HEAD\, TENSE\]),
    ...
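The restriction operation itself is simple to sketch. Below, feature structures are modelled as nested Python dicts (an illustrative stand-in for typed feature structures, not the LiLFeS implementation):

```python
def restrict(fs, path):
    """Delete the value at `path` (a list of feature names) from the
    nested dict `fs`, if present."""
    for feat in path[:-1]:
        fs = fs.get(feat, {})
    fs.pop(path[-1], None)

# a leaf node with subject constraints irrelevant to the template:
leaf = {'HEAD': 'verb',
        'SUBJ': [{'HEAD': 'noun', 'POSTHEAD': 'minus', 'TENSE': 'past'}]}
restrict(leaf['SUBJ'][0], ['POSTHEAD'])
restrict(leaf['SUBJ'][0], ['TENSE'])
assert leaf == {'HEAD': 'verb', 'SUBJ': [{'HEAD': 'noun'}]}
```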
Application of inverse lexical rules
Converting lexical entries of inflected words into lexical entries of lexemes using inverse lexical rules:
Derivational rules, e.g. the passive rule
Inflectional rules, e.g. the past-tense rule: [HEAD verb, VFORM finite, TENSE past] → [HEAD verb, VFORM base]
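An inverse inflectional rule can be sketched as a partial function on entries (an illustrative dict encoding, not the actual extraction code):

```python
def inverse_past_tense(entry):
    """Map the entry of a finite past-tense verb back to its lexeme entry."""
    if entry.get('VFORM') == 'finite' and entry.get('TENSE') == 'past':
        base = {k: v for k, v in entry.items() if k != 'TENSE'}
        base['VFORM'] = 'base'
        return base
    return entry  # rule does not apply; leave the entry unchanged

assert inverse_past_tense({'HEAD': 'verb', 'VFORM': 'finite',
                           'TENSE': 'past'}) == \
    {'HEAD': 'verb', 'VFORM': 'base'}
```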
Predicate argument structures
Create mappings from syntactic arguments to semantic arguments.
Example lexical entry for "make": the CONT of the SUBJ NP is ARG1, and the CONT of the COMPS NP is ARG2, of make'.
Probabilistic models
Feature forest model: a solution to the problem of probabilistic modeling of feature structures
Design of features: how to represent preferences over HPSG parse trees
Example: PCFG
CFG rule probabilities are estimated from observed frequencies in the training data:
S → NP VP: 1.0; NP → She: 0.5; NP → I: 0.5; VP → dances: 0.3; VP → dance: 0.3; VP → danced: 0.4
Estimated tree probabilities are products of rule probabilities, e.g. p(She dances) = 1.0 × 0.5 × 0.3 = 0.15.
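The product-of-rules computation can be sketched directly, using the rule probabilities read off the slide's example:

```python
# PCFG rule probabilities from the slide's example grammar
rules = {
    ('S', ('NP', 'VP')): 1.0,
    ('NP', ('She',)): 0.5,
    ('NP', ('I',)): 0.5,
    ('VP', ('dances',)): 0.3,
    ('VP', ('dance',)): 0.3,
    ('VP', ('danced',)): 0.4,
}

def tree_prob(tree, rules):
    """tree = (label, children); children are subtrees or terminal strings.
    The tree's probability is the product of the probabilities of its rules."""
    label, children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = rules[(label, rhs)]
    for c in children:
        if not isinstance(c, str):
            p *= tree_prob(c, rules)
    return p

# the ungrammatical "She dance" still gets probability 0.5 * 0.3 = 0.15
she_dance = ('S', [('NP', ['She']), ('VP', ['dance'])])
assert abs(tree_prob(she_dance, rules) - 0.15) < 1e-9
```

This makes the next slide's problem concrete: the model leaks probability mass onto strings the grammar should rule out.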
What is the problem?
PCFG assigns probabilities to ungrammatical structures:
"She dance" (0.15), "I dances" (0.15)

Feature structure constraints
In HPSG, feature structures express grammatical constraints: "She dance" and "I dances" are never generated.
S → NP[AGR 1] VP[AGR 1]
NP[AGR 3sg] → She
NP[AGR no3sg] → I
VP[AGR 3sg] → dances
VP[AGR no3sg] → dance
VP → danced
However, the constraints of feature structures violate the "independence assumption" of probabilistic models (Abney 1997).
How can we estimate probabilities in this situation?
Solution: ME model
Probabilities of parse trees are estimated by maximum entropy models (Berger et al. 1996).
The probability p(T) of a parse tree T is defined in terms of feature functions, parameters (feature weights), and a normalization factor.
Optimal parameters are computed so as to maximize the likelihood of the training data.
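Reconstructing the formula from the slide's labels (feature function, parameter, normalization factor), the standard maximum-entropy model over parse trees is:

```latex
p(T) \;=\; \frac{1}{Z}\,\exp\Bigl(\sum_i \lambda_i f_i(T)\Bigr),
\qquad
Z \;=\; \sum_{T'} \exp\Bigl(\sum_i \lambda_i f_i(T')\Bigr)
```

where the f_i are feature functions, the λ_i are the parameters (feature weights), and Z normalizes over the candidate trees T' for the sentence.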
ME model of parse trees
If feature functions correspond to CFG rules, this model is an extension of the PCFG model.
Probabilities of parse trees are estimated without the independence assumption.

Estimation by a ME model
With the ME parameters (multiplicative weights)
S → NP[AGR 1] VP[AGR 1]: 1.0; NP → She: 1.0; NP → I: 1.0; VP → dances: 1.145; VP → dance: 1.145; VP → danced: 0.763
the estimated probabilities match the observed frequencies: "She dances" and "I dance" get 0.3, "She danced" and "I danced" get 0.2.
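The ME parameters on the slide can be checked by hand: treating each parameter as a multiplicative weight exp(λ_i), a tree's unnormalized score is the product of its rule weights, normalized over the four grammatical trees. This reproduces the estimated probabilities (a verification sketch, assuming the slide's parameter values):

```python
w = {'S->NP VP': 1.0, 'NP->She': 1.0, 'NP->I': 1.0,
     'VP->dances': 1.145, 'VP->dance': 1.145, 'VP->danced': 0.763}

def score(rules):
    """Unnormalized product of the weights of the rules used in a tree."""
    p = 1.0
    for r in rules:
        p *= w[r]
    return p

# only the four agreement-respecting trees exist in the HPSG grammar
trees = {
    'She dances': ['S->NP VP', 'NP->She', 'VP->dances'],
    'I dance':    ['S->NP VP', 'NP->I',   'VP->dance'],
    'She danced': ['S->NP VP', 'NP->She', 'VP->danced'],
    'I danced':   ['S->NP VP', 'NP->I',   'VP->danced'],
}
Z = sum(score(r) for r in trees.values())
probs = {t: score(r) / Z for t, r in trees.items()}
assert abs(probs['She dances'] - 0.3) < 0.001
assert abs(probs['She danced'] - 0.2) < 0.001
```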
Combinatorial explosion of parse trees
Exponentially many parse trees are assigned to sentences (i.e., the set of trees T is exponential).
Expanding a packed representation with n NP alternatives and m VP alternatives yields n × m trees.

Problems caused by combinatorial explosion
Parameter estimation is intractable: computing the normalization factor requires a sum over all trees.
Searching for the most probable parse is intractable.

Solutions in HMM and PCFG
Probabilistic models are divided into independent probabilities, and dynamic programming is applied:
–Forward-backward probability; Baum-Welch algorithm
–Inside-outside probability; Viterbi search
Inside/outside probabilities can be computed at a cost proportional to the number of nodes, assuming a forest structure of parse trees.
Feature forest model
Dynamic programming can also be applied to maximum entropy estimation.
Feature forest:
–Forest structure isomorphic to a CFG parse forest
–Feature functions are assigned to nodes rather than symbols
A ME model is estimated without unpacking feature forests: a forest of size n + m represents n × m trees.

Feature forest representation of a parse tree
A feature forest represents exponentially many trees of features.

Inside/outside trees of a feature forest
Focus on the sets of trees below/above the targeted node:
Inside trees T_I(n): trees below n
Outside trees T_O(n): trees above n
Estimation algorithms for ME models
Estimation of parameters requires computation of model expectations (Malouf 2002).
The objective function is computed from the training data; its gradient is recomputed at each iteration.

Inside/outside products
Unnormalized products, inside products, and outside products are defined over the feature forest.

Computation of inside products
The inside product of NP_1 is the product of the inside products of its daughters.

Computation of outside products
The outside product of NP_1 is the product of the mother's outside product and the sisters' inside products.

Computation of model expectations
The expectation of f_i at NP_1 is computed from the sum of unnormalized products of the trees including NP_1.
Viterbi search
Almost the same as the computation of inside products, but with "max" rather than "sum".
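The inside recursion and its Viterbi variant can be sketched over a toy and/or encoding of a feature forest (an illustrative data layout, not the actual implementation): a conjunctive node carries features and disjunctive daughters; a disjunctive node lists conjunctive alternatives.

```python
import math

def inside(node, w):
    """Inside product: product over a conjunctive node's feature weights and
    daughters; sum over a disjunctive node's alternatives."""
    if node[0] == 'and':
        _, feats, dtrs = node
        phi = math.exp(sum(w[f] for f in feats))
        for d in dtrs:
            phi *= inside(d, w)
        return phi
    _, alts = node          # 'or' node
    return sum(inside(a, w) for a in alts)

def viterbi(node, w):
    """Same recursion with max instead of sum over alternatives."""
    if node[0] == 'and':
        _, feats, dtrs = node
        phi = math.exp(sum(w[f] for f in feats))
        for d in dtrs:
            phi *= viterbi(d, w)
        return phi
    _, alts = node
    return max(viterbi(a, w) for a in alts)

# a forest packing two one-node trees with features 'a' and 'b'
forest = ('or', [('and', ['a'], []), ('and', ['b'], [])])
w = {'a': math.log(2), 'b': math.log(3)}
assert abs(inside(forest, w) - 5.0) < 1e-9    # sum over both packed trees
assert abs(viterbi(forest, w) - 3.0) < 1e-9   # best single tree
```

The cost is linear in the number of forest nodes, even though the forest packs exponentially many trees.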
Design of features
Feature engineering is important for higher accuracy.
Feature functions are designed to capture syntactic/semantic preferences of HPSG parse trees.

A chart for HPSG parsing
Example: "he saw a girl with a telescope" — equivalent signs are packed in the chart.

Feature forest representation of a chart
Node = each rule application.

Feature forest representation of predicate argument structures
Node = already-determined predicate argument relations.
Example: "She ignored the fact that I wanted to dispute" — alternative attachments of "dispute" yield alternative predicate argument relations.
Extraction of probabilistic events
Events record the schema name, distance, depth, and daughter features (nonterminal symbol, POS, head word, lexical entry, span):

extract_binary_event("hpsg-forest", "bin", $RuleName, $LDtr, $RDtr, _, _, $Event) :-
    $Event = [$RuleName, $Dist, $Depth|$HDtrFeatures],
    find_head($Rule, $LSign, $RSign, $Head, $NonHead),
    rule_name_mapping($Rule, $Head, $NonHead, $RuleName),
    encode_distance($LSign, $RSign, $Dist),
    encode_depth($LSign, $RSign, $Depth),
    encode_sign($Head, $HDtrFeatures, $NDtrFeatures),
    encode_sign($NonHead, $NDtrFeatures, []).

Atomic features
RULE: name of applied rule
DIST: distance between head words
COMMA: whether the phrase includes commas
SPAN: number of words the phrase dominates
SYM: nonterminal symbol (e.g. S, VP, …)
WORD: head word
POS: part-of-speech
LE: lexical entry
ARG: argument label (ARG1, ARG2, …)
Example: syntactic features
Feature for the head-modifier construction combining "saw a girl" and "with a telescope".

Example: semantic features
Feature for the predicate argument relation between "he" and "saw" (ARG1 of saw).

Feature generation
Features are generated by abstracting descriptions of probabilistic events with masks:

feature_mask("hpsg-forest", "bin", [1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0]).
feature_mask("hpsg-forest", "bin", [1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0]).
feature_mask("hpsg-forest", "bin", [1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0]).
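The masking step is easy to sketch: each mask keeps some positions of the full event description and abstracts the rest away, so one extracted event yields a family of features at different levels of generality. The event tuple below is a hypothetical example, not an actual extracted event:

```python
def apply_mask(event, mask):
    """Keep positions where mask is 1; replace the rest with a wildcard."""
    return tuple(v if keep else '*' for v, keep in zip(event, mask))

event = ('head_comp', 'dist=2', 'comma=0', 'span=3', 'S', 'saw', 'VBD')
mask  = (1, 1, 0, 0, 1, 1, 1)
assert apply_mask(event, mask) == \
    ('head_comp', 'dist=2', '*', '*', 'S', 'saw', 'VBD')
```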
ESSLLI 2006 Treebank-Based Acquisition of LFG, HPSG and CCG Resources298 Parsing Efficient processing of feature structures (details omitted) –Abstract machines, quick check, CFG filtering, etc. Efficient search with probabilistic HPSG –Beam thresholding –Iterative beam thresholding
Beam thresholding

Thresholding out edges in each cell of the chart
– Thresholding by number: for each cell, keep only the best n edges
– Thresholding by width: keep only the edges whose FOM (figure of merit) is within w of the best FOM in the same cell
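A minimal Python sketch of the two pruning criteria (function and parameter names are illustrative; FOMs are taken to be log-scale scores where higher is better):

```python
# Hypothetical sketch of per-cell beam pruning.
def prune_cell(edges, n=10, w=5.0):
    """edges: list of (fom, edge) pairs.
    Keep at most n edges, and only those whose FOM is within w
    of the best FOM in the cell."""
    if not edges:
        return []
    edges = sorted(edges, key=lambda e: e[0], reverse=True)
    best = edges[0][0]
    # thresholding by width: drop edges more than w below the best
    survivors = [e for e in edges if best - e[0] <= w]
    # thresholding by number: keep only the best n
    return survivors[:n]
```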
Effect of beam thresholding

Precision and recall as the beam-search parameters are varied: recall drops, while precision is maintained.
Iterative beam thresholding

Start with a narrow beam width
Continue widening the beam width until parsing succeeds

iterative_parse(sentence) {
    w := beam_width_start;
    while (w < beam_width_end) {
        parse(sentence, w);
        if (parse succeeds) return;
        w := w + beam_width_step;
    }
}
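The same loop in runnable form, as a Python sketch; `parse` is a stand-in for the actual HPSG parser (assumed to return None on failure), and the default widths are illustrative, not the parser's real settings:

```python
def iterative_parse(sentence, parse, start=4.0, end=20.0, step=4.0):
    """Retry parsing with progressively wider beams until it succeeds."""
    w = start
    while w <= end:
        result = parse(sentence, w)
        if result is not None:
            return result  # success with the current beam width
        w += step          # widen the beam and retry
    return None            # failed even with the widest beam
```

Most sentences succeed at the narrow initial width, so the average cost stays close to plain beam search while coverage approaches Viterbi's.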
Efficacy of iterative beam thresholding

Evaluated on Penn Treebank Section 24 (< 15 words)

            Precision   Recall   F-score   Avg. time (ms)
Viterbi     88.2%       87.9%    88.1%     103,923
Beam        89.0%       82.4%    85.5%     88
Iterative   87.6%       87.2%    87.4%     99
Distribution of parsing time

[Figure: distribution of per-sentence parsing times; black: Viterbi, red: iterative beam thresholding.]
Evaluation

Evaluation of the lexical entries extracted from the Penn Treebank
– Investigation of obtained lexical entries
– Coverage
Evaluation of the disambiguation model
– Parsing accuracy
Experimental settings

Training data: Sections 2-21 of Penn Treebank II (39,832 sentences)
Test data:
– Development set: Section 22 (1,700 sentences)
– Final test set: Section 23 (2,416 sentences)
Number of tree conversion rules

Target of conversion               Number
Penn-II errors                     102
Category mapping                   85
Head annotation and binarization   63
Difference of phrase structures    15
Predicate argument structures      13
Long distance dependencies         13
Others                             52
Total                              343
Result of treebank conversion & lexicon extraction

Treebank conversion and HPSG annotation succeeded for 37,886 sentences
Extracted lexicon:

# words                          34,765
# lexical entries                1,942
Average # lexical entries/word   1.43
Sources of treebank conversion failures

Classification of failures of treebank conversion in Section 02 (67 failures / 1,989 sentences)

Shortcomings of tree conversion rules   18
Errors in Penn Treebank                 16
Constructions currently unsupported     20
Constructions unsupported by HPSG       13
Breakdown of extracted lexical entries

              # words   # lexical entries   Avg. # lex. entries
noun          21,925    186                 1.14
verb          4,094     945                 1.94
adjective     8,078     62                  1.28
adverb        1,295     72                  2.75
preposition   159       193                 9.17
particle      58        10                  1.69
determiner    36        33                  3.86
conjunction   94        32                  19.46
punctuation   15        120                 22.00
Total         34,765    1,942               1.43
Example lexical entries

Common noun (ex. review/NN): appeared 140,805 times
Transitive verb: appeared 12,244 times
Pre-head adjective: appeared 55,049 times
[Figure: the corresponding feature structures, showing HEAD, MOD, VFORM, POSTHEAD, and VAL (SPR, SUBJ, COMPS) values for each entry.]
Evaluation of coverage

We measure the ratio of lexical entries in the test data that are covered by the grammar
A sentence is covered when all of the lexical entries in the sentence are covered (strong coverage)

                            Lexical entry   Sentence
w/o unknown word handling   96.52%          54.7%
w/ unknown word handling    99.15%          84.8%
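The two coverage measures can be sketched as follows (the data representation is hypothetical: each test sentence is a list of required lexical entries, and the grammar's lexicon is a set):

```python
def coverage(sentences, grammar_lexicon):
    """Return (lexical-entry coverage, strong sentence coverage)."""
    total_entries = covered_entries = strong_sentences = 0
    for sent in sentences:
        hits = sum(1 for le in sent if le in grammar_lexicon)
        total_entries += len(sent)
        covered_entries += hits
        if hits == len(sent):       # every entry covered: strong coverage
            strong_sentences += 1
    return covered_entries / total_entries, strong_sentences / len(sentences)
```

Strong coverage is necessarily the harsher measure: one uncovered entry anywhere in a sentence makes the whole sentence uncovered, which is why 96.52% entry coverage yields only 54.7% sentence coverage.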
Treebank size vs. coverage
Sentence length vs. coverage
Error analysis

Classification of randomly selected uncovered lexical entries

Errors of Penn Treebank                  10
Errors of treebank conversion            48
Lack of lexical entries                  23
Constructions currently unsupported      9
Idioms                                   6
Non-linguistic expressions (ex. lists)   4
Examples of uncovered lexical entries

Lack of mappings from words to lexical entries because of data sparseness
– Post-noun adjectives (younger, crucial)
– Coordination of NP and S'
– Verbs taking a present participle as a complement
Unsupported constructions
– Free relatives, extraposition
Incorrect lexical entries obtained from idiomatic expressions
– (ADVP in part) because …
Evaluation of parsing accuracy

Empirical evaluation of the probabilistic models
– Overall accuracy
– Treebank size vs. accuracy
– Sentence length vs. accuracy
– Contribution of features
– Coverage and accuracy
– Error analysis
Measure: precision/recall of predicate argument relations, e.g. <saw, ARG1, he> and <saw, ARG2, girl>
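A sketch of this evaluation measure, treating each parse as a set of predicate-argument tuples (the tuple representation is illustrative):

```python
def pas_precision_recall(gold, predicted):
    """gold/predicted: sets of predicate-argument tuples,
    e.g. ("saw", "ARG1", "he")."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall
```

For instance, if the parser outputs ("saw", "ARG1", "he") and ("saw", "ARG2", "telescope") against the gold relations ("saw", "ARG1", "he") and ("saw", "ARG2", "girl"), both precision and recall are 0.5.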
Effect of feature forest models

Accuracy for Section 23 (< 40 words)

                          Precision   Recall
baseline                  78.10       77.39
with syntactic features   86.92       86.28
with semantic features    84.29       83.74
with all features         86.54       86.02
Treebank size vs. accuracy
Sentence length vs. accuracy
Contribution of features (1/2)

          Precision   Recall   # features
All       87.12       85.45    623,173
- RULE    86.98       85.37    620,511
- DIST    86.74       85.09    603,748
- COMMA   86.55       84.77    608,117
- SPAN    86.53       84.98    583,638
- SYM     86.90       85.47    614,975
- WORD    86.67       84.98    116,044
- POS     86.36       84.71    430,876
- LE      87.03       85.37    412,290
None      78.22       76.46    24,847
Contribution of features (2/2)

                         Precision   Recall   # features
All                      87.12       85.45    623,173
- DIST,SPAN              85.54       84.02    294,971
- DIST,SPAN,COMMA        83.94       82.44    286,489
- RULE,DIST,SPAN,COMMA   83.61       81.98    283,897
- WORD,LE                86.48       84.91    50,258
- WORD,POS               85.56       83.94    64,915
- WORD,POS,LE            84.89       83.43    33,740
- SYM,WORD,POS,LE        82.81       81.48    26,761
None                     78.22       76.46    24,847
Coverage and accuracy

Accuracies for strongly covered/uncovered sentences

                      Precision   Recall   # sentences
Covered sentences     89.36       88.96    1,825
Uncovered sentences   75.57       74.04    319

We can expect accuracy improvements by improving grammar coverage
Error analysis

Classification of errors in randomly selected sentences (100 sentences)

PP-attachment ambiguity              76
Distinction of arguments/modifiers   49
Ambiguity of lexical entries         44
Errors in test data                  22
Ambiguity of commas                  32
Others                               75
Examples of errors (1/2)

Antecedent of a relative clause
– It's made only in years when the grapes ripen perfectly (the last was 1979) and comes from a single acre of [NP grapes [S' that yielded a mere 75 cases in 1987 ]].
Argument/modifier distinction of to-phrases
– More than a few CEOs say the red-carpet treatment tempts them [VP-modifier to return to a heartland city for future meetings ].
Examples of errors (2/2)

Preposition or verb phrase?
– Mitsui Mining & Smelting Co. posted a 62 % rise in pretax profit to 5.276 billion yen ($ 36.9 million) in its fiscal first half ended Sept. 30 [VP compared with 3.253 billion yen a year earlier ].
Selection of subcategorization frames
– [NP-subject ``Nasty innuendoes,'' ] [VP says [NP-object John Siegal, Mr. Dinkins's issues director, ``designed to prosecute a case of political corruption that simply doesn't exist.'' ]]
Advanced topics

Domain adaptation
– Adapting the grammar and/or the disambiguation model to a new domain using a small amount of training data
Generation
– Using the grammar for sentence generation
Semantics construction
– Obtaining representations of formal semantics from HPSG parsing
Applications
Domain adaptation (1/2)

Disambiguation models are adapted to a bio domain using a small amount of training data
– The original probabilistic model is incorporated into the new model as a reference distribution
– Parameters of the new model are estimated so as to maximize the likelihood of the new training data
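In the standard log-linear formulation, the adapted model with a reference distribution can be written as follows (a generic sketch of the usual reference-distribution setup; the symbols are not taken from the slides):

```latex
p(t \mid s) = \frac{1}{Z(s)}\, p_0(t \mid s)\,
  \exp\Bigl(\textstyle\sum_i \lambda_i f_i(t, s)\Bigr),
\qquad
Z(s) = \sum_{t'} p_0(t' \mid s)\,
  \exp\Bigl(\textstyle\sum_i \lambda_i f_i(t', s)\Bigr)
```

Here p_0 is the original (news-domain) model, f_i are the features, and the weights lambda_i are estimated to maximize the likelihood of the new (bio-domain) training data, so the new model only needs to learn a correction to the old one.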
Domain adaptation (2/2)

Evaluation with a bio-domain corpus
Training data:
– Penn Treebank (News): 39,832 sentences
– GENIA Treebank (Bio): 3,524 sentences

                              Precision   Recall
News domain                   87.69%      87.16%
Bio domain (w/o adaptation)   85.50%      83.91%
Bio domain (w/ adaptation)    87.19%      85.58%
Generation (1/2)

The methods developed for HPSG parsing are applied to a chart generator for HPSG
– Feature forest model
– Iterative beam thresholding
[Figure: chart parsing enumerates edges over word spans (0-1, 1-2, ..., 0-3), while chart generation for "He bought the book." enumerates edges over subsets of the input semantics {he(x), buy(e), the(y), book(z), past(e)}.]
Generation (2/2)

Evaluation on Penn Treebank Section 23

                              Beam width   Coverage (%)   Avg. generation time (msec.)   BLEU
Beam thresholding             4            44.76          621                            0.8196
                              8            67.70          1,776                          0.8294
                              12           73.12          3,074                          0.8327
                              16           72.90          4,287                          0.8341
                              20           71.81          5,273                          0.8333
Iterative beam thresholding   8-20         82.47          1,668                          0.7982
Semantics construction (1/2)

Mapping from HPSG parse trees into semantic representations of typed dynamic logic (TDL)
– Typed dynamic logic: a variant of dynamic semantics that includes plural semantics, event semantics, and situation semantics (Bekki, 2005)
– Completely compositional semantics: lambda calculus composes semantic representations of phrases from lexical representations
Example: "Few boys fell. They died."
few(x)[boy' x][fall' x] ∧ ref(x)[die' x]
Semantics construction (2/2)

Approach:
– HPSG lexical entries are mapped into lexical representations of TDL
– Semantic representations of phrases are composed along HPSG parse trees
Coverage: around 90% of the sentences in Penn Treebank Section 23 are assigned well-formed semantic representations
[Figure: the HPSG lexical entry for "loves" (PHON, HEAD verb, SUBJ, COMPS) paired with its TDL lexical representation.]
Applications: information extraction

Extraction of protein-protein interactions from biomedical paper abstracts
– Patterns over predicate argument structures are learned from a small amount of annotated data
– Precision/recall: 71.8%/48.4%
Applications: text retrieval

Retrieval of relational concepts
– All sentences in MEDLINE are parsed into predicate argument structures
– Relational concepts, such as "what causes cancer", are retrieved by matching against predicate argument structures
– Precision/recall: 60-96%/30-50%
Summary

Conversion of Penn Treebank II into an HPSG treebank
– Pattern-based tree conversion and principle application
Extraction of lexical entries from the HPSG treebank
– Generalization, application of inverse lexical rules, and assignment of predicate argument structures
Probabilistic modeling of feature structures
– Feature forest model
Techniques for efficient parsing with probabilistic HPSG
– Iterative beam thresholding
Evaluation
– Coverage and parsing accuracy
Advanced topics
– Domain adaptation, sentence generation, semantics construction, and practical applications
Publications

Corpus-oriented development of HPSG
– Y. Miyao, T. Ninomiya, and J. Tsujii. (2003). Lexicalized Grammar Acquisition. In Proc. 10th EACL Companion Volume.
– Y. Miyao, T. Ninomiya, and J. Tsujii. (2004). Corpus-oriented grammar development for acquiring a Head-Driven Phrase Structure Grammar from the Penn Treebank. In Proc. IJCNLP 2004.
– H. Nakanishi, Y. Miyao, and J. Tsujii. (2004). Using Inverse Lexical Rules to Acquire a Wide-coverage Lexicalized Grammar. In the IJCNLP 2004 Workshop on "Beyond Shallow Analyses."
– H. Nakanishi, Y. Miyao, and J. Tsujii. (2004). An Empirical Investigation of the Effect of Lexical Rules on Parsing with a Treebank Grammar. In Proc. TLT 2004.
– K. Yoshida. (2005). Corpus-Oriented Development of Japanese HPSG Parsers. In 43rd ACL Student Research Workshop.
Publications

Feature forest model
– Y. Miyao and J. Tsujii. (2002). Maximum entropy estimation for feature forests. In Proc. HLT 2002.
Probabilistic models for HPSG
– Y. Miyao and J. Tsujii. (2003). A model of syntactic disambiguation based on lexicalized grammars. In Proc. 7th CoNLL.
– Y. Miyao, T. Ninomiya, and J. Tsujii. (2003). Probabilistic modeling of argument structures including non-local dependencies. In Proc. RANLP 2003.
– Y. Miyao and J. Tsujii. (2005). Probabilistic disambiguation models for wide-coverage HPSG parsing. In Proc. ACL 2005.
– T. Ninomiya, T. Matsuzaki, Y. Tsuruoka, Y. Miyao, and J. Tsujii. (2006). Extremely Lexicalized Models for Accurate and Fast HPSG Parsing. In Proc. EMNLP 2006.
Publications

Parsing strategies for probabilistic HPSG
– Y. Tsuruoka, Y. Miyao, and J. Tsujii. (2004). Towards efficient probabilistic HPSG parsing: integrating semantic and syntactic preference to guide the parsing. In the IJCNLP-04 Workshop on "Beyond Shallow Analyses."
– T. Ninomiya, Y. Tsuruoka, Y. Miyao, and J. Tsujii. (2005). Efficacy of Beam Thresholding, Unification Filtering and Hybrid Parsing in Probabilistic HPSG Parsing. In Proc. IWPT 2005.
– T. Ninomiya, Y. Tsuruoka, Y. Miyao, K. Taura, and J. Tsujii. (2006). Fast and Scalable HPSG Parsing. Traitement automatique des langues (TAL), 46(2).
Domain adaptation
– T. Hara, Y. Miyao, and J. Tsujii. (2005). Adapting a probabilistic disambiguation model of an HPSG parser to a new domain. In Proc. IJCNLP 2005.
Publications

Generation
– H. Nakanishi, Y. Miyao, and J. Tsujii. (2005). Probabilistic models for disambiguation of an HPSG-based chart generator. In Proc. IWPT 2005.
Semantics construction
– M. Sato, D. Bekki, Y. Miyao, and J. Tsujii. (2006). Translating HPSG-style Outputs of a Robust Parser into Typed Dynamic Logic. In Proc. COLING-ACL 2006 Poster Session.
Applications
– Y. Miyao, T. Ohta, K. Masuda, Y. Tsuruoka, K. Yoshida, T. Ninomiya, and J. Tsujii. (2006). Semantic Retrieval for the Accurate Identification of Relational Concepts. In Proc. COLING-ACL 2006.
– A. Yakushiji, Y. Miyao, T. Ohta, Y. Tateisi, and J. Tsujii. (2006). Automatic Construction of Predicate-Argument Structure Patterns for Biomedical Information Extraction. In EMNLP 2006 Poster Session.
Comparing LFG, CCG, HPSG and TAG Acquisition
Demos
Future Work & Discussion