1 Toshiba Confidential TOSHIBA OF EUROPE LTD.
History of Major NLP Products & Services in Toshiba
- 1978 「JW-10」: Japanese Word Processor
- 1985 「ASTRANSAC EJ」: English-to-Japanese MT System
- 1989 「ASTRANSAC JE」: Japanese-to-English MT System
- 1995 「The 翻訳」: PC MT System (Internet & Personal)
- 1996 「News Watch」: Information Filtering Service
- 1999 「Fresh Eye」: Internet Search Engine/Portal
- 2001 「KnowledgeMeister」: KM Support System
- 2005 Chinese-Japanese Translation Service
- 2006 「KnowledgeMeister - Succeed」

2 Integrated Use of Phrase Structure Forest and Dependency Forest in Preference Dependency Grammar (PDG)
Hideki Hirakawa, Toshiba of Europe Ltd.
29 January, 2008

3 Agenda
- Phrase Structure and Dependency Structure Analysis
- Overview of the Preference Dependency Grammar (PDG)
- Packed Shared Data Structure "Dependency Forest"
- Evaluation of Dependency Forest
- Conclusion

4 Phrase Structure (PS) and Dependency Structure (DS)
Two major syntactic representation schemes.
Information explicitly expressed by PS
- Phrases (non-terminal nodes)
- Structural categories (non-terminal labels)
Information explicitly expressed by DS
- Head-dependent relations (directed arcs)
- Functional categories (arc labels)
[Figure: PS tree (labels det, pre, n, v, np, pp, vp, s) and DS tree (arc labels det, pre, vpp, sub) over the words "time fly like an arrow"]

5 Relation between PS (Constituency) and DS
Constituency and dependency describe different dimensions. A phrase-structure tree (PST) is closely related to a derivation, whereas a dependency tree rather describes the product of a process of derivation. Constituency and dependency are not adversaries, they are complementary notions. Using them together we can overcome the problems that each notion has individually.
Source: Formal & Computational Aspects of Dependency Grammar [Kruijff 02]

6 Integrated Use of Phrase and Dependency Structures
Phrase structure analysis
- Lexicalized PCFG: lexical information (including dependency relations) improves PS analysis accuracy (e.g. Charniak 1997; Collins 1999; Bikel 2004)
- Use of dependency relations as discriminative features of maximum entropy phrase structure parsers (e.g. HPSG parser (Oepen 2002), reranking parser (Charniak and Johnson 2005))
- Use of another, independent shallow dependency parser (Sagae et al. 2007)
Dependency analysis
- Almost no use of phrase structure information (Kakari-uke parsers, MSTParser (McDonald 2005), MaltParser (Nivre 2004))
Integration requires mapping
- Integrating PS and DS requires a mapping between the two structures of a sentence, because a sentence analyzer cannot combine linguistic information from the two structures without a correspondence between them.

7 Mapping between PS and DS (previous research)
Conversion between PS and DS based on heuristics
- Phrase Structure Tree (PST) → Dependency Tree (DT) [Collins 99], DT → PST [Xia&Palmer 00]
- ⇒ Used for measurement of parse accuracy, treebank creation, etc.
Grammar equivalence
- [Gaifman 65] and [Abney 94] studied the equivalence relation between PSG (CFG) and DG (Tesnière-model DG)
- ⇒ DG is strongly equivalent to only a sub-class of CFG *1
Structure mapping based on packed shared data structures
- Partial structure mapping framework based on the Syntactic Graph [Seo&Simmons 89]: creates mappings between PSTs and DTs from partial structure mapping rules (described later)
- ⇒ The syntactic graph generates inappropriate mappings [Hirakawa 06]
- Complete mapping based on the "Dependency Forest" ⇒ integrated use of PS and DS (described later)

8 Agenda
- Phrase Structure and Dependency Structure Analysis
- Overview of the Preference Dependency Grammar (PDG)
- Packed Shared Data Structure "Dependency Forest"
- Evaluation of Dependency Forest
- Conclusion

9 Basic Sentence Analysis Model
- Generation Knowledge: generates all possible interpretations of the sentence
- Interpretation Space: prescribed by the interpretation description scheme
- Constraint Knowledge: rejection of interpretations (accept / reject)
- Preference Knowledge: preference order of interpretations
- Optimum Interpretation Extraction: selects the optimum interpretation
Legend: ◎ correct interpretation, ○ plausible interpretation, × implausible interpretation
[Figure: Sentence → interpretation space of ◎, ○ and × interpretations → constraint filtering → preference ordering → the optimum interpretation ◎]

10 Example (1): Probabilistic Context Free Grammar (PCFG)
- Generation Knowledge: CFG rules
- Interpretation Space: phrase structure (parse tree)
- Constraint Knowledge: no constraints
- Preference Knowledge: probabilities of the CFG rules
- Optimum Interpretation Extraction: the Viterbi algorithm
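The PCFG instance of the model can be sketched as a small Viterbi CKY parser. This is a minimal illustration, not the system described in the slides: the grammar fragment, the lexicon and every weight below are invented, and single-noun noun phrases are folded into the lexicon so that all rules stay binary.

```python
from collections import defaultdict

# Hypothetical binary rules: (left, right) -> [(parent, weight)]. Weights are invented.
BINARY = {
    ("np", "vp"): [("s", 1.0)],
    ("v", "np"): [("vp", 0.6)],
    ("v", "pp"): [("vp", 0.4)],
    ("n", "n"): [("np", 0.3)],
    ("det", "n"): [("np", 0.4)],
    ("pre", "np"): [("pp", 1.0)],
}

# Hypothetical lexicon: word -> [(category, weight)].
LEXICON = {
    "time": [("n", 0.4), ("v", 0.1), ("np", 0.2)],
    "flies": [("n", 0.3), ("v", 0.3), ("np", 0.1)],
    "like": [("v", 0.3), ("pre", 0.5)],
    "an": [("det", 1.0)],
    "arrow": [("n", 0.3), ("np", 0.1)],
}


def viterbi_chart(words):
    """Fill a CKY chart keeping only the best-scoring analysis per category and span."""
    n = len(words)
    best = defaultdict(dict)  # (i, j) -> {category: (score, backpointer)}
    for i, word in enumerate(words):
        for cat, p in LEXICON[word]:
            best[i, i + 1][cat] = (p, word)
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for lc, (lp, _) in best[i, k].items():
                    for rc, (rp, _) in best[k, j].items():
                        for parent, rule_p in BINARY.get((lc, rc), []):
                            p = rule_p * lp * rp
                            if p > best[i, j].get(parent, (0.0, None))[0]:
                                best[i, j][parent] = (p, (k, lc, rc))
    return best


def build_tree(best, i, j, cat):
    """Follow backpointers to recover the best tree for `cat` over span (i, j)."""
    _, back = best[i, j][cat]
    if isinstance(back, str):  # lexical entry
        return (cat, back)
    k, lc, rc = back
    return (cat, build_tree(best, i, k, lc), build_tree(best, k, j, rc))


words = "time flies like an arrow".split()
chart = viterbi_chart(words)
best_tree = build_tree(chart, 0, len(words), "s")
best_score = chart[0, len(words)]["s"][0]
```

With these invented weights the reading in which "time" is the subject of "flies" outscores the compound-noun reading "time flies"; different weights would reverse the choice, which is the role the slide assigns to preference knowledge.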

11 Example (2): Constraint Dependency Grammar (CDG)
Eliminative parsing: parsing proceeds by filtering out the incorrect interpretations from all possible interpretations by applying constraints.
- Generation Knowledge: possible dependencies between all words
- Interpretation Space: dependency trees
- Constraint Knowledge: unary and binary constraints
- Preference Knowledge: no preference knowledge
- Optimum Interpretation Extraction: no optimum solution search
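The eliminative idea can be illustrated with a brute-force sketch: enumerate every head assignment, then discard those that violate a unary or a binary constraint. The example sentence, the POS tags and both constraint sets are hypothetical, and a real CDG parser prunes role values by constraint propagation rather than enumerating all assignments as done here.

```python
from itertools import combinations, product

# Hypothetical input: position -> POS tag; position 0 is the artificial root.
WORDS = ["a", "cat", "sees", "a", "dog"]
POS = {0: "root", 1: "det", 2: "n", 3: "v", 4: "det", 5: "n"}


def unary_ok(dep, head):
    """Invented unary constraints: each looks at one dependency (dep -> head)."""
    if POS[dep] == "det":
        return POS[head] == "n" and dep < head  # a determiner precedes its noun
    if POS[dep] == "n":
        return POS[head] == "v"                 # a noun depends on a verb
    return POS[head] == "root"                  # the verb depends on the root


def binary_ok(dep_a, dep_b):
    """Invented binary constraint: two determiners may not share one head noun."""
    (d1, h1), (d2, h2) = dep_a, dep_b
    return not (POS[d1] == "det" and POS[d2] == "det" and h1 == h2)


def eliminative_parse():
    """Generate all head assignments, then filter them with the constraints."""
    n = len(WORDS)
    survivors = []
    for heads in product(range(n + 1), repeat=n):
        deps = list(enumerate(heads, start=1))  # (dependent, head) pairs
        if not all(unary_ok(d, h) for d, h in deps):
            continue
        if not all(binary_ok(a, b) for a, b in combinations(deps, 2)):
            continue
        survivors.append(heads)
    return survivors


interpretations = eliminative_parse()
```

Every assignment that survives is returned without ranking, matching the slide's point that CDG has no preference knowledge and performs no optimum solution search.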

12 Basic Sentence Analysis Model of PDG: Multilevel Packed Shared Data Connection Model
(a) NLA system with multilevel interpretation space
(b) Packed shared data structure and interpretation mapping
(c) Interpretations are externalizations of the lower level interpretations
Abbreviations: PK: Preference Knowledge, CK: Constraint Knowledge, GK: Generation Knowledge, IS: Interpretation Space
Components: 1. Data Structure 2. Optimum Solution Search
[Figure: the sentence feeds interpretation spaces IS1, IS2 and IS3 (Level 1, Level 2 and Level 3 interpretations), each with its own GK, CK and PK and connected by mappings; Optimum Interpretation Extraction yields the optimum interpretation ◎]

13 PDG Implementation Model (data structure)
PDG is an all-pair dependency analysis method with a three-level architecture utilizing three packed shared data structures.
- Morphological layer: WPP trellis (all WPP sequences)
- Syntactic layer: phrase structure forest (all PSTs) and dependency forest (all DTs), connected by interpretation mapping
- Integrated use of the PS and DS levels in the syntactic layer
- Output: the optimum dependency tree
Abbreviations: WPP = Word POS Pair; Phrase structure forest (PSF) = (packed shared) parse forest
[Figure: example for "Time flies" with WPPs time/n, time/v, fly/v, fly/n; phrase structure forest nodes np, vp, s, root; dependency forest arcs sub, obj, top; one WPP sequence, one PST and one DT highlighted as corresponding interpretations]

14 Comparison with other dependency analysis methods
- CDG (Constraint Dependency Grammar): Sentence → all DS interpretations; no CFG grammar
- MSTParser (Maximum Spanning Tree Parser): Sentence → 1-best morphological interpretation → all interpretations with no POS ambiguities → optimum interpretation; no CFG grammar
- PDG: Sentence → all morphological interpretations (morphology level) → all PS interpretations (PS level, CFG filtering) → well-formed interpretations (DS level) → optimum interpretation
[Figure labels: Combinatorial Explosion, Over Pruning]

15 PDG Implementation Model (optimum solution search)
Integration of Preference Knowledge: preference scores based on the multilevel data structures are integrated into scores on a DF.
- Scoring: WPP sequence score (morphological layer), phrase structure score and dependency score (syntactic layer) are combined by score integration
- Optimum solution search: the Graph Branch Algorithm extracts the optimum dependency tree
[Figure: example for "Time flies" showing the WPP trellis, PS forest and dependency forest, and the optimum dependency tree with arcs top and sub over time/n and fly/v]

16 PDG Analysis Flow
Sentence → Forest Generation (Extended Chart Parser) → WPP Trellis, PS Forest, Dependency Forest → Scoring → Scored Dependency Forest → Optimum Tree Search → The Optimum Tree
- Forest Generation: dependency forest generation
- Scoring: preference score integration (Co-occurrence Score Matrix)
- Optimum Tree Search: based on CM and PM

17 Agenda
- Phrase Structure and Dependency Structure Analysis
- Overview of the Preference Dependency Grammar (PDG)
- Packed Shared Data Structure "Dependency Forest"
- Evaluation of Dependency Forest
- Conclusion

18 Phrase structure forest
Packed shared data structure encompassing all CFG phrase structure trees [Tomita 87,91].
[Figure: phrase structure forest and one phrase structure tree for "I saw a girl with a telescope in the forest", with nodes det, n, p, v, np, pp, vp, s]

19 Packed Shared Data Structure for Dependency Trees
Dependency graphs are widely used.
- Dependency Graph: nodes (Word/WPP/Concept) and arcs (dependency relations)
- Dependency Tree: a tree in a dependency graph satisfying some well-formedness condition
Well-formedness condition by [Robinson 70]
(a) One and only one element is independent.
(b) All others depend directly on some element.
(c) No element depends directly on more than one other. (unique head)
(d) If element A depends directly on element B and some element C intervenes between them (in linear order of string), then C depends directly on A or on B or some other intervening element. (projectivity)
[Figure: dependency graph and a dependency tree for "I saw a girl with a telescope in the forest" with arcs det, sub, obj, vpp, npp, pre; the marked arcs are not co-occurable in one dependency tree due to the projectivity constraint]
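The four conditions above translate fairly directly into a checker. The sketch below assumes a representation that is not in the slides: words are positions 0..n-1 and a candidate structure is a set of (dependent, head) pairs. It tests exactly conditions (a) to (d) as worded and nothing more.

```python
def robinson_well_formed(n, arcs):
    """Check conditions (a)-(d) for n words and a set of (dependent, head) arcs."""
    heads = {i: [h for d, h in arcs if d == i] for i in range(n)}

    # (a) exactly one independent element; (b) every other element has a head.
    independent = [i for i in range(n) if not heads[i]]
    if len(independent) != 1:
        return False

    # (c) unique head: no element depends on more than one other.
    if any(len(hs) > 1 for hs in heads.values()):
        return False

    # (d) projectivity: every word between A and B must depend on A, on B,
    #     or on another word lying between them.
    for dep, head in arcs:
        lo, hi = sorted((dep, head))
        for c in range(lo + 1, hi):
            if not any(lo <= h <= hi for h in heads[c]):
                return False
    return True


# "I saw a girl with a telescope" (positions 0..6), PP attached to the verb.
projective = {(0, 1), (2, 3), (3, 1), (4, 1), (6, 4), (5, 6)}

# "She saw the cat curiously which-was-Persian" (positions 0..5): the relative
# clause depends on "cat" across "curiously", which depends on "saw".
crossing = {(0, 1), (2, 3), (3, 1), (4, 1), (5, 3)}

projective_ok = robinson_well_formed(7, projective)
crossing_ok = robinson_well_formed(6, crossing)
```

The second example mirrors the artificial non-projective sentence used later in the deck; it fails only condition (d).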

20 Partial Structure Mapping Method [Seo&Simmons 89]
Grammar rule = headed CFG rule + partial structure mapping rule
- Headed CFG rule: Y/w_h → X_1/w_1 ... X_i/w_i ... X_h/w_h ... X_n/w_n
- Partial dependency tree: head w_h with dependents w_1, ..., w_i, ..., w_n attached by arcs d_1, ..., d_i, ..., d_n
Sentence → Parser → Packed Shared Phrase Structure (phrase structure forest) = set of phrase structure trees
Mapping → Packed Shared Dependency Structure (syntactic graph) = set of dependency trees

21 Syntactic Graph
Packed shared data structure for dependency trees. It encompasses all dependency trees corresponding to phrase structure trees in the parse forest for a sentence.
- Node: WPP
- Arc: dependency relation
- Syntactic Graph is accompanied by an Exclusion Matrix
[Figure: syntactic graph for "Time flies like an arrow" with nodes [0,time,n], [0,time,v], [1,fly,v], [1,fly,n], [2,like,p], [2,like,v], [3,an,det], [4,arrow,n], S and arcs mod, npp, vnp, det, ppn, vpp, snp]

22 Completeness and Soundness of the syntactic graph
Definitions
- Completeness: for every parse tree in the forest, there is a syntactic reading from the syntactic graph that is structurally equivalent to that parse tree. (∀ PST: phrase structure tree, ∃ DT: dependency tree, PST corresponds to DT)
- Soundness: for every syntactic reading from the syntactic graph, there is a parse tree in the forest that is structurally equivalent to that syntactic reading. (∀ DT: dependency tree, ∃ PST: phrase structure tree, PST corresponds to DT)
Problem of the syntactic graph
- Violation of soundness [Hirakawa 06]
[Figure: mapping between the phrase structure forest and the syntactic graph; completeness holds, soundness fails for some dependency trees]

23 Example of the violation of soundness
Compound noun: "Tokyo taxi driver call center"
The syntactic graph for (a), (b) and (c) generates (d), which has no corresponding phrase structure tree in the phrase structure forest.
[Figure: dependency trees (a), (b), (c) for the readings np1, np2, np3 and the spurious tree (d), drawn over the syntactic graph / exclusion matrix with arcs nc-1, nc-2, nc-3, nc-6, nj-4, nj-5, nj-7, rt-8]

24 Dependency Forest [Hirakawa 06]
Packed shared data structure for dependency trees.
- Dependency Forest (DF) = Dependency Graph (DG) + Co-occurrence Matrix (CM)
- CM: defines the arc co-occurrence relation (equivalent arcs are allowed in a DF)
[Figure: dependency forest for "Time flies like an arrow." with nodes 0,time/n, 0,time/v, 1,fly/v, 1,fly/n, 2,like/p, 2,like/v, 3,an/det, 4,arrow/n, root; arcs npp 19, det 14, pre 15, vpp 20, vpp 18, sub 24, sub 23, obj 4, nc 2, obj 16, obj 25, rt 29, rt 32, rt 31; shown together with its co-occurrence matrix]
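A dependency forest can be pictured as a list of arcs plus a symmetric co-occurrence relation over arc IDs. The sketch below is an assumption-laden illustration for "Time flies": the arc IDs, the CM contents and the acceptance test (one arc per word position, all chosen arcs pairwise co-occurable) are chosen for the example and are not taken from [Hirakawa 06].

```python
from dataclasses import dataclass
from itertools import combinations, product


@dataclass(frozen=True)
class Arc:
    arc_id: int
    label: str
    dependent: str   # WPP of the dependent, e.g. "time/n"
    head: str        # WPP of the head, or "root"
    position: int    # word position of the dependent


# Dependency graph for "Time flies" (arc IDs are invented).
ARCS = [
    Arc(1, "root", "fly/v", "root", 1),
    Arc(2, "sub", "time/n", "fly/v", 0),
    Arc(3, "root", "time/v", "root", 0),
    Arc(4, "obj", "fly/n", "time/v", 1),
]

# Co-occurrence matrix stored as a set of unordered arc-ID pairs.
CM = {frozenset((1, 2)), frozenset((3, 4))}


def enumerate_trees(arcs, cm, n_words):
    """Pick one arc per word position and keep choices whose arcs all co-occur."""
    by_position = [[a for a in arcs if a.position == p] for p in range(n_words)]
    trees = []
    for choice in product(*by_position):
        if all(frozenset((a.arc_id, b.arc_id)) in cm
               for a, b in combinations(choice, 2)):
            trees.append(sorted(a.arc_id for a in choice))
    return trees


trees = enumerate_trees(ARCS, CM, n_words=2)
```

Without the CM, the graph alone would also admit the combinations {1, 3} and {2, 4}, which mix incompatible WPP readings of the two words; the matrix is what rules them out.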

25 Features of the Dependency Forest
- Mapping is assured (phrase structure tree ⇔ dependency tree) → usable for the multilevel packed shared data connection model
- High flexibility in describing constraints, e.g. non-projective dependency structures *1
*1: a dependency structure violating at least one of the following projectivity conditions: "no cross dependency exists", "no dependency covers the top node"

26 Generation Flow of Phrase Structure Forest and Dependency Forest
PDG analysis process and PDG data structures, steps (1) to (4) in the figure:
Input sentence → Morphological Analysis (Dictionary) → WPP Trellis → Chart Parsing (Extended CFG) → Parse Forest → DF Extraction → Initial Dependency Forest → DF Reduction → Dependency Forest → Optimum Solution Search → Dependency Tree

27 PDG Grammar Rule
Extended CFG rule with phrase head and mapping to dependency structure:
y/X_h → x_1/X_1, ..., x_n/X_n : [arc(arcname_1, X_i, X_j), ..., arc(arcname_n-1, X_k, X_l)]
- Rewriting rule part (CFG): y/X_h → x_1/X_1, ..., x_n/X_n
- X_i: variable; X_h (phrase head) is one of X_1 .. X_n
- Dependency structure part: a dependency tree with nodes X_1, ..., X_n and top node X_h
Example: vp/V → v/V, np/NP, pp/PP : [arc(obj,NP,V), arc(vpp,PP,V)]
- Phrase structure for "see a girl in the forest": vp/V(=see/v) → v/V(=see/v), np/NP(=girl/n), pp/PP(=in/pre)
- Dependency structure: V(=see/v) with obj arc from NP(=girl/n) and vpp arc from PP(=in/pre)
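Applying such a rule amounts to binding each variable to the head word of the matching constituent and instantiating the arc templates. The sketch below encodes the slide's example rule; the class name `PDGRule`, the function `apply_rule` and the tuple layouts are hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PDGRule:
    lhs: str          # category of the phrase being built, e.g. "vp"
    head_var: str     # variable naming the phrase head, e.g. "V"
    rhs: tuple        # ((category, variable), ...)
    arcs: tuple       # ((arc_name, dependent_var, head_var), ...)


# vp/V -> v/V, np/NP, pp/PP : [arc(obj,NP,V), arc(vpp,PP,V)]
VP_RULE = PDGRule(
    lhs="vp",
    head_var="V",
    rhs=(("v", "V"), ("np", "NP"), ("pp", "PP")),
    arcs=(("obj", "NP", "V"), ("vpp", "PP", "V")),
)


def apply_rule(rule, constituents):
    """Bind rule variables to constituent heads and instantiate the arcs.

    `constituents` is a list of (category, head_word) pairs for the daughters.
    Returns ((lhs_category, phrase_head), [(arc_name, dependent, head), ...]).
    """
    if len(constituents) != len(rule.rhs):
        raise ValueError("wrong number of constituents")
    binding = {}
    for (cat, var), (found_cat, head_word) in zip(rule.rhs, constituents):
        if cat != found_cat:
            raise ValueError(f"expected {cat}, found {found_cat}")
        binding[var] = head_word
    arcs = [(name, binding[dep], binding[head]) for name, dep, head in rule.arcs]
    return (rule.lhs, binding[rule.head_var]), arcs


# "see a girl in the forest": the daughters are already reduced to their heads.
phrase, dep_arcs = apply_rule(
    VP_RULE, [("v", "see/v"), ("np", "girl/n"), ("pp", "in/pre")]
)
```

The returned phrase carries its head upward, so the same function can be applied again when this vp becomes a daughter of a larger phrase.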

28 Standard Chart Parsing: Structure of Standard Edge
Edge kinds: lexical edge, inactive edge, active edge
Edge fields
- Start position
- End position
- Head category
- Found constituents
- Remaining constituents
[Figure: edges over the input positions of "a cat chases ..."]

29 Structure of PDG Edge
Two extensions to the standard edge structure
(1) Mapping to dependency structure
(2) Packing of inactive edges
- PDG single edge = standard edge + phrase head + dependency structure (tree)
- A PDG (packed) edge is a set of sharable PDG single edges
[Figure: PDG edges over "a cat chases"]
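The two edge structures can be written down as plain records. In the sketch below the field names follow the slides, while the packing key is an assumption: single edges are treated as sharable when they are inactive and agree on span, category and phrase head. The slides do not spell out the sharing condition, so this is one plausible reading only.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class StandardEdge:
    start: int
    end: int
    category: str
    found: tuple       # constituents found so far
    remaining: tuple   # constituents still needed; empty for an inactive edge

    @property
    def is_inactive(self):
        return not self.remaining


@dataclass(frozen=True)
class PDGSingleEdge:
    edge: StandardEdge
    phrase_head: str
    dep_structure: frozenset   # arcs as (label, dependent, head)


def pack(single_edges):
    """Group inactive single edges under an assumed sharing key."""
    packed = defaultdict(list)
    for e in single_edges:
        if e.edge.is_inactive:
            key = (e.edge.start, e.edge.end, e.edge.category, e.phrase_head)
            packed[key].append(e)
    return dict(packed)


# "see a girl in the forest" (positions 0..6): two ways to build the same vp.
obj = ("obj", "girl/n", "see/v")
vpp = ("vpp", "in/pre", "see/v")
edges = [
    PDGSingleEdge(StandardEdge(0, 6, "vp", ("v", "np"), ()), "see/v",
                  frozenset({obj})),
    PDGSingleEdge(StandardEdge(0, 6, "vp", ("v", "np", "pp"), ()), "see/v",
                  frozenset({obj, vpp})),
    PDGSingleEdge(StandardEdge(0, 1, "vp", ("v",), ("np",)), "see/v",
                  frozenset()),   # active edge, never packed
]
packed_edges = pack(edges)
```

The single packed vp entry holds two constituent sequences with two dependency structures, the same shape as an edge written `vp → [[v np] [v np pp]] : [ds ds]`.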

30 Generation of Phrase Structure Forest and Initial Dependency Forest
- Bottom-up chart parser using the Agenda
- Terminates when the Agenda becomes empty (Agenda = φ)
Chart: inactive edges and active edges, for example
<E3 np1 → [[det1 n1]] : [ds31]>
<E4 vp1 → [[v1 np2] [v1 np3 pp1]] : [ds41 ds42]>
Phrase Structure Forest: the set of inactive edges reachable from the root edge
Initial Dependency Graph: the set of arcs in the PS forest, for example
arc(root-17,[like]-v-2,[root]-x),
arc(root-24,[flies]-v-1,[root]-x),
arc(root-27,[time]-v-0,[root]-x),
arc(sub-16,[flies]-n-1,[like]-v-2),
arc(nc-4,[time]-n-0,[flies]-n-1),
arc(obj-14,[arrow]-n-4,[like]-v-2),
:
Initial Co-occurrence Matrix: set according to conditions CM1 to CM3
- CM1: between arcs in a DS
- CM2: between arcs in a DS and arcs governed by its constituents
- CM3: between arcs governed by different constituents

31 Generation of Phrase Structure Forest and Initial Dependency Forest (continued)
[Figure: chart with inactive and active edges (Agenda = φ); phrase structure forest over the WPPs [0,time,n], [0,time,v], [1,fly,v], [1,fly,n], [2,like,p], [2,like,v], [3,an,det], [4,arrow,n] with numbered np, pp, vp, s and root edges; the initial dependency graph (set of arcs in the PS forest) and the initial co-occurrence matrix (conditions CM1 to CM3) together form the initial dependency forest]

32 Reduction of the Initial Dependency Forest
Equivalent arcs: the same arc generated from two grammar rules, e.g.
vp/V → v/V,np/NP : [arc(obj,NP,V)]
vp/V → v/V,np/NP,pp/PP : [arc(obj,NP,V), arc(vpp,PP,V)]
Reduction: equivalent arcs are merged into one arc, provided the merge does not increase the number of generalized dependency trees in the dependency forest.
[Figure: before reduction, arcs npp 19, vpp 18, sub 24, sub 23, obj 4, nc 2, obj 25 over 0,time/n, 0,time/v, 1,fly/v, 1,fly/n; after reduction the equivalent arc obj 25 is merged away]

33 Completeness and Soundness of the Dependency Forest
- Completeness: all phrase structure trees in the parse forest have corresponding dependency trees in the dependency forest. (∀ PT: phrase structure tree, ∃ DT: dependency tree, dep_tree(PT) = DT)
- Soundness: every phrase structure tree corresponding to a dependency tree in the dependency forest exists in the phrase structure forest. (∀ DT: dependency tree, ∃ PT: phrase structure tree, dep_tree(PT) = DT)
- The correspondence between phrase structure trees and dependency trees is 1:N in general.
The completeness and soundness of the dependency forest is assured [Hirakawa 06].

34 Evaluation of the Dependency Forest Framework
- Analysis of prototypical ambiguous sentences
- 1 to N / N to 1 correspondence between phrase structure tree(s) and dependency tree(s)
- Generation of non-projective dependency trees

35 Grammar for Ambiguous Sentences
Grammar rules for typical ambiguities (PP-attachment, coordination, be-verb usage)
=========== s/Sentence ===========
(R1) s/VP → np/NP,vp/VP : [arc(sub,NP,VP)] % Declarative sentence
(R2) s/VP → vp/VP : [] % Imperative sentence
========= np/Noun phrase ========
(R3) np/N → n/N : [] % Single noun
(R4) np/N2 → n/N1,n/N2 : [arc(nc,N1,N2)] % Compound noun
(R5) np/N → det/DET,n/N : [arc(det,DET,N)] %
(R6) np/NP → np/NP,pp/PP : [arc(npp,PP,NP)] % Prepositional phrase attachment
(R7) np/N → ving/V,n/N : [arc(adjs,V,N)] % Adjectival usage (subject)
(R8) np/N → ving/V,n/N : [arc(adjo,V,N)] % Adjectival usage (object)
(R9) np/V → ving/V,np/NP : [arc(obj,NP,V)] % Gerund phrase
(R10) np/V → ving/V,np/NP,pp/PP : [arc(obj,NP,V),arc(vpp,PP,V)] % Gerund phrase with PP
(R11) np/NP → np/NP0,and/AND,np/NP : [arc(and,NP0,NP),arc(cnj,AND,NP0)] % Coordination (and)
(R12) np/NP → np/NP0,or/OR,np/NP : [arc(or,NP0,NP),arc(cnj,OR,NP0)] % Coordination (or)
========= vp/Verb phrase ========
(R13) vp/V → v/V : [] % Intransitive verb
(R14) vp/V → v/V,np/NP : [arc(obj,NP,V)] % Transitive verb
(R15) vp/V → be/BE,ving/V,np/NP : [arc(obj,NP,V),arc(prg,BE,V)] % Progressive
(R16) vp/BE → be/BE,np/NP : [arc(dsc,NP,BE)] % Copular
(R17) vp/VP → vp/VP,pp/PP : [arc(vpp,PP,VP)] % PP-attachment
(R18) vp/VP → adv/ADV,vp/VP : [arc(adv,ADV,VP)] % Adverb modification
(R19) vp/V → v/V,np/NP,adv/ADV,relc/RELP : [arc(obj,NP,V),arc(adv,ADV,V),arc(rel,RELP,NP)] % Non-projective pattern
======== pp/Prepositional phrase ========
(R20) pp/P → pre/P,np/NP : [arc(pre,NP,P)]

36 PP-attachment Ambiguity
Input sentence: I saw a girl with a telescope in the forest.
Result: five well-formed dependency trees
Nodes: 0,I: [i]-n-0; 1,saw: [saw]-v-1; 2,a: [a]-det-2; 3,girl: [girl]-n-3; 4,with: [with]-pre-4; 5,a: [a]-det-5; 6,telescope: [telescope]-n-6; 7,in: [in]-pre-7; 8,the: [the]-det-8; 9,forest: [forest]-n-9; root: [root]-x-root
Arcs: det 4,0; det 11,0; det 42,0; sub 33,20; obj 6,20; vpp 16,15; vpp 27,5; npp 14,10; npp 29,5; npp 26,5; pre 12,10; pre 24,10; root 23,0
[Figure: dependency forest with co-occurrence constraints marked "Crossing" and "Single role"]

37 Coordination Scope Ambiguity
Input sentence: Earth and Moon or Jupiter and Ganymede.
Result: five well-formed dependency trees
Nodes: 0,earth: [earth]-n-0; 1,and: [and]-and-1; 2,moon: [moon]-n-2; 3,or: [or]-or-3; 4,jupiter: [jupiter]-n-4; 5,and: [and]-and-5; 6,ganymede: [ganymede]-n-6; root: [root]-x-root
Arcs: and 12,10; and 25,20; and 18,12; and 14,5; or 9,4; or 22,3; cnj 2,0; cnj 6,0; cnj 14,0; root 26,0
[Figure: dependency forest with co-occurrence constraints marked "Crossing" and "Single role"]

38 Structural Interpretation Ambiguity and PP-attachment Ambiguity
Input sentence: My hobby is watching birds with telescope
Result: ten well-formed dependency trees
Nodes: 0,my: [my]-det-0; 1,hobby: [hobby]-n-1; 2,is: [is]-be-2; 3,watching: [watching]-ving-3; 4,birds: [birds]-n-4; 5,with: [with]-pre-5; 6,telescope: [telescope]-n-6; root: [root]-x-root
Arcs: sub 35,1; sub 38,10; sub 5,5; prg 2,10; adj 4,12; dsc 33,8; dsc 36,10; obj 6,15; npp 23,5; npp 27,3; vpp 24,7; det 1,0; pre 22,0; root 44,0; root 41,0

39 N to 1 Correspondence from PSTs to One DT (1)
Spurious ambiguity (Eisner96), (Noro05)
(R17) vp/VP → vp/VP,pp/PP : [arc(vpp,PP,VP)] % PP-attachment
(R18) vp/VP → adv/ADV,vp/VP : [arc(adv,ADV,VP)] % Adverb modification
Sentence: She curiously saw a cat in the forest
- Rule application R17 → R18 and rule application R18 → R17 give two different phrase structure trees
- Both map to one dependency tree, in which "saw" governs "She", "a cat", "curiously" (adv) and "in the forest" (vpp)
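The N-to-1 correspondence can be reproduced with a small dep_tree function that walks a phrase structure tree and collects the arcs its rules generate. The tree encoding, the rule table layout and the shortcut of treating "She", "a cat" and "in the forest" as pre-built leaves are all assumptions made to keep the sketch short; the arc label for the subject follows R1 of the grammar.

```python
# Rule table (hypothetical encoding): rule id -> (index of head daughter,
# [(arc_label, dependent_daughter_index, head_daughter_index), ...]).
RULES = {
    "R1": (1, [("sub", 0, 1)]),    # s/VP  -> np/NP, vp/VP
    "R14": (0, [("obj", 1, 0)]),   # vp/V  -> v/V, np/NP
    "R17": (0, [("vpp", 1, 0)]),   # vp/VP -> vp/VP, pp/PP
    "R18": (1, [("adv", 0, 1)]),   # vp/VP -> adv/ADV, vp/VP
}


def dep_tree(node):
    """Return (head_word, frozenset of arcs) for a phrase structure tree.

    A leaf is (category, head_word); an inner node is (category, rule_id, *daughters).
    """
    if len(node) == 2:
        return node[1], frozenset()
    _category, rule_id, *daughters = node
    head_index, arc_specs = RULES[rule_id]
    heads, arcs = [], set()
    for daughter in daughters:
        head, sub_arcs = dep_tree(daughter)
        heads.append(head)
        arcs |= sub_arcs
    for label, dep_i, head_i in arc_specs:
        arcs.add((label, heads[dep_i], heads[head_i]))
    return heads[head_index], frozenset(arcs)


core_vp = ("vp", "R14", ("v", "saw"), ("np", "cat"))

# R17 applied first, then R18: [curiously [[saw a cat] in the forest]]
pst_a = ("s", "R1", ("np", "she"),
         ("vp", "R18", ("adv", "curiously"),
          ("vp", "R17", core_vp, ("pp", "in"))))

# R18 applied first, then R17: [[curiously [saw a cat]] in the forest]
pst_b = ("s", "R1", ("np", "she"),
         ("vp", "R17",
          ("vp", "R18", ("adv", "curiously"), core_vp),
          ("pp", "in")))

same_dependency_tree = dep_tree(pst_a) == dep_tree(pst_b)
```

Because both attachments name the verb as head, the order in which R17 and R18 apply leaves no trace in the arc set, which is why the two trees collapse into one dependency tree.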

40 N to 1 Correspondence from PSTs to One DT (2)
Modification scope problem (Mel'čuk 88)
A dependency structure is ambiguous in modification scope when a head word has dependents on both its left-hand and right-hand sides.
Example: Earth and Jupiter in Solar System.
- Nodes: 0,Earth; 1,and; 2,Jupiter; 3,in; 4,Solar System; root
- Arcs: and 4,20; npp 8,0; pre 7,0; cnj 2,0; root 12,0
- Two phrase structure trees, [Earth and [Jupiter in Solar System]] and [[Earth and Jupiter] in Solar System], correspond to this one dependency tree
Notes
- Introduction of "Grouping" (coordination and operator words, e.g. not, only) [Mel'čuk 88]
- Japanese has no modification scope problem because it has no right-to-left dependency.

41 Generation of Non-projective Dependency Tree
Grammar rule for a non-projective dependency tree
(R19) vp/V → v/V,np/NP,adv/ADV,relc/RELP : [arc(obj,NP,V),arc(adv,ADV,V),arc(rel,RELP,NP)]
Input sentence: She saw the cat curiously which was Persian *1
- Nodes: 0,She; 1,saw; 2,the; 3,cat; 4,curiously; 5,which was Persian; root
- Arcs: det 4,0; sub 12,20; obj 6,20; adv 10,15; rel 11,10; root 14,0
*1: Artificial example for showing the rule applicability

42 Conclusion
Dependency forest is a packed shared data structure
- Bridge between phrase structure and dependency structure, usable for the Multilevel Packed Shared Data Connection Model of PDG
- High flexibility in describing constraints
Future work
- Extension of the framework for the modification scope problem (Grouping)
- Real-world system implementation