Toshiba Confidential. TOSHIBA OF EUROPE LTD.

History of Major NLP Products & Services in Toshiba
- 1978 "JW-10": Japanese Word Processor
- 1985 "ASTRANSAC EJ": English-to-Japanese MT System
- 1989 "ASTRANSAC JE": Japanese-to-English MT System
- 1995 "The 翻訳": PC MT System (Internet & Personal)
- 1996 "News Watch": Information Filtering Service
- 1999 "Fresh Eye": Internet Search Engine/Portal
- 2001 "KnowledgeMeister": KM Support System
- 2005 Chinese-Japanese Translation Service
- 2006 "KnowledgeMeister - Succeed"

Integrated Use of Phrase Structure Forest and Dependency Forest in Preference Dependency Grammar (PDG)
Hideki Hirakawa, Toshiba of Europe Ltd.
29 January, 2008

Agenda
- Phrase Structure and Dependency Structure Analysis
- Overview of the Preference Dependency Grammar (PDG)
- Packed Shared Data Structure "Dependency Forest"
- Evaluation of Dependency Forest
- Conclusion

Phrase Structure (PS) and Dependency Structure (DS)
Two major syntactic representation schemes
Information explicitly expressed by PS
- Phrases (non-terminal nodes)
- Structural categories (non-terminal labels)
Information explicitly expressed by DS
- Head-dependent relations (directed arcs)
- Functional categories (arc labels)
[Figure: a phrase structure tree (nodes s, np, vp, pp, n, v, pre, det) and a dependency structure (arcs sub, vpp, pre, det) over the words "time fly like an arrow"]

Relation between PS (Constituency) and DS
Constituency and dependency describe different dimensions. A phrase-structure tree (PST) is closely related to a derivation, whereas a dependency tree rather describes the product of a process of derivation. Constituency and dependency are not adversaries, they are complementary notions. Using them together we can overcome the problems that each notion has individually.
Source: Formal & Computational Aspects of Dependency Grammar [Kruijff 02]

Integrated Use of Phrase and Dependency Structures
Phrase structure analysis
- Lexicalized PCFG: lexical information (including dependency relations) improves PS analysis accuracy (e.g. Charniak 1997; Collins 1999; Bikel 2004)
- Use of dependency relations as discriminative features of maximum entropy phrase structure parsers (e.g. HPSG parser (Oepen 2002), reranking parser (Charniak and Johnson 2005))
- Use of another independent shallow dependency parser (Sagae et al. 2007)
Dependency analysis
- Almost no use of phrase structure information (Kakari-uke parsers, MSTParser (McDonald 2005), Malt parser (Nivre 2004))
Integration requires mapping
- Integration of PS and DS requires a mapping between the two structures of a sentence, because a sentence analyzer cannot combine linguistic information from both without a correspondence between the two structures.

Mapping between PS and DS (previous research)
Conversion between PS and DS based on heuristics
- Phrase Structure Tree (PST) → Dependency Tree (DT) [Collins 99], DT → PST [Xia&Palmer 00]
  ⇒ Measurement of parse accuracy, treebank creation, etc.
Grammar equivalence
- [Gaifman 65] and [Abney 94] studied the equivalence relation between PSG (CFG) and DG (Tesnière model DG)
  ⇒ DG is strongly equivalent to only a sub-class of CFG *1
Structure mapping based on packed shared data structures
- Partial structure mapping framework based on the Syntactic Graph [Seo&Simmons 89]: creates mappings between PSTs and DTs based on partial structure mapping rules (described later)
  ⇒ The syntactic graph generates inappropriate mappings [Hirakawa 06]
- Complete mapping based on the "Dependency Forest"
  ⇒ Integrated use of PS and DS (described later)

Basic Sentence Analysis Model
- Generation Knowledge: generates all possible interpretations of the sentence
- Interpretation Space: prescribed by the interpretation description scheme
- Constraint Knowledge: rejection of interpretations (accept / reject)
- Preference Knowledge: preference order of interpretations
- Optimum Interpretation Extraction: yields the optimum interpretation
Legend: ◎ correct interpretation, ○ plausible interpretation, × implausible interpretation
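
The model above can be read as a generate, filter, rank pipeline. The following is only a minimal sketch under stated assumptions: the toy lexicon, the single constraint and the score table are hypothetical and are not taken from PDG.

```python
from itertools import product

# Hypothetical toy lexicon: each word with its possible parts of speech.
LEXICON = {"time": ["n", "v"], "flies": ["n", "v"]}

# Hypothetical preference scores over POS sequences.
SCORES = {("n", "v"): 2.0, ("v", "n"): 1.0}


def generate(sentence):
    """Generation knowledge: enumerate every word/POS sequence."""
    words = sentence.lower().split()
    tag_choices = product(*(LEXICON[w] for w in words))
    return [tuple(zip(words, tags)) for tags in tag_choices]


def has_exactly_one_verb(interpretation):
    """Constraint knowledge (assumed for illustration): accept or reject."""
    return sum(1 for _, pos in interpretation if pos == "v") == 1


def preference(interpretation):
    """Preference knowledge: a score that orders accepted interpretations."""
    return SCORES.get(tuple(pos for _, pos in interpretation), 0.0)


def analyze(sentence, constraints):
    """Optimum interpretation extraction: best accepted interpretation."""
    accepted = [i for i in generate(sentence) if all(c(i) for c in constraints)]
    return max(accepted, key=preference) if accepted else None


best = analyze("Time flies", [has_exactly_one_verb])
```

Here `generate` produces four candidate interpretations, the constraint rejects two of them, and the preference score selects one of the remaining two.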

Example (1): Probabilistic Context Free Grammar (PCFG)
- Generation Knowledge: CFG rules
- Interpretation Space: phrase structure (parse tree)
- Constraint Knowledge: no constraints
- Preference Knowledge: probabilities of the CFG rules
- Optimum Interpretation Extraction: the Viterbi algorithm

Example (2): Constraint Dependency Grammar (CDG)
Eliminative parsing: parsing proceeds by filtering out the incorrect interpretations from all possible interpretations by applying constraints
- Generation Knowledge: possible dependencies between all words
- Interpretation Space: dependency trees
- Constraint Knowledge: unary and binary constraints
- Preference Knowledge: no preference knowledge
- Optimum Interpretation Extraction: no optimum solution search

Basic Sentence Analysis Model of PDG: Multilevel Packed Shared Data Connection Model
(a) NLA system with multilevel interpretation space
(b) Packed shared data structure and interpretation mapping
(c) Interpretations are externalizations of the lower level interpretations
Components: 1. Data Structure, 2. Optimum Solution Search
PK: Preference Knowledge, CK: Constraint Knowledge, GK: Generation Knowledge, IS: Interpretation Space
[Figure: the sentence is analyzed through three interpretation spaces IS1, IS2, IS3 (Level 1, Level 2 and Level 3 interpretations), each with its own GK, CK and PK; interpretations at adjacent levels are connected by mappings, and Optimum Interpretation Extraction yields the optimum interpretation ◎]

PDG Implementation Model (data structure)
PDG is an all-pair dependency analysis method with a three-level architecture utilizing three packed shared data structures
- Morphological Layer: WPP trellis (all WPP sequences)
- Syntactic Layer: phrase structure forest (all PSTs) and dependency forest (all DTs), with integrated use of the PS and DS levels
- Interpretation mapping connects the WPP sequences, PSTs and DTs; the result is the optimum dependency tree
WPP = Word POS Pair, Phrase structure forest (PSF) = (packed shared) parse forest
[Figure: for the sentence "Time flies", the WPP trellis holds time/n, time/v, fly/n, fly/v; the phrase structure forest holds s, np, vp and root nodes over them; the dependency forest holds the arcs sub, obj and top. One mapped interpretation is shown: WPP sequence time/n fly/v, a PST with s, np (time/n) and vp (fly/v), and a DT with arc sub from time/n to fly/v and top node fly/v]

Comparison with other dependency analysis methods
- CDG (Constraint Dependency Grammar): no CFG grammar; sentence → all DS interpretations
- MSTParser (Maximum Spanning Tree Parser): no CFG grammar; sentence → 1-best morphological interpretation → all interpretations with no POS ambiguities → optimum interpretation
- PDG: sentence → all morphological interpretations (morphology level) → all PS interpretations (PS level, CFG filtering) → well-formed interpretations (DS level) → optimum interpretation
Problems marked in the figure: combinatorial explosion, over-pruning

PDG Implementation Model (optimum solution search)
Integration of preference knowledge: preference scores based on the multilevel data structures are integrated into scores on a DF
- Scoring: WPP sequence score (WPP trellis, morphological layer), phrase structure score (PS forest, syntactic layer), dependency score (dependency forest, syntactic layer)
- Score integration: the scores are combined on the dependency forest
- Optimum solution search: the Graph Branch Algorithm extracts the optimum dependency tree
[Figure: scoring and search for the sentence "Time flies"; the optimum dependency tree has arc sub from time/n to fly/v and top node fly/v]

PDG Analysis Flow
- Forest Generation (Extended Chart Parser): Sentence → WPP Trellis, PS Forest, Dependency Forest
  ・ Dependency Forest Generation
- Scoring: Dependency Forest → Scored Dependency Forest (Co-occurrence Score Matrix)
  ・ Preference Score Integration
- Optimum Tree Search: Scored Dependency Forest → The Optimum Tree
  ・ Optimum Tree Search based on CM and PM

Phrase structure forest
Packed shared data structure encompassing all CFG phrase structure trees [Tomita 87,91]
[Figure: a phrase structure forest and one phrase structure tree for "I saw a girl with a telescope in the forest", with nodes s, np, vp, pp, v, n, p, det]

Packed Shared Data Structure for Dependency Trees
Dependency graphs are used widely
- Dependency Graph: nodes (Word/WPP/Concept) and arcs (dependency relations)
- Dependency Tree: a tree in a dependency graph satisfying some well-formedness condition
Well-formedness condition by [Robinson 70]
(a) One and only one element is independent.
(b) All others depend directly on some element.
(c) No element depends directly on more than one other. (unique head)
(d) If element A depends directly on element B and some element C intervenes between them (in linear order of string), then C depends directly on A or on B or some other intervening element. (projectivity)
[Figure: a dependency graph and a dependency tree for "I saw a girl with a telescope in the forest", with arcs sub, obj, det, pre, npp, vpp; some arcs are not co-occurable in one dependency tree due to the projectivity constraint]
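
Conditions (a) to (d) can be checked mechanically. The sketch below is an illustration only: the arc encoding as (dependent, head) position pairs and the function name are assumptions, and the explicit "every element reaches the independent element" test is an extra tree-ness assumption not listed among the four conditions.

```python
def is_well_formed(n, arcs):
    """Check Robinson-style conditions for n words and (dependent, head) arcs."""
    heads = {}
    for dep, head in arcs:
        if not (0 <= dep < n and 0 <= head < n) or dep == head:
            return False
        if dep in heads:
            return False  # (c) violated: more than one head
        heads[dep] = head

    roots = [i for i in range(n) if i not in heads]
    if len(roots) != 1:
        return False  # (a)/(b) violated: not exactly one independent element
    root = roots[0]

    # Extra assumption: every element reaches the root without a cycle.
    for start in range(n):
        seen, node = set(), start
        while node != root:
            if node in seen:
                return False
            seen.add(node)
            node = heads[node]

    # (d) projectivity: each C between A and B depends on A, B,
    # or another element between them.
    for dep, head in arcs:
        lo, hi = sorted((dep, head))
        for c in range(lo + 1, hi):
            if c not in heads or not (lo <= heads[c] <= hi):
                return False
    return True


# "I saw a girl with a telescope": 0 I, 1 saw, 2 a, 3 girl, 4 with, 5 a, 6 telescope
projective = [(0, 1), (3, 1), (2, 3), (4, 1), (6, 4), (5, 6)]
# Same shape as a non-projective reading: word 5 depends on word 3 across word 4.
non_projective = [(0, 1), (3, 1), (2, 3), (4, 1), (5, 3)]
```

The first arc list passes all checks; the second fails only condition (d), because word 4 lies between words 3 and 5 but depends on word 1.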

Partial Structure Mapping Method [Seo&Simmons 89]
- Grammar rule = headed CFG rule + partial structure mapping rule
- The headed CFG rule Y/w_h → X_1/w_1 ... X_h/w_h ... X_n/w_n is mapped to a partial dependency tree in which the head word w_h governs the words w_1, ..., w_i, ..., w_n through the arcs d_1, ..., d_i, ..., d_n
- The parser maps the sentence to a packed shared phrase structure (phrase structure forest), which represents a set of phrase structure trees, and to a packed shared dependency structure (syntactic graph), which represents a set of dependency trees
- The mapping relates the set of phrase structure trees to the set of dependency trees

Syntactic Graph
Packed shared data structure for dependency trees: encompasses all dependency trees corresponding to phrase structure trees in the parse forest for a sentence
- Node: WPP
- Arc: dependency relation
- The syntactic graph is paired with an exclusion matrix
[Figure: syntactic graph and exclusion matrix for "Time flies like an arrow", with nodes [0,time,n], [0,time,v], [1,fly,v], [1,fly,n], [2,like,p], [2,like,v], [3,an,det], [4,arrow,n] and arcs labelled snp, vnp, vpp, ppn, npp, mod, det]

Completeness and Soundness of the syntactic graph
Definitions
- Completeness: For every parse tree in the forest, there is a syntactic reading from the syntactic graph that is structurally equivalent to that parse tree.
  ∀ PST: phrase structure tree, ∃ DT: dependency tree, PST corresponds to DT
- Soundness: For every syntactic reading from the syntactic graph, there is a parse tree in the forest that is structurally equivalent to that syntactic reading.
  ∀ DT: dependency tree, ∃ PST: phrase structure tree, PST corresponds to DT
Problem of the syntactic graph
- Violation of soundness [Hirakawa 06]
[Figure: correspondence between phrase structure trees in the phrase structure forest and dependency trees in the syntactic graph; completeness holds, soundness is violated]

Example of the violation of soundness
Input: "Tokyo taxi driver call center"
The syntactic graph / exclusion matrix for the dependency trees (a), (b) and (c) also generates the dependency tree (d), which has no corresponding phrase structure tree in the phrase structure forest.
[Figure: dependency trees (a), (b), (c) with their phrase structure trees np1, np2, np3, and the extra dependency tree (d); arcs are labelled nc-1, nc-2, nc-3, nc-6, nj-4, nj-5, nj-7 and rt-8]

Dependency Forest [Hirakawa 06]
Packed shared data structure for dependency trees
- Dependency Forest (DF) = Dependency Graph (DG) + Co-occurrence Matrix (CM)
- CM: defines the arc co-occurrence relation (equivalent arcs are allowed in a DF)
[Figure: dependency forest for "Time flies like an arrow." The dependency graph has nodes 0,time/n; 0,time/v; 1,fly/v; 1,fly/n; 2,like/p; 2,like/v; 3,an/det; 4,arrow/n; root, and arcs nc 2, obj 4, det 14, pre 15, obj 16, vpp 18, npp 19, vpp 20, sub 23, sub 24, obj 25, rt 29, rt 31, rt 32; it is shown together with its co-occurrence matrix]
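
A minimal sketch of "DF = DG + CM" for the two-word sentence "Time flies" follows. It is an illustration under stated assumptions: the class, the arc ids 1 to 4 and the enumeration strategy (one arc per word position, all chosen arcs pairwise co-occurable) are hypothetical simplifications, not the PDG implementation.

```python
from itertools import combinations, product
from typing import NamedTuple


class Arc(NamedTuple):
    arc_id: int
    label: str
    dep: tuple   # dependent node as (position, word, pos)
    head: tuple  # head node as (position, word, pos)


ROOT = (-1, "root", "x")


class DependencyForest:
    """Dependency graph (arcs) plus co-occurrence matrix (allowed arc pairs)."""

    def __init__(self, arcs, cooccurring_pairs):
        self.arcs = {a.arc_id: a for a in arcs}
        self.cm = {frozenset(p) for p in cooccurring_pairs}

    def cooccur(self, i, j):
        return frozenset((i, j)) in self.cm

    def trees(self):
        """Pick one arc per dependent position; keep CM-consistent choices."""
        by_pos = {}
        for a in self.arcs.values():
            by_pos.setdefault(a.dep[0], []).append(a.arc_id)
        found = []
        for choice in product(*(by_pos[p] for p in sorted(by_pos))):
            if all(self.cooccur(i, j) for i, j in combinations(choice, 2)):
                found.append(set(choice))
        return found


time_n, time_v = (0, "time", "n"), (0, "time", "v")
fly_v, fly_n = (1, "fly", "v"), (1, "fly", "n")
forest = DependencyForest(
    arcs=[
        Arc(1, "sub", time_n, fly_v),
        Arc(2, "obj", fly_n, time_v),
        Arc(3, "top", fly_v, ROOT),
        Arc(4, "top", time_v, ROOT),
    ],
    cooccurring_pairs=[(1, 3), (2, 4)],
)
trees = forest.trees()
```

With this co-occurrence matrix the forest packs exactly two dependency trees: {sub, top of fly/v} and {obj, top of time/v}. Mixed choices such as {sub, obj} are ruled out by the matrix alone.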

Features of the Dependency Forest
- Mapping is assured (phrase structure tree ⇔ dependency tree) → usable for the multilevel packed shared data connection model
- High flexibility in describing constraints, e.g. non-projective dependency structures *1
*1: a dependency structure violating at least one of the following projectivity conditions: "no cross dependency exists", "no dependency covers the top node"

Generation Flow of Phrase Structure Forest and Dependency Forest
PDG analysis process (steps (1) to (4) in the figure) and PDG data structures:
- Morphological Analysis (uses the Dictionary): Input sentence → WPP Trellis
- Chart Parsing (uses the Extended CFG) and DF Extraction: WPP Trellis → Parse Forest and Initial Dependency Forest
- DF Reduction: Initial Dependency Forest → Dependency Forest
- Optimum Solution Search: Dependency Forest → Dependency Tree

PDG Grammar Rule
Extended CFG rule with phrase head and mapping to dependency structure
  y/Xh → x1/X1,...,xn/Xn : [arc(arcname1,Xi,Xj),...,arc(arcname(n-1),Xk,Xl)]
- Rewriting rule part: y/Xh → x1/X1,...,xn/Xn
  Xi: variable; Xh (phrase head): "Xh" is one of "X1".."Xn"
- Dependency structure part: [arc(arcname1,Xi,Xj),...,arc(arcname(n-1),Xk,Xl)]
  a dependency tree with nodes X1,...,Xn and top node Xh
Example: vp/V → v/V, np/NP, pp/PP : [arc(obj,NP,V), arc(vpp,PP,V)]
- Phrase structure for "see a girl in the forest": vp/V(=see/v) with constituents v/V(=see/v), np/NP(=girl/n), pp/PP(=in/pre)
- Dependency structure: NP (=girl/n) depends on V (=see/v) by obj; PP (=in/pre) depends on V (=see/v) by vpp
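
How such a rule yields both a phrase and its dependency arcs can be sketched as follows. The `PDGRule` class and `apply_rule` function are hypothetical names introduced for illustration; only the rule itself and the bindings see/v, girl/n, in/pre come from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PDGRule:
    lhs: tuple   # (category, head variable), e.g. ("vp", "V")
    rhs: tuple   # ((category, variable), ...)
    arcs: tuple  # ((arc name, dependent variable, head variable), ...)


def apply_rule(rule, constituents):
    """Bind rule variables to the heads of found constituents.

    constituents: list of (category, head WPP) in right-hand-side order.
    Returns the new phrase (category, head WPP) and the instantiated arcs.
    """
    if len(constituents) != len(rule.rhs):
        raise ValueError("wrong number of constituents")
    binding = {}
    for (cat, var), (found_cat, found_head) in zip(rule.rhs, constituents):
        if cat != found_cat:
            raise ValueError(f"expected {cat}, found {found_cat}")
        binding[var] = found_head
    phrase = (rule.lhs[0], binding[rule.lhs[1]])
    arcs = [(name, binding[dep], binding[head]) for name, dep, head in rule.arcs]
    return phrase, arcs


# vp/V -> v/V, np/NP, pp/PP : [arc(obj,NP,V), arc(vpp,PP,V)]
rule = PDGRule(
    lhs=("vp", "V"),
    rhs=(("v", "V"), ("np", "NP"), ("pp", "PP")),
    arcs=(("obj", "NP", "V"), ("vpp", "PP", "V")),
)
phrase, arcs = apply_rule(rule, [("v", "see/v"), ("np", "girl/n"), ("pp", "in/pre")])
```

The phrase head is passed up unchanged (`vp` headed by see/v), which is what lets a later rule attach further arcs to the same head word.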

Standard Chart Parsing: Structure of Standard Edge
Edge types: lexical edge, inactive edge, active edge
An edge records:
- Start position
- End position
- Head category
- Found constituents
- Remaining constituents
[Figure: edges over the input positions of "a cat chases ..."]

Structure of PDG Edge
Two extensions to the standard edge structure
(1) Mapping to dependency structure
(2) Packing of inactive edges
- PDG single edge = standard edge + phrase head + dependency structure (tree)
- PDG (packed) edge = a set of sharable PDG single edges
[Figure: PDG edges over "a cat chases"]

Generation of Phrase Structure Forest and Initial Dependency Forest
・ Bottom-up chart parser using the Agenda
・ Terminates when the Agenda becomes empty (Agenda = φ)
Chart (inactive edges and active edges), e.g.
  <E3 np1 → [[det1 n1]] : [ds31]>
  <E4 vp1 → [[v1 np2] [v1 np3 pp1]] : [ds41 ds42]>
Phrase Structure Forest: the set of inactive edges reachable from the root edge
Initial Dependency Graph: the set of arcs in the PS forest, e.g.
  arc(root-17,[like]-v-2,[root]-x),
  arc(root-24,[flies]-v-1,[root]-x),
  arc(root-27,[time]-v-0,[root]-x),
  arc(sub-16,[flies]-n-1,[like]-v-2),
  arc(nc-4,[time]-n-0,[flies]-n-1),
  arc(obj-14,[arrow]-n-4,[like]-v-2),
  :
Initial Co-occurrence Matrix: set by the CMatrix setting conditions CM1 to CM3
- CM1: between arcs in DS
- CM2: between arcs in DS and arcs governed by constituents
- CM3: between arcs governed by different constituents

Generation of Phrase Structure Forest and Initial Dependency Forest (example)
[Figure: chart (inactive and active edges, Agenda = φ) and the resulting phrase structure forest over the WPP nodes [0,time,n], [0,time,v], [1,fly,v], [1,fly,n], [2,like,p], [2,like,v], [3,an,det], [4,arrow,n], with packed edges labelled 103 np, 123 np, 178 np, 184 vp, 188 pp, 189 vp, 195 vp, 197 np, 201 vp, 186 s, 191 s, 196 s and the root edge. The initial dependency forest consists of the initial dependency graph (the set of arcs in the PS forest) and the initial co-occurrence matrix set by conditions CM1 to CM3]

Reduction of the Initial Dependency Forest
Equivalent arcs: generated from two grammar rules
  vp/V → v/V,np/NP : [arc(obj,NP,V)]
  vp/V → v/V,np/NP,pp/PP : [arc(obj,NP,V), arc(vpp,PP,V)]
Reduction: two or more equivalent arcs are merged into one arc, provided the merge does not increase the number of generalized dependency trees in the dependency forest
[Figure: part of the dependency forest over 0,time/n; 0,time/v; 1,fly/v; 1,fly/n. Before reduction: arcs nc 2, obj 4, vpp 18, npp 19, sub 23, sub 24, obj 25. After reduction: arcs nc 2, obj 4, vpp 18, npp 19, sub 23, sub 24]

Completeness and Soundness of the Dependency Forest
- Completeness: All phrase structure trees in the parse forest have corresponding dependency trees in the dependency forest.
  ∀ PT: phrase structure tree, ∃ DT: dependency tree, dep_tree(PT) = DT
- Soundness: Every dependency tree in the dependency forest has a corresponding phrase structure tree in the phrase structure forest.
  ∀ DT: dependency tree, ∃ PT: phrase structure tree, dep_tree(PT) = DT
- The correspondence between dependency trees and phrase structure trees is 1:N in general
The completeness and soundness of the dependency forest are assured [Hirakawa 06]
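
For small enumerable forests the two properties can be stated as simple set checks. This is a sketch only: the tree names and the `DEP_TREE` table are hypothetical stand-ins (loosely modelled on a case with three phrase structure trees and one extra dependency reading), and a real forest would not be enumerated explicitly.

```python
def is_complete(ps_trees, dep_trees, dep_tree):
    """Every phrase structure tree maps to a dependency tree in the forest."""
    return all(dep_tree(pt) in dep_trees for pt in ps_trees)


def is_sound(ps_trees, dep_trees, dep_tree):
    """Every dependency tree is the image of some phrase structure tree."""
    images = {dep_tree(pt) for pt in ps_trees}
    return all(dt in images for dt in dep_trees)


# Hypothetical toy data: three phrase structure trees and their dependency trees.
DEP_TREE = {"np1": "a", "np2": "b", "np3": "c"}
ps_forest = set(DEP_TREE)

packed_with_extra_reading = {"a", "b", "c", "d"}  # "d" has no phrase structure tree
packed_exact = {"a", "b", "c"}

extra_complete = is_complete(ps_forest, packed_with_extra_reading, DEP_TREE.get)
extra_sound = is_sound(ps_forest, packed_with_extra_reading, DEP_TREE.get)
exact_complete = is_complete(ps_forest, packed_exact, DEP_TREE.get)
exact_sound = is_sound(ps_forest, packed_exact, DEP_TREE.get)
```

A packed structure that admits the extra reading "d" is complete but not sound; one that packs exactly "a", "b", "c" satisfies both properties.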

Evaluation of the Dependency Forest Framework
- Analysis of prototypical ambiguous sentences
- 1-to-N / N-to-1 correspondence between phrase structure trees and dependency trees
- Generation of non-projective dependency trees

Grammar for Ambiguous Sentences
Grammar rules for typical ambiguities (PP-attachment, coordination, be-verb usage)
=========== s/Sentence ===========
(R1) s/VP → np/NP,vp/VP : [arc(sub,NP,VP)] % Declarative sentence
(R2) s/VP → vp/VP : [] % Imperative sentence
========= np/Noun phrase ========
(R3) np/N → n/N : [] % Single noun
(R4) np/N2 → n/N1,n/N2 : [arc(nc,N1,N2)] % Compound noun
(R5) np/N → det/DET,n/N : [arc(det,DET,N)] %
(R6) np/NP → np/NP,pp/PP : [arc(npp,PP,NP)] % Prepositional phrase attachment
(R7) np/N → ving/V,n/N : [arc(adjs,V,N)] % Adjectival usage (subject)
(R8) np/N → ving/V,n/N : [arc(adjo,V,N)] % Adjectival usage (object)
(R9) np/V → ving/V,np/NP : [arc(obj,NP,V)] % Gerund phrase
(R10) np/V → ving/V,np/NP,pp/PP : [arc(obj,NP,V),arc(vpp,PP,V)] % Gerund phrase with PP
(R11) np/NP → np/NP0,and/AND,np/NP : [arc(and,NP0,NP),arc(cnj,AND,NP0)] % Coordination (and)
(R12) np/NP → np/NP0,or/OR,np/NP : [arc(or,NP0,NP),arc(cnj,OR,NP0)] % Coordination (or)
========= vp/Verb phrase ========
(R13) vp/V → v/V : [] % Intransitive verb
(R14) vp/V → v/V,np/NP : [arc(obj,NP,V)] % Transitive verb
(R15) vp/V → be/BE,ving/V,np/NP : [arc(obj,NP,V),arc(prg,BE,V)] % Progressive
(R16) vp/BE → be/BE,np/NP : [arc(dsc,NP,BE)] % Copular
(R17) vp/VP → vp/VP,pp/PP : [arc(vpp,PP,VP)] % PP-attachment
(R18) vp/VP → adv/ADV,vp/VP : [arc(adv,ADV,VP)] % Adverb modification
(R19) vp/V → v/V,np/NP,adv/ADV,relc/RELP : [arc(obj,NP,V),arc(adv,ADV,V),arc(rel,RELP,NP)] % Non-projective pattern
======== pp/Prepositional phrase ========
(R20) pp/P → pre/P,np/NP : [arc(pre,NP,P)]

PP-attachment Ambiguity
Input sentence: I saw a girl with a telescope in the forest.
Result: five well-formed dependency trees
Nodes: 0,I: [i]-n-0; 1,saw: [saw]-v-1; 2,a: [a]-det-2; 3,girl: [girl]-n-3; 4,with: [with]-pre-4; 5,a: [a]-det-5; 6,telescope: [telescope]-n-6; 7,in: [in]-pre-7; 8,the: [the]-det-8; 9,forest: [forest]-n-9; root: [root]-x-root
[Figure: dependency forest with arcs (label and the two numbers shown on each arc) det 4,0; det 11,0; det 42,0; sub 33,20; obj 6,20; vpp 16,15; vpp 27,5; npp 14,10; npp 29,5; npp 26,5; pre 12,10; pre 24,10; root 23,0. Arc combinations excluded from one tree are annotated "Crossing" and "Single role"]

Coordination Scope Ambiguity
Input sentence: Earth and Moon or Jupiter and Ganymede.
Result: five well-formed dependency trees
Nodes: 0,earth: [earth]-n-0; 1,and: [and]-and-1; 2,moon: [moon]-n-2; 3,or: [or]-or-3; 4,jupiter: [jupiter]-n-4; 5,and: [and]-and-5; 6,ganymede: [ganymede]-n-6; root: [root]-x-root
[Figure: dependency forest with arcs (label and the two numbers shown on each arc) and 12,10; and 25,20; and 18,12; and 14,5; or 9,4; or 22,3; cnj 2,0; cnj 6,0; cnj 14,0; root 26,0. Arc combinations excluded from one tree are annotated "Crossing" and "Single role"]

Structural Interpretation Ambiguity and PP-attachment Ambiguity
Input sentence: My hobby is watching birds with telescope
Result: ten well-formed dependency trees
Nodes: 0,my: [my]-det-0; 1,hobby: [hobby]-n-1; 2,is: [is]-be-2; 3,watching: [watching]-ving-3; 4,birds: [birds]-n-4; 5,with: [with]-pre-5; 6,telescope: [telescope]-n-6; root: [root]-x-root
[Figure: dependency forest with arcs (label and the two numbers shown on each arc) det 1,0; sub 35,1; sub 38,10; sub 5,5; prg 2,10; adj 4,12; dsc 33,8; dsc 36,10; obj 6,15; npp 23,5; npp 27,3; vpp 24,7; pre 22,0; root 41,0; root 44,0]

N to 1 Correspondence from PSTs to One DT (1)
Spurious ambiguity (Eisner96), (Noro05)
(R17) vp/VP → vp/VP,pp/PP : [arc(vpp,PP,VP)] % PP-attachment
(R18) vp/VP → adv/ADV,vp/VP : [arc(adv,ADV,VP)] % Adverb modification
Input: "She curiously saw a cat in the forest"
- Rule application R17 → R18 and rule application R18 → R17 produce two different phrase structure trees
- Both phrase structure trees correspond to one dependency tree, in which "curiously" (adv) and "in the forest" (vpp) both depend on "saw"
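
The N-to-1 effect of R17 and R18 can be reproduced with a few lines. The nested-tuple tree encoding and the function name below are assumptions for illustration; "saw a cat" and "in the forest" are treated as already-built constituents headed by "saw" and "in".

```python
def head_and_arcs(node):
    """Return (head word, frozenset of arcs) for a phrase structure (sub)tree."""
    kind = node[0]
    if kind == "leaf":  # ("leaf", category, head word)
        return node[2], frozenset()
    if kind == "R17":   # vp/VP -> vp/VP, pp/PP : [arc(vpp,PP,VP)]
        vp_head, vp_arcs = head_and_arcs(node[1])
        pp_head, pp_arcs = head_and_arcs(node[2])
        return vp_head, vp_arcs | pp_arcs | {("vpp", pp_head, vp_head)}
    if kind == "R18":   # vp/VP -> adv/ADV, vp/VP : [arc(adv,ADV,VP)]
        adv_head, adv_arcs = head_and_arcs(node[1])
        vp_head, vp_arcs = head_and_arcs(node[2])
        return vp_head, vp_arcs | adv_arcs | {("adv", adv_head, vp_head)}
    raise ValueError(f"unknown node kind: {kind}")


ADV = ("leaf", "adv", "curiously")
VP = ("leaf", "vp", "saw")   # stands for "saw a cat"
PP = ("leaf", "pp", "in")    # stands for "in the forest"

# R17 applied first, then R18: curiously [[saw a cat] [in the forest]]
tree_a = ("R18", ADV, ("R17", VP, PP))
# R18 applied first, then R17: [curiously [saw a cat]] [in the forest]
tree_b = ("R17", ("R18", ADV, VP), PP)

result_a = head_and_arcs(tree_a)
result_b = head_and_arcs(tree_b)
```

The two trees differ as phrase structures, yet both yield the head "saw" with the same two arcs, so they collapse to a single dependency tree.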

N to 1 Correspondence from PSTs to One DT (2)
Modification scope problem (Mel'čuk88)
A dependency structure has ambiguities in modification scope when it has a head word with dependents located on both the right-hand side and the left-hand side of the head word.
Example: "Earth and Jupiter in Solar System."
- Two phrase structure trees: the pp "in Solar System" attaches either to the np "Jupiter" or to the coordinated np "Earth and Jupiter"
- One dependency tree: nodes 0,Earth; 1,and; 2,Jupiter; 3,in; 4,Solar System; root, with arcs and 4,20; cnj 2,0; npp 8,0; pre 7,0; root 12,0
・ Introduction of "Grouping" (coordination and operator words, e.g. not, only) [Mel'čuk88]
・ Japanese has no modification scope problem because it has no right-to-left dependency.

Generation of Non-projective Dependency Tree
Grammar rule for non-projective dependency tree
(R19) vp/V → v/V,np/NP,adv/ADV,relc/RELP : [arc(obj,NP,V),arc(adv,ADV,V),arc(rel,RELP,NP)]
Input sentence: She saw the cat curiously which was Persian *1
*1: Artificial example for showing the rule applicability
[Figure: dependency tree over nodes 0,She; 1,saw; 2,the; 3,cat; 4,curiously; 5,which was Persian; root, with arcs (label and the two numbers shown on each arc) det 4,0; sub 12,20; obj 6,20; adv 10,15; rel 11,10; root 14,0]
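
That the tree produced by R19 is non-projective can be confirmed by looking for crossing arcs. The sketch below assumes arcs encoded as (label, dependent position, head position) using the node positions from the slide; the function name is hypothetical, and the direction of the det arc (from "the" to "cat") is an assumption.

```python
from itertools import combinations


def crossing_pairs(arcs):
    """Return label pairs of arcs whose spans overlap without nesting."""
    pairs = []
    for (l1, d1, h1), (l2, d2, h2) in combinations(arcs, 2):
        lo1, hi1 = sorted((d1, h1))
        lo2, hi2 = sorted((d2, h2))
        if lo1 < lo2 < hi1 < hi2 or lo2 < lo1 < hi2 < hi1:
            pairs.append((l1, l2))
    return pairs


# 0 She, 1 saw, 2 the, 3 cat, 4 curiously, 5 which was Persian
arcs = [
    ("sub", 0, 1),
    ("det", 2, 3),
    ("obj", 3, 1),  # arc(obj,NP,V): cat depends on saw
    ("adv", 4, 1),  # arc(adv,ADV,V): curiously depends on saw
    ("rel", 5, 3),  # arc(rel,RELP,NP): the relative clause depends on cat
]
crossings = crossing_pairs(arcs)
```

The only crossing is between the adv arc (positions 1 to 4) and the rel arc (positions 3 to 5), which is the cross dependency that R19 licenses.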

Conclusion
Dependency forest is a packed shared data structure
- Bridge between phrase structure and dependency structure, usable for the Multilevel Packed Shared Data Connection Model of PDG
- High flexibility in describing constraints
Future work
- Extension of the framework for the modification scope problem (Grouping)
- Real-world system implementation