# Non-projective Dependency Parsing using Spanning Tree Algorithms (HLT/EMNLP 2005)

Ryan McDonald, Fernando Pereira, Kiril Ribarov, Jan Hajič

Presented by R98922004 Yun-Nung Chen (陳縕儂, first-year CS master's student)



Paper: Non-projective Dependency Parsing using Spanning Tree Algorithms (HLT/EMNLP 2005), by Ryan McDonald, Fernando Pereira, Kiril Ribarov, and Jan Hajič


- Each word depends on exactly one parent
- Projective: the words, taken in linear order, satisfy
  - no crossing edges
  - a word and its descendants form a contiguous substring of the sentence
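The no-crossing condition can be checked directly from the head indices. A small sketch (the list encoding below is an assumption for illustration: `heads[d]` is the head of word `d`, with 0 as the artificial root and `heads[0]` unused):

```python
def is_projective(heads):
    """heads[d] = head of word d (words are 1..n; heads[0] is a root placeholder)."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads) if d > 0]
    # two arcs cross iff their spans overlap without one nesting inside the other
    return not any(a < c < b < d for (a, b) in arcs for (c, d) in arcs)

print(is_projective([0, 2, 0, 2]))   # -> True  (root->word2, word2->word1, word2->word3)
print(is_projective([0, 3, 0, 2]))   # -> False (arc over words 1..3 crosses arc root..2)
```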

- English: mostly projective, occasionally non-projective
- Languages with more flexible word order: largely non-projective
  - German, Dutch, Czech

- Related work: relation extraction, machine translation

Dependency parsing can be formalized as the search for a maximum spanning tree in a directed graph.


- sentence: x = x_1 … x_n
- the directed graph G_x = (V_x, E_x) given by
  - V_x = {x_0 = root, x_1, …, x_n}
  - E_x = {(i, j) : i ≠ j, j ≠ 0}, i.e. every possible head-dependent pair
- dependency tree for x: y, the tree G_y = (V_y, E_y) with
  - V_y = V_x
  - E_y = {(i, j) : there is a dependency from x_i to x_j}
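The construction of G_x can be sketched in a few lines (a toy version, assuming every word may attach to every other word and that nothing attaches to the root):

```python
def build_graph(words):
    """Build the dense dependency graph G_x for a sentence."""
    nodes = ["root"] + list(words)        # V_x: x_0 = root plus the words
    n = len(nodes)
    # E_x: all (i, j) with j a real word and i != j (no edges into the root)
    edges = [(i, j) for i in range(n) for j in range(1, n) if i != j]
    return nodes, edges

nodes, edges = build_graph(["John", "saw", "Mary"])
print(len(nodes), len(edges))   # -> 4 9
```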

- score of an edge (i, j): s(i, j) = w · f(i, j), a weight vector dotted with an edge feature vector
- score of a dependency tree y for sentence x: s(x, y) = Σ_{(i,j) ∈ y} s(i, j)
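The two formulas can be sketched with toy numbers (the weight and feature vectors below are made up purely for illustration):

```python
def edge_score(w, f_ij):
    """s(i, j) = w . f(i, j): dot product of weights and edge features."""
    return sum(wk * fk for wk, fk in zip(w, f_ij))

def tree_score(w, edges, features):
    """s(x, y): sum of edge scores over the edges of the tree y."""
    return sum(edge_score(w, features[e]) for e in edges)

w = [0.5, -1.0, 2.0]                               # toy weight vector
features = {(0, 2): [1, 0, 1],                     # toy feature vectors per edge
            (2, 1): [0, 1, 0],
            (2, 3): [1, 1, 0]}
tree = [(0, 2), (2, 1), (2, 3)]                    # a candidate tree y
print(tree_score(w, tree, features))               # -> 1.0
```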

Example: x = John hit the ball with the bat

[Figure: three candidate dependency trees y_1, y_2, y_3 for x, each rooted at the artificial root node]

Two questions:

1) How to learn the weight vector w
2) How to find the tree with the maximum score

- The dependency trees for x are exactly the spanning trees of G_x
- The dependency tree with the maximum score for x is a maximum spanning tree of G_x


- Input: graph G = (V, E)
- Output: a maximum spanning tree in G
- For each word, greedily select the incoming edge with the highest weight
  - If the result is a tree, we are done
  - If it contains a cycle, contract the cycle into a single vertex and recalculate the weights of edges going into and out of the cycle

Example: x = John saw Mary

[Figure: the initial graph G_x, with edge scores root→John 9, root→saw 10, root→Mary 9, John→saw 20, John→Mary 3, saw→John 30, saw→Mary 30, Mary→John 11, Mary→saw 0]

For each word, find the highest-scoring incoming edge.

[Figure: G_x with the selected edges saw→John (30), John→saw (20), and saw→Mary (30) highlighted]

If the result is
- a tree: terminate and output it
- a cycle: contract it and recalculate

[Figure: the selected edges contain the cycle John ⇄ saw]

Contract and recalculate:
- Contract the cycle into a single node
- Recalculate the weights of edges going into and out of the cycle

Outgoing edges of the cycle: for each outside word, keep the highest-scoring edge leaving any node in the cycle.

[Figure: the contracted node C = {John, saw} with its outgoing edges, e.g. C→Mary with score max(3, 30) = 30]

Incoming edges of the cycle: for each outside node, recalculate the score of entering the cycle at each cycle node.

Incoming edges from x = root:

- s(root, John) - s(a(John), John) + s(C) = 9 - 30 + 50 = 29
- s(root, saw) - s(a(saw), saw) + s(C) = 10 - 20 + 50 = 40

(a(v) is the predecessor of v inside the cycle; s(C) = 50 is the total score of the cycle.)

Incoming edges from x = Mary:

- s(Mary, John) - s(a(John), John) + s(C) = 11 - 30 + 50 = 31
- s(Mary, saw) - s(a(saw), saw) + s(C) = 0 - 20 + 50 = 30

- Keep only the highest-scoring recalculated edges into the contracted cycle node
- Run the algorithm recursively on the contracted graph

[Figure: the contracted graph with edges root→C 40, Mary→C 31, C→Mary 30, root→Mary 9]

- Find the incoming edge with the highest score for each node
- The result is a tree: terminate and output

[Figure: the selected edges root→C (40) and C→Mary (30)]

Maximum spanning tree of G_x:

[Figure: the final tree root→saw (10), saw→John (30), saw→Mary (30)]
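The contraction-and-expansion procedure just walked through can be sketched as a recursive function over a dense score matrix. A minimal sketch, not the paper's implementation: node 0 is the artificial root, `scores[h][d]` is the score of edge h → d, and the matrix `S` below reproduces the John/saw/Mary example:

```python
NEG = float("-inf")

def find_cycle(parent, n):
    """Return a list of nodes forming a cycle under the parent pointers, or None."""
    color = [0] * n                      # 0 = unvisited, 1 = on current path, 2 = done
    for start in range(1, n):
        if color[start]:
            continue
        path, v = [], start
        while v != 0 and color[v] == 0:
            color[v] = 1
            path.append(v)
            v = parent[v]
        if v != 0 and color[v] == 1:     # walked back into the current path: a cycle
            return path[path.index(v):]
        for u in path:
            color[u] = 2
    return None

def chu_liu_edmonds(scores):
    """Return parent[] of a maximum spanning tree rooted at node 0."""
    n = len(scores)
    # greedily pick the best incoming edge for every non-root node
    parent = [0] * n
    for d in range(1, n):
        parent[d] = max((h for h in range(n) if h != d), key=lambda h: scores[h][d])
    cycle = find_cycle(parent, n)
    if cycle is None:
        return parent                    # already a tree: terminate
    cyc = set(cycle)
    cycle_score = sum(scores[parent[v]][v] for v in cycle)
    rest = [v for v in range(n) if v not in cyc]   # kept nodes, root first
    c = len(rest)                                  # index of the contracted node
    old2new = {v: i for i, v in enumerate(rest)}
    new_scores = [[NEG] * (c + 1) for _ in range(c + 1)]
    in_edge, out_edge = {}, {}
    for u in rest:
        for v in rest:
            if u != v:
                new_scores[old2new[u]][old2new[v]] = scores[u][v]
        # edge u -> cycle: s(u, v) - s(a(v), v) + s(C), maximised over cycle nodes v
        best = max(cycle, key=lambda v: scores[u][v] - scores[parent[v]][v])
        new_scores[old2new[u]][c] = (scores[u][best]
                                     - scores[parent[best]][best] + cycle_score)
        in_edge[old2new[u]] = (u, best)
        # edge cycle -> u: best-scoring edge out of any cycle node
        best = max(cycle, key=lambda v: scores[v][u])
        new_scores[c][old2new[u]] = scores[best][u]
        out_edge[old2new[u]] = best
    new_parent = chu_liu_edmonds(new_scores)       # recurse on the contracted graph
    # expand the contracted solution back to the original nodes
    result = [0] * n
    for v in rest:
        vn = old2new[v]
        if vn == 0:
            continue
        hn = new_parent[vn]
        result[v] = out_edge[vn] if hn == c else rest[hn]
    head, enter = in_edge[new_parent[c]]           # edge chosen to enter the cycle
    for v in cycle:
        result[v] = parent[v]                      # keep cycle-internal edges ...
    result[enter] = head                           # ... except where the tree enters
    return result

# root=0, John=1, saw=2, Mary=3 (edge scores from the slides)
S = [
    [NEG,   9,  10,   9],
    [NEG, NEG,  20,   3],
    [NEG,  30, NEG,  30],
    [NEG,  11,   0, NEG],
]
print(chu_liu_edmonds(S))   # -> [0, 2, 0, 2]: root->saw, saw->John, saw->Mary
```

The result matches the slides: the tree root→saw (10), saw→John (30), saw→Mary (30) with total score 70.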

- Each recursive call takes O(n^2) to find the highest-scoring incoming edge for each word
- There are at most O(n) recursive calls (each contraction removes at least one node)
- Total: O(n^3)
- Tarjan gives an efficient implementation of the algorithm that runs in O(n^2) for dense graphs

- Eisner algorithm: O(n^3)
  - bottom-up dynamic programming
  - maintains the nested structural constraint (non-crossing constraint), so it produces only projective trees
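For comparison, here is a minimal score-only sketch of the Eisner DP (complete/incomplete span recurrences, no backpointers), reusing the John/saw/Mary scores from the Chu-Liu-Edmonds example. This is a simplified illustration, not the paper's implementation:

```python
NEG = float("-inf")

def eisner_score(scores):
    """Max score of a projective dependency tree; scores[h][d], node 0 = root."""
    n = len(scores)
    # C[s][t][d]: complete span, I[s][t][d]: incomplete span;
    # d = 0 means the head is at t (left arc), d = 1 means the head is at s.
    C = [[[0, 0] for _ in range(n)] for _ in range(n)]
    I = [[[NEG, NEG] for _ in range(n)] for _ in range(n)]
    for k in range(1, n):                # span length, bottom-up
        for s in range(n - k):
            t = s + k
            # build an arc over the span: both halves must be complete
            best = max(C[s][r][1] + C[r + 1][t][0] for r in range(s, t))
            I[s][t][0] = best + scores[t][s]     # arc t -> s
            I[s][t][1] = best + scores[s][t]     # arc s -> t
            # extend an incomplete span into a complete one
            C[s][t][0] = max(C[s][r][0] + I[r][t][0] for r in range(s, t))
            C[s][t][1] = max(I[s][r][1] + C[r][t][1] for r in range(s + 1, t + 1))
    return C[0][n - 1][1]                # everything attached under the root

S = [
    [NEG,   9,  10,   9],
    [NEG, NEG,  20,   3],
    [NEG,  30, NEG,  30],
    [NEG,  11,   0, NEG],
]
print(eisner_score(S))   # -> 70
```

Here the best projective tree happens to coincide with the unconstrained maximum spanning tree, so both algorithms return score 70.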


- Supervised learning
  - Goal: learn the weight vector w over edge features (defined on the PoS tags of the two words)
  - Training data: sentences paired with their dependency trees, {(x_t, y_t)}
  - Testing data: sentences x only

- Margin Infused Relaxed Algorithm (MIRA)
  - dt(x): the set of possible dependency trees for sentence x
  - each update keeps the new weight vector as close as possible to the old one, subject to the margin constraints s(x_t, y_t) - s(x_t, y') ≥ L(y_t, y') for every y' ∈ dt(x_t)
  - the final weight vector is the average of the weight vectors after each iteration

Single-best MIRA: use only the single margin constraint for the highest-scoring tree y' = argmax_y s(x_t, y).
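With a single constraint, the smallest-change update has a closed-form solution. A sketch with a hypothetical flat feature representation (here `loss` is L(y, y'), e.g. the number of words with the wrong head):

```python
def mira_update(w, f_gold, f_pred, loss):
    """Single-constraint MIRA step: the smallest change to w such that the
    gold tree outscores the predicted tree by a margin of `loss`."""
    diff = [g - p for g, p in zip(f_gold, f_pred)]    # f(x, y) - f(x, y')
    margin = sum(wi * di for wi, di in zip(w, diff))  # current score gap
    norm_sq = sum(d * d for d in diff)
    if norm_sq == 0:                                  # identical feature vectors
        return list(w)
    tau = max(0.0, (loss - margin) / norm_sq)         # step size from the QP
    return [wi + tau * di for wi, di in zip(w, diff)]

w = mira_update([0.0, 0.0], [1.0, 0.0], [0.0, 1.0], loss=1.0)
print(w)   # -> [0.5, -0.5]; the score gap is now exactly the required margin of 1
```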

- Factored MIRA: local (per-edge) constraints
  - the correct incoming edge for word j must outscore every other incoming edge for j by a margin of 1
  - summed up, a correct spanning tree then outscores an incorrect one by the number of incorrect edges
  - these constraints are more restrictive than the original ones


- Language: Czech
  - more flexible word order than English, hence non-projective dependencies
- Features: Czech PoS tags
  - standard PoS, case, gender, tense
- Ratio of non-projective to projective
  - less than 2% of all edges are non-projective
  - Czech-A: the entire PDT
  - Czech-B: only the 23% of sentences that contain a non-projective dependency

- COLL1999: the projective lexicalized phrase-structure parser
- N&N2005: the pseudo-projective parser
- McD2005: the projective parser using the Eisner algorithm and 5-best MIRA
- Single-best MIRA and factored MIRA: the non-projective parsers using Chu-Liu-Edmonds

| Parser | Complexity | Czech-A Accuracy | Czech-A Complete | Czech-B Accuracy | Czech-B Complete |
|---|---|---|---|---|---|
| COLL1999 | O(n^5) | 82.8 | - | - | - |
| N&N2005 | | 80.0 | 31.8 | - | - |
| McD2005 | O(n^3) | 83.3 | 31.3 | 74.8 | 0.0 |
| Single-best MIRA | O(n^2) | 84.1 | 32.2 | 81.0 | 14.9 |
| Factored MIRA | O(n^2) | 84.4 | 32.3 | 81.5 | 14.3 |

(Czech-A: the entire PDT; Czech-B: only sentences with a non-projective dependency.)

| Parser | Complexity | Accuracy | Complete |
|---|---|---|---|
| McD2005 | O(n^3) | 90.9 | 37.5 |
| Single-best MIRA | O(n^2) | 90.2 | 33.2 |
| Factored MIRA | O(n^2) | 90.2 | 32.3 |

- English dependency trees are projective
- The Eisner algorithm exploits the a priori knowledge that all trees are projective


