
1
R98922004 Yun-Nung Chen (陳縕儂), first-year M.S. student, Dept. of Computer Science

2
Non-projective Dependency Parsing using Spanning Tree Algorithms (HLT/EMNLP 2005)
Ryan McDonald, Fernando Pereira, Kiril Ribarov, Jan Hajic


4
Each word depends on exactly one parent.
Projective: with the words in linear order,
▪ edges can be drawn without crossing
▪ a word and its descendants form a contiguous substring of the sentence

5
English: mostly projective, some non-projective.
Languages with more flexible word order: mostly non-projective
▪ German, Dutch, Czech

6
Related work: relation extraction, machine translation

7
Dependency parsing can be formalized as the search for a maximum spanning tree in a directed graph.


9
Sentence: x = x_1 … x_n, with the directed graph G_x = (V_x, E_x) given by the sentence.
Dependency tree for x: y, the tree G_y = (V_y, E_y) with
▪ V_y = V_x
▪ E_y = {(i, j) : there is a dependency from x_i to x_j}
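The dense graph G_x above can be built directly. A minimal sketch (the function name is my own), treating index 0 as the artificial root, with an edge for every possible head of every word:

```python
# Build the dense directed graph G_x for sentence x: one vertex per word
# plus an artificial root at index 0, and an edge (i, j) for every
# candidate head i of every word j (no edges into the root).
def build_graph(words):
    n = len(words) + 1                      # +1 for the root vertex
    vertices = list(range(n))
    edges = [(i, j) for i in vertices for j in range(1, n) if i != j]
    return vertices, edges

vertices, edges = build_graph(["John", "saw", "Mary"])
# 4 vertices; each of the 3 words has 3 candidate heads -> 9 edges
```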

10
Score of an edge: s(i, j) = w · f(i, j), a weighted sum of edge features.
Score of a dependency tree y for sentence x: s(x, y) = Σ_{(i,j)∈y} s(i, j).
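A sketch of these two scores in Python. The feature function and weights below are toy stand-ins for illustration, not the paper's actual features:

```python
# s(i, j) = w . f(i, j): an edge's score is the dot product of the
# weight vector with the edge's feature vector.
def edge_score(w, f, i, j):
    return sum(w.get(name, 0.0) * value for name, value in f(i, j).items())

# s(x, y) = sum of s(i, j) over the edges (i, j) of tree y.
def tree_score(w, f, tree):
    return sum(edge_score(w, f, i, j) for i, j in tree)

# Toy features: a bias and the head-dependent distance (illustration only).
f = lambda i, j: {"bias": 1.0, "distance": float(abs(i - j))}
w = {"bias": 1.0, "distance": 0.5}
y = [(0, 2), (2, 1), (2, 3)]        # root->saw, saw->John, saw->Mary
print(tree_score(w, f, y))          # 2.0 + 1.5 + 1.5 = 5.0
```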

11
x = John hit the ball with the bat
(Figure: three candidate dependency trees y_1, y_2, y_3 for x, each attached to an artificial root.)

12
1) How to learn the weight vector w
2) How to find the tree with the maximum score

13
Dependency trees for x = spanning trees of G_x.
The dependency tree with maximum score for x = the maximum spanning tree of G_x.


15
Input: graph G = (V, E). Output: a maximum spanning tree of G.
For each vertex, greedily select the incoming edge with the highest weight. The result is either
▪ a tree – done
▪ a cycle in G – contract the cycle into a single vertex and recalculate the weights of edges going into and out of the cycle

16
x = John saw Mary
(Figure: the dense graph G_x with edge weights root→John 9, root→saw 10, root→Mary 9; John→saw 20, John→Mary 3; saw→John 30, saw→Mary 30; Mary→John 11, Mary→saw 0.)

17
For each word, find the highest-scoring incoming edge.
(Figure: G_x with each word's best incoming edge highlighted: saw→John 30, John→saw 20, saw→Mary 30.)

18
If the result is a
▪ tree – terminate and output
▪ cycle – contract and recalculate
(Figure: the selected edges contain the cycle John ⇄ saw.)

19
Contract and recalculate:
▪ contract the cycle into a single node
▪ recalculate the weights of edges going into and out of the cycle

20
Outgoing edges of the cycle keep their original weights.
(Figure: the contracted graph with the edges out of the cycle to Mary, weights 30 and 3.)

21
Incoming edges of the cycle must be recalculated.
(Figure: the contracted graph with the edges from root and Mary into the cycle.)

22
From x = root:
▪ s(root, John) − s(a(John), John) + s(C) = 9 − 30 + 50 = 29
▪ s(root, saw) − s(a(saw), saw) + s(C) = 10 − 20 + 50 = 40

23
From x = Mary:
▪ s(Mary, John) − s(a(John), John) + s(C) = 11 − 30 + 50 = 31
▪ s(Mary, saw) − s(a(saw), saw) + s(C) = 0 − 20 + 50 = 30
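The recalculated weights on the last two slides all follow one formula. Checking the arithmetic in Python, with the values copied from the slides:

```python
# Weight of an edge (u, v) entering the cycle C after contraction:
#   s'(u, v) = s(u, v) - s(a(v), v) + s(C)
# where a(v) is v's current best parent and s(C) is the cycle's score.
s_C = 30 + 20                        # s(saw, John) + s(John, saw) = 50
recalc = lambda s_uv, s_av: s_uv - s_av + s_C

assert recalc(9, 30) == 29           # root -> John
assert recalc(10, 20) == 40          # root -> saw
assert recalc(11, 30) == 31          # Mary -> John
assert recalc(0, 20) == 30           # Mary -> saw
```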

24
Preserve the highest-scoring tree structure inside the cycle, then run the algorithm recursively on the contracted graph.
(Figure: the contracted graph with recalculated incoming weights 40 and 31.)

25
Find the incoming edge with the highest score for each node. The result is a tree: terminate and output.
(Figure: root→cycle 40 and cycle→Mary 30 are selected.)

26
Maximum spanning tree of G_x
(Figure: the final tree root→saw 10, saw→John 30, saw→Mary 30.)

27
Each recursive call takes O(n²) to find the highest-scoring incoming edge for each word. There are at most O(n) recursive calls, since each call contracts at least one cycle. Total: O(n³). Tarjan gives an efficient implementation of the algorithm that runs in O(n²) for dense graphs.
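The contraction-and-expansion procedure traced on the preceding slides can be sketched in Python. This is a minimal recursive sketch, assuming a dense score matrix with vertex 0 as the root and -inf for absent edges; the function names are my own, not the paper's:

```python
from math import inf

def find_cycle(parent):
    """Return the set of vertices on a cycle in the parent pointers,
    or None if the greedy choices already form a tree."""
    n = len(parent)
    color = [0] * n                     # 0 unseen, 1 on current path, 2 done
    color[0] = 2                        # the root has no incoming edge
    for start in range(1, n):
        path, v = [], start
        while color[v] == 0:
            color[v] = 1
            path.append(v)
            v = parent[v]
        if color[v] == 1:               # walked back into the current path
            return set(path[path.index(v):])
        for u in path:
            color[u] = 2
    return None

def chu_liu_edmonds(score):
    """score[i][j] = weight of edge i -> j; vertex 0 is the root.
    Returns parent[j] for every vertex j > 0 (parent[0] is unused)."""
    n = len(score)
    # Greedy step: best incoming edge for every non-root vertex.
    parent = [0] * n
    for j in range(1, n):
        parent[j] = max((i for i in range(n) if i != j),
                        key=lambda i: score[i][j])
    cycle = find_cycle(parent)
    if cycle is None:
        return parent                   # tree: terminate and output
    s_C = sum(score[parent[v]][v] for v in cycle)     # score of the cycle

    # Contract the cycle into a single super-vertex c.
    old = [v for v in range(n) if v not in cycle]
    new_id = {v: i for i, v in enumerate(old)}
    c = len(old)
    new_score = [[-inf] * (c + 1) for _ in range(c + 1)]
    best_in, best_out = {}, {}
    for u in range(n):
        for v in range(n):
            if u == v:
                continue
            if u not in cycle and v not in cycle:
                new_score[new_id[u]][new_id[v]] = score[u][v]
            elif u not in cycle and v in cycle:
                # Entering the cycle: s(u, v) - s(a(v), v) + s(C).
                w = score[u][v] - score[parent[v]][v] + s_C
                if w > new_score[new_id[u]][c]:
                    new_score[new_id[u]][c] = w
                    best_in[new_id[u]] = v
            elif u in cycle and v not in cycle:
                # Leaving the cycle: keep the best edge out of any member.
                if score[u][v] > new_score[c][new_id[v]]:
                    new_score[c][new_id[v]] = score[u][v]
                    best_out[new_id[v]] = u

    sub = chu_liu_edmonds(new_score)    # recurse on the contracted graph

    # Expand the contracted solution back to the original vertices.
    result = [0] * n
    for j in range(1, c):
        result[old[j]] = best_out[j] if sub[j] == c else old[sub[j]]
    entry = best_in[sub[c]]             # the cycle is broken at this vertex
    for v in cycle:
        result[v] = parent[v]           # keep the cycle's internal edges ...
    result[entry] = old[sub[c]]         # ... except at the entry vertex
    return result

# The slide example: x = John saw Mary (0 root, 1 John, 2 saw, 3 Mary).
NEG = -inf
S = [[NEG,   9,  10,   9],
     [NEG, NEG,  20,   3],
     [NEG,  30, NEG,  30],
     [NEG,  11,   0, NEG]]
print(chu_liu_edmonds(S))   # [0, 2, 0, 2]: root->saw, saw->John, saw->Mary
```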

28
Eisner algorithm: O(n³)
▪ bottom-up dynamic programming
▪ maintains the nested structural constraint (non-crossing), so it produces only projective trees


30
Supervised learning
Target: train the weight vector w over features of word pairs (e.g., PoS tags).
Training data: sentences paired with their gold dependency trees.
Testing data: sentences x only.

31
Margin Infused Relaxed Algorithm (MIRA)
dt(x): the set of possible dependency trees for x
▪ keep the new weight vector as close as possible to the old one, subject to the margin constraints over dt(x)
▪ the final weight vector is the average of the weight vectors after each iteration

32
Single-best MIRA: use only the single margin constraint for the highest-scoring tree under the current weights.
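With a single margin constraint, the "stay as close as possible to the old vector" update has a closed form. A sketch with toy list-based feature vectors (the helper name and representation are mine):

```python
def mira_update(w, phi_gold, phi_pred, loss):
    """One MIRA step with a single margin constraint: move w as little
    as possible so the gold tree outscores the predicted tree by `loss`.
    phi_gold / phi_pred are the feature vectors of the two trees."""
    diff = [g - p for g, p in zip(phi_gold, phi_pred)]
    score_gap = sum(wi * di for wi, di in zip(w, diff))
    violation = loss - score_gap
    if violation <= 0:
        return list(w)                  # constraint already satisfied
    tau = violation / sum(d * d for d in diff)   # closed-form multiplier
    return [wi + tau * di for wi, di in zip(w, diff)]

w = mira_update([0.0, 0.0], [1.0, 0.0], [0.0, 1.0], loss=1.0)
print(w)   # [0.5, -0.5]: the gold tree now wins by exactly the loss
```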

33
Factored MIRA with local constraints:
▪ the correct incoming edge for word j must outscore any other incoming edge for j by a margin of 1
▪ summing these constraints, the correct spanning tree outscores each incorrect spanning tree by the number of incorrect edges
▪ more restrictive than the original constraints


35
Language: Czech
▪ more flexible word order than English → non-projective dependencies
Features: Czech PoS tags (standard PoS, case, gender, tense)
Ratio of non-projective to projective edges: less than 2% of total edges are non-projective
▪ Czech-A: the entire PDT
▪ Czech-B: only the 23% of sentences with at least one non-projective dependency

36
COLL1999: the projective lexicalized phrase-structure parser
N&N2005: the pseudo-projective parser
McD2005: the projective parser using the Eisner algorithm and 5-best MIRA
Single-best MIRA / Factored MIRA: the non-projective parsers using Chu-Liu-Edmonds

37
                          Czech-A (entire PDT)    Czech-B (non-projective subset)
                          Accuracy  Complete      Accuracy  Complete
COLL1999 O(n^5)             82.8       –             –         –
N&N2005                     80.0      31.8           –         –
McD2005 O(n^3)              83.3      31.3          74.8       0.0
Single-best MIRA O(n^2)     84.1      32.2          81.0      14.9
Factored MIRA O(n^2)        84.4      32.3          81.5      14.3

38
English
                          Accuracy  Complete
McD2005 O(n^3)              90.9      37.5
Single-best MIRA O(n^2)     90.2      33.2
Factored MIRA O(n^2)        90.2      32.3
English dependency trees are projective; the Eisner algorithm uses the a priori knowledge that all trees are projective.
