Improving Parsing Accuracy by Combining Diverse Dependency Parsers
Daniel Zeman and Zdeněk Žabokrtský, ÚFAL MFF, Univerzita Karlova, Praha
Overview
- introduction
- existing parsers and their accuracies
- methods of combination: switching, unbalanced voting
- results
- conclusion
Vancouver, Zeman & Žabokrtský
Dependency Parsing
parse: S → N*
S = set of all sentences
N = set of natural numbers
(a parse assigns each token the index of its parent, so a sentence maps to a sequence of numbers)
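Concretely, a dependency parse can be stored as one parent index per token, with 0 standing for the artificial root node. A minimal sketch; the sentence and the attachments are illustrative, not taken from the treebank:

```python
# A dependency parse maps each token to the index of its parent;
# 0 denotes the artificial root node.
sentence = ["He", "is", "writing", "a", "letter", "to", "his", "friend", "."]

# parse[i] is the 1-based index of the parent of token i+1.
parse = [3, 3, 0, 5, 3, 5, 8, 6, 3]

# Sanity checks: one parent per token, every index in range.
assert len(parse) == len(sentence)
assert all(0 <= head <= len(sentence) for head in parse)
```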
Dependency Parsing
Example sentence: "Píše dopis svému příteli." (Czech for "He is writing a letter to his friend.")
(the slides show each token with its candidate morphological tags, e.g. VB-S---3P-AA-- for "píše", and then the dependency tree, drawn as a parent index above each token)
Prague Dependency Treebank (PDT 1.0)
- Czech
- training: tokens in non-empty sentences
- tune: tokens in 3,646 sentences
- test: tokens in 3,673 sentences
- accuracy = percentage of tokens with correctly assigned parent nodes (each dependency tree has an artificial root node)
Existing Parsers for Czech (PDT; accuracies on the Tune set):
- 83.6 % Eugene Charniak's (ec) [ported from English]
- 81.7 % Michael Collins' (mc) [ported from English]
- 74.3 % Zdeněk Žabokrtský's (zz) [hand-made rules]
- 73.8 % Daniel Zeman's (dz) [dependency n-grams]
- Tomáš Holan's:
- 71.0 % pshrt
- 69.5 % left-to-right [push-down automaton]
- 62.0 % right-to-left [push-down automaton]
More Existing Parsers
New parsers (2005):
- Nivre & Jenssen [push-down automaton]
- McDonald & Ribarov [maximum spanning tree]
- EC++ (Hall & Novák)
No accuracy figures for our Tune set, but they are better than most of our parser pool.
Good Old Truth: Two Heads Are Better Than One!
- van Halteren et al.: tagging
- Brill and Wu: tagging
- Brill and Hladká: bagging parsers
- Henderson and Brill: constituent parsing
- Frederking and Nirenburg: machine translation
- Fiscus: speech recognition
- Borthwick: named entity recognition
- Inui and Inui: partial parsing
- Florian and Yarowsky: word sense disambiguation
- Chu-Carroll et al.: question answering
Voting
Question: "What is the index of the parent of the i-th node?"
Answers: ec: "7", mc: "7", zz: "5", dz: "11"
Resulting answer: "7"
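The vote above amounts to a simple plurality over the parsers' proposals. A minimal sketch; the parser names follow the slides, everything else is illustrative:

```python
from collections import Counter

def vote(proposals):
    """Return the parent index proposed by the most parsers.

    proposals: dict mapping parser name -> proposed parent index.
    """
    counts = Counter(proposals.values())
    return counts.most_common(1)[0][0]

# The example from this slide: ec and mc outvote zz and dz.
assert vote({"ec": 7, "mc": 7, "zz": 5, "dz": 11}) == 7
```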
Emerging Issues
- Are the parsers different enough to contribute uniquely?
- What do we do if all parsers disagree?
- What if the resulting structure is not a tree?
Uniqueness of a Parser
How many parents are there that only parser X found?
Pool of 7 parsers (ec, mc, zz, dz, thr, thl, thp), test data set:
- ec: 1.7 %
- zz: 1.2 % (rule-based parser, no statistics, handles non-projectivities!)
- mc: 0.9 %
- others: 0.3–0.4 %
Uniqueness of a Parser
How many parents are there that only parser X found?
Four best parsers (ec, mc, zz, dz), test data set:
- ec: 3.0 %
- zz: 2.0 % (rule-based parser, no statistics, handles non-projectivities!)
- mc: 1.7 %
- dz: 1.0 %
Uniqueness of a Parser
How many parents are there that only parser X found?
Two best parsers (ec, mc), test data set:
- ec: 8.1 %
- mc: 6.2 %
Uniqueness of a Parser
The unique findings are hard to push through.
The real strength will always lie where the parsers agree (voting).
Majority vs. Oracle
Test data; ec alone: 85.0 %. Pools: all 7 parsers (ec mc zz dz thr thl thp), the 4 best, the 3 best.
Majority (more than half of the parsers agree): 76.8 % / 75.1 % / 82.9 %
Oracle (at least one parser is correct): 95.8 % / 94.0 % / 93.0 %
Majority Voting
Three parsers (ec+mc+zz): for 83 % of the parents, at least two parsers agree (a majority).
However, ec alone achieves 85 %!
For some parents there is no majority at all (ec ≠ mc ≠ zz). In such cases, use ec's opinion.
Together: 86.7 %
Weighting the Parsers
We have backed off to ec. Why? Because it is the best parser of all.
How do we know? We can measure the accuracies on the Tune data set.
Can we use these accuracies in a more sophisticated way?
Weighting the Parsers
A parser gets as many votes as the percentage of accuracy it achieves.
E.g., mc+zz would outvote ec+thr: 81.7 + 74.3 = 156.0 > 145.6 = 83.6 + 62.0.
Context is not taken into account (so far).
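A sketch of this weighted voting, using the Tune-set accuracies from the earlier slide as weights. The mapping of the abbreviations thr/thl/thp to Holan's individual parsers is an assumption, and the function illustrates the idea rather than the authors' implementation:

```python
from collections import defaultdict

# Tune-set accuracies used as vote weights; thr, thl, and thp are
# assumed to be Holan's right-to-left, left-to-right, and pshrt
# parsers respectively.
WEIGHTS = {"ec": 83.6, "mc": 81.7, "zz": 74.3, "dz": 73.8,
           "thp": 71.0, "thl": 69.5, "thr": 62.0}

def weighted_vote(proposals):
    """proposals: dict parser -> proposed parent index.
    Each parser contributes its accuracy as the weight of its vote."""
    score = defaultdict(float)
    for parser, parent in proposals.items():
        score[parent] += WEIGHTS[parser]
    return max(score, key=score.get)

# mc+zz (81.7 + 74.3 = 156.0) outvote ec+thr (83.6 + 62.0 = 145.6):
assert weighted_vote({"ec": 4, "thr": 4, "mc": 9, "zz": 9}) == 9
```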
Context
Hope: one parser is good at PP attachment while another knows how to build coordination, for example.
Features such as the morphology of the dependent node may help to select the right parser.
The context-sensitive combining classifier was trained on the Tune data set.
Context Features
For the dependent node and for each parent proposed by the respective parsers: part of speech, subcategory, gender, number, case, inner gender, inner number, person, degree of comparison, negativeness, tense, voice, semantic flags (proper name, geography, …)
For each governor–dependent pair: mutual position (left neighbor, far right, …)
For each parser pair: do the two parsers agree?
Decision Trees
We have trained C5 (Quinlan).
- Very minor improvement (0.1 %)
- The resulting decision trees are quite simple
- They mimic voting: the parser-agreement features are the most important ones (in fact, this is not context)
- Did not help with just two parsers (ec+mc), where no voting is possible
Example of a Decision Tree

agreezzmc = yes: zz (3041/1058)
agreezzmc = no:
:...agreemcec = yes: ec (7785/1026)
    agreemcec = no:
    :...agreezzec = yes: ec (2840/601)
        agreezzec = no:
        :...zz_case = 6: zz (150/54)
            zz_case = 3: zz (34/10)
            zz_case = X: zz (37/20)
            zz_case = undef: ec (2006/1102)
            zz_case = 7: zz (83/48)
            zz_case = 2: zz (182/110)
            zz_case = 4: zz (108/57)
            zz_case = 1: ec (234/109)
            zz_case = 5: mc (1)
            zz_case = root:
            :...ec_negat = A: mc (117/65)
                ec_negat = undef: ec (139/65)
                ec_negat = N: ec (1)
                ec_negat = root: ec (2)
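The printed tree is essentially a cascade of agreement tests followed by a back-off on the morphological case proposed by zz. Its decision logic can be transcribed as nested conditions; this is an illustrative transcription, slightly simplified at the leaves, not the authors' code:

```python
def choose_parser(agree_zz_mc, agree_mc_ec, agree_zz_ec, zz_case):
    """Return which parser's proposed parent to trust, following the
    C5 tree above (the zz_case = root subtree is collapsed to 'ec')."""
    if agree_zz_mc:                  # zz and mc agree: majority against ec
        return "zz"
    if agree_mc_ec or agree_zz_ec:   # ec agrees with one other parser
        return "ec"
    # All three disagree: decide by the case zz proposes.
    if zz_case == "5":
        return "mc"
    if zz_case in {"6", "3", "X", "7", "2", "4"}:
        return "zz"
    return "ec"                      # undef, 1, root, ...

assert choose_parser(False, False, False, "6") == "zz"
assert choose_parser(True, False, False, "1") == "zz"
assert choose_parser(False, True, False, "6") == "ec"
```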
It is not guaranteed that the result is a tree!
(diagram: several three-node structures assembled from different parsers' proposals; some of them contain a cycle)
Note: We Actually May Be Willing to Accept Non-Trees
- The way accuracy is computed motivates looking at individual nodes, not at the whole structure.
- Suppose one edge in a cycle is wrong and all the others are good, but we do not know which one it is. If we select the wrong edge to discard, we end up with two wrong edges.
- When only partial relations are sought, the whole structure may not matter.
How to Preserve Treeness
- In each step (adding a new dependency), rule out the parsers whose proposal would introduce a cycle.
- If all parsers propose cycles, abandon the whole structure and use ec's tree as is instead.
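The procedure can be sketched as follows; the node order, the data structures, and the per-node parser preference are assumptions of the sketch, not details from the talk:

```python
def creates_cycle(parents, node, parent):
    """Would setting parents[node] = parent close a cycle?
    parents maps node -> already chosen parent; 0 is the root."""
    seen = {node}
    while parent != 0:
        if parent in seen:
            return True
        seen.add(parent)
        parent = parents.get(parent, 0)
    return False

def combine(proposals_per_node, parser_order, fallback_tree):
    """proposals_per_node: dict node -> dict parser -> proposed parent.
    parser_order: parsers in order of preference for each node.
    fallback_tree: ec's complete tree, used when every proposal cycles."""
    parents = {}
    for node, proposals in proposals_per_node.items():
        for parser in parser_order:
            candidate = proposals[parser]
            if not creates_cycle(parents, node, candidate):
                parents[node] = candidate
                break
        else:
            # All parsers propose cycles: abandon the structure,
            # use ec's tree as is.
            return dict(fallback_tree)
    return parents

# Node 2 cannot attach to node 1 (cycle), so mc's proposal (root) wins.
proposals = {1: {"ec": 2, "mc": 0}, 2: {"ec": 1, "mc": 0}}
assert combine(proposals, ["ec", "mc"], {1: 0, 2: 1}) == {1: 2, 2: 0}
```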
Results
Baseline (ec): 85.0 %
Four parsers (ec+mc+zz+dz), cycles allowed: 87.0 % (91.6 % of the structures are trees)
Four parsers (ec+mc+zz+dz), cycles banned: 86.9 %
(sorry for the typos in the paper, sec. 5.4)
Unbalanced Combination
(Brill & Hladká in Hajič et al., 1998)
Is precision more important to us than recall? Better to say nothing than to make a mistake.
That may be our priority when:
- preprocessing text for annotators
- extracting various phenomena from a corpus (if there is no parse for a sentence, never mind; we simply will not extract anything from it)
Unbalanced Combination
Include only the dependencies proposed by at least half of the parsers. Some nodes will not get a parent.
Results for 7 parsers: precision 90.7 %, recall 78.6 %, f-measure 84.2 %
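The reported f-measure is the usual harmonic mean of precision and recall, which can be checked directly:

```python
def f_measure(precision, recall):
    """F1: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Figures from this slide (unbalanced combination of 7 parsers):
assert round(f_measure(90.7, 78.6), 1) == 84.2
```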
Unbalanced Combination
Interesting: unbalanced voting over an even number of parsers prefers recall over precision!
Sometimes one half of the parsers proposes one parent while the other half agrees on another candidate; then both candidates reach the threshold and both dependencies are included.
Results for the 4 best parsers: precision 85.4 %, recall 87.7 %, f-measure 86.5 %
Related Work
- Brill and Hladká combined several "parsers": in fact a single parser trained on different bags of the training data. 6 % error reduction, cf. our 13 %.
- Henderson and Brill combined three constituency-based parsers. They did not find context helpful either. They also proved a lemma that their combination introduces no crossing brackets.
Summary
- Combination techniques have been successfully applied to dependency parsing.
- Keeping treeness is not too expensive (in terms of accuracy).
Future Work
We are preparing the voting right for new parsers (Nivre/Jenssen, Ribarov/McDonald, Charniak/Hall/Novák).
As these parsers are better than most of our current parser pool, we expect the results to improve, provided the new parsers are able to contribute new ideas.
Thank you.