Error Analysis for Learning-based Coreference Resolution Olga Uryupina 27.05.08.


1 Error Analysis for Learning-based Coreference Resolution Olga Uryupina 27.05.08

2 Outline
CR: state-of-the-art and our system
Distribution of errors
Discussion: possible remedies

3 Coreference Resolution „This deal means that Bernard Schwartz can focus most of his time on Globalstar and that is a key plus for Globalstar because Bernard Schwartz is brilliant,“ said Robert Kaimovitz, a satellite communications analyst at Unterberg Harris in New York... Globalstar still needs to raise $ 600 million, and Schwartz said that the company would try..

6 Machine Learning Approaches
Soon et al. (2001)
Cardie & Wagstaff (1999)
Strube et al. (2002)
Ng & Cardie (2001-2004)
ACE competition

7 Features: Soon et al. (2001)
1. Anaphor is a pronoun
2. Anaphor is a definite NP
3. Anaphor is an NP with a demonstrative pronoun („this“,..)
4. Antecedent is a pronoun
5. Both markables are proper names
6. Number agreement
7. Gender agreement
8. Alias
9. Appositive
10. Same surface form
11. Semantic class agreement
12. Distance in sentences
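A few of the pairwise features listed above can be sketched as follows. This is only an illustration: the `Markable` class, the word lists and the way number is supplied are hypothetical stand-ins for real preprocessing output, not the feature extractor used in the talk.

```python
from dataclasses import dataclass

# Small illustrative word lists (assumptions, far from complete).
PRONOUNS = {"he", "she", "it", "they", "him", "her", "them", "his", "its", "their"}
DEMONSTRATIVES = {"this", "that", "these", "those"}
DETERMINERS = {"the", "a", "an"} | DEMONSTRATIVES


@dataclass
class Markable:
    text: str      # surface string of the noun phrase (assumed non-empty)
    sentence: int  # index of the sentence containing the markable
    number: str    # "sg" or "pl", assumed to come from a preprocessing module


def first_token(m: Markable) -> str:
    return m.text.lower().split()[0]


def strip_determiner(m: Markable) -> str:
    """Lower-cased text without a leading article or demonstrative."""
    tokens = m.text.lower().split()
    if len(tokens) > 1 and tokens[0] in DETERMINERS:
        tokens = tokens[1:]
    return " ".join(tokens)


def pair_features(antecedent: Markable, anaphor: Markable) -> dict:
    """Feature values for one (antecedent, anaphor) candidate pair."""
    return {
        "anaphor_is_pronoun": anaphor.text.lower() in PRONOUNS,
        "anaphor_is_definite": first_token(anaphor) == "the",
        "anaphor_is_demonstrative": first_token(anaphor) in DEMONSTRATIVES,
        "antecedent_is_pronoun": antecedent.text.lower() in PRONOUNS,
        "number_agreement": antecedent.number == anaphor.number,
        "same_surface_form": strip_determiner(antecedent) == strip_determiner(anaphor),
        "sentence_distance": anaphor.sentence - antecedent.sentence,
    }


features = pair_features(Markable("Globalstar", 0, "sg"),
                         Markable("the company", 1, "sg"))
```

Each candidate pair becomes one such feature vector, which is then handed to the learner (C5.0, SVM, etc.) as a training or test instance.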

8 Features: other approaches
Cardie & Wagstaff: 11 features
Strube et al.: 17 features (the same standard features + approximate matching (MED))
Ng & Cardie: 53 features (no improvement with the extended feature set; better results (F=63.4) with manual feature selection)
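The approximate matching mentioned for Strube et al. relies on minimum edit distance (MED). A minimal sketch of the standard Levenshtein computation is given below; the length normalisation in `med_similarity` is an assumption made for illustration, since the slide does not say how the distance is turned into a feature.

```python
def min_edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions and substitutions cost 1."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def med_similarity(a: str, b: str) -> float:
    """Hypothetical normalisation: 1.0 for identical strings, 0.0 for disjoint ones."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - min_edit_distance(a, b) / longest


print(min_edit_distance("Lee Teng-hui", "Lee"))
print(med_similarity("Loral Space", "Loral"))
```

A graded similarity of this kind lets the learner pick its own threshold instead of relying on a hard same-string test.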

9 Performance: Soon et al.
                         Recall   Precision   F
Soon et al.'s system:
  C5.0, optimized        56.1     65.5        60.4
Our reimplementation:
  C4.5, not optimized    53.5     72.8        61.7
  Ripper                 44.6     74.8        55.9
  SVM                    50.9     68.8        58.5
  MaxEnt                 49.2     64.1        55.7
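The three figures in each row above are consistent with recall, precision and the balanced F-measure (their harmonic mean). The short check below recomputes F from the first two numbers; the reading of the columns as recall followed by precision is an assumption based on that consistency.

```python
def f_measure(recall: float, precision: float) -> float:
    """Balanced F-measure: harmonic mean of recall and precision."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)


# (recall, precision) pairs as they appear on the slide.
scores = {
    "C5.0, optimized": (56.1, 65.5),
    "C4.5, not optimized": (53.5, 72.8),
    "Ripper": (44.6, 74.8),
    "SVM": (50.9, 68.8),
    "MaxEnt": (49.2, 64.1),
}

for learner, (recall, precision) in scores.items():
    print(f"{learner}: F = {f_measure(recall, precision):.1f}")
```

The recomputed values agree with the third number in every row, which supports reading the rows as recall, precision, F.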

10 Performance: Soon et al. Learning Curve for C5.0

11 Tricky and easy anaphors
Cristea et al. (2002): state-of-the-art coreference resolution systems have essentially the same performance level
Pronominal anaphora – 80%
Full-scale coreference – 60%
Hypothesis: tricky vs. easy anaphors

12 Our system Goal: Bridge the gap between the theory and the practice: sophisticated linguistic knowledge + data-driven coreference resolution algorithm

13 New Features
Different aspects of CR:
Surface similarity (122 features)
Syntax (64)
Semantic compatibility (29)
Salience (136)
(Anaphoricity)
More or less sophisticated linguistic theories exist for all these phenomena

14 Evaluation Methodology
Standard dataset (MUC-7)
Standard learning set-up
Compare to Soon et al. (2001)

15 Performance (F)
                     Basic feature set   Extended feature set
Soon et al., C5.0    60.4                N/A
C4.5                 61.7                64.6
SVM                  58.5                65.4
Ripper               55.9                57.5
MaxEnt               55.7                59.4

16 Performance Learning Curve, SVM

17 Error analysis
Different approaches – same performance: same errors?
„Tricky anaphors“? (Cristea et al., 2002)
Extensive error analysis needed!

18 Outline
CR: state-of-the-art and our system
Distribution of errors
Discussion: possible remedies

19 Recall errors
                    Errors   %
MUC                 17       3.6
Markables           166      35.4
Propagated P        31       6.6
Pronouns            77       16.4
NE-matching         31       6.6
Syntax              39       8.3
Nominal anaphora    104      22.2
total               469      100
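The percentage column can be reproduced from the error counts with a few lines of code. The sketch below assumes each percentage is the category count divided by the reported total of 469; the function name is illustrative.

```python
# Recall error counts per category, as listed on the slide.
recall_errors = {
    "MUC": 17,
    "Markables": 166,
    "Propagated P": 31,
    "Pronouns": 77,
    "NE-matching": 31,
    "Syntax": 39,
    "Nominal anaphora": 104,
}
REPORTED_TOTAL = 469  # total given on the slide


def error_shares(counts: dict, total: int) -> dict:
    """Share of each error category in percent, rounded to one decimal."""
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


for name, share in error_shares(recall_errors, REPORTED_TOTAL).items():
    print(f"{name}: {share}%")
```

The recomputed shares match the slide. The listed counts themselves add up to 465 rather than 469, so the percentages appear to be taken against the reported total and not against the sum of the listed rows.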

20 Recall errors - markables
Auxiliary doc parts
Tokenization
Modifiers
Bracketing/labeling

21 Recall errors - markables
..there was no requirement for tether to be manufactured in a contaminant-free environment.
A mesmerizing set.

22 Recall errors - pronouns
1st pl – reconstructing the group:
The retiring Republican chairman of the House Committee on Science want U.S. Businesses to „We need to make it easier for the private sector..“ Walker said
3rd sg, 3rd pl – (non-)salience:
[The explanation] for the History Channel's success begin with its association with another channel owned by the same parent consortium.

23 Recall errors - nominal
Mostly common noun phrases with different heads, WordNet does not help much..
a report on the satellites' findings
the abilities of U.S. Reconnaissance technology
the use of advanced intelligence-gathering tools
Remote-sensing instruments..

24 Precision errors
                    Errors   %
MUC                 30       7.4
Markables           76       18.6
Pronouns            78       19.1
NE-matching         20       4.9
Syntax              22       5.4
Nominal anaphora    182      44.6
total               408      100

25 Precision errors - pronouns
Incorrect parsing/tagging:
Two key vice presidents, [Wei Yen] and Eric Carlson, are leaving to start their own Silicon Valley companies.
(Non-)salience
Matching (propagated R)

26 Precision errors - nominal
Mostly same-head descriptions.
Possible solutions: modifiers? anaphoricity detectors?

27 P errors – nominal - modifiers
Idea: „red car“ cannot corefer with „blue car“
Problem: list of mutually incompatible properties?
MUC-7 test data:
incompatible modifiers      30
„new“ mod for anaphora      15
compatible modifiers        58
no modifiers                62
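The modifier idea can be sketched as a filter over same-head noun phrases. Everything in the block is a hypothetical illustration: the exclusive-property groups are hand-written toy lists (the very resource the slide notes is missing), and the head is naively taken to be the last token.

```python
ARTICLES = {"the", "a", "an"}

# Toy groups of mutually exclusive properties (assumption, not a real resource).
EXCLUSIVE_GROUPS = [
    {"red", "blue", "green", "black", "white"},
    {"first", "second", "third"},
]


def split_np(np: str):
    """Return (set of modifiers, head), treating the last token as the head."""
    tokens = [t for t in np.lower().split() if t not in ARTICLES]
    return set(tokens[:-1]), tokens[-1]


def modifiers_incompatible(np1: str, np2: str) -> bool:
    """True if two same-head NPs carry different values of one exclusive property."""
    mods1, head1 = split_np(np1)
    mods2, head2 = split_np(np2)
    if head1 != head2:
        return False  # the filter only targets same-head descriptions
    for group in EXCLUSIVE_GROUPS:
        values1, values2 = mods1 & group, mods2 & group
        if values1 and values2 and values1 != values2:
            return True
    return False


print(modifiers_incompatible("the red car", "a blue car"))
print(modifiers_incompatible("the red car", "the car"))
```

According to the counts on the slide, only 30 of the same-head precision errors involve incompatible modifiers, so even a perfect filter of this kind would remove a limited share of them.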

28 P errors – nominal - dnew
Idea: identify and discard unlikely anaphors
Problem: even a very good detector does not help

29 Outline
CR: state-of-the-art and our system
Distribution of errors
Discussion: possible remedies

30 Discussion – Errors
Problematic areas:
Data
Preprocessing modules
Features
Resolution strategy

31 Discussion - Data
bigger corpus
more uniform doc selection, text only
better definition of COREF
better scoring

32 Discussion - Preprocessing
local improvements (e.g. appositions)
probabilistic architecture to neutralize errors

33 Discussion - Features
feature selection
ensemble learning
more targeted learning for under-represented phenomena (abbreviations)

34 Discussion - Resolution
less local: move to the chains level
less uniform: specific treatment for different types of anaphors
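One possible reading of "move to the chains level" is to stop treating pairwise decisions in isolation and to work with the coreference chains they induce. The sketch below only shows the first step, merging pairwise links into chains with a union-find structure; it is an assumption about what such a strategy could build on, not the approach proposed in the talk.

```python
def build_chains(n_markables: int, links: list) -> list:
    """Group markables 0..n-1 into chains from pairwise coreference links."""
    parent = list(range(n_markables))

    def find(x: int) -> int:
        # Follow parent pointers to the chain representative (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the earliest markable as the representative of the chain.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    chains = {}
    for m in range(n_markables):
        chains.setdefault(find(m), []).append(m)
    # Singletons are not coreference chains.
    return [chain for chain in chains.values() if len(chain) > 1]


# Hypothetical classifier output: markable 2 linked to 0, markable 4 linked to 2.
print(build_chains(5, [(0, 2), (2, 4)]))
```

Once chains are explicit, constraints such as gender or modifier compatibility can be checked against every member of a chain rather than against a single candidate antecedent.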

35 Discussion – Conclusion
ML approaches to coreference resolution yield similar performance values
Some anaphors are indeed tricky (esp. crucial for precision errors)
But some errors can be eliminated within an ML framework:
– improving the training material
– more elaborate integration of preprocessing modules
– more global resolution strategies

36 Thank You!

37 Recall errors
                    Errors   %
MUC                 17       3.6
Markables           166      35.4
Propagated P        31       6.6
Pronouns            77       16.4
NE-matching         31       6.6
Syntax              39       8.3
Nominal anaphora    104      22.2
total               469      100

38 Recall errors - MUC
Mainly incorrect bracketing
..said Jim Johannesen, vice president of site development for McDonald's..
Only clear typos etc. considered MUC errors

39 Recall errors – propagated P
The company also said the Marine Corps has begun testing two of [its radars] as part of a short-range ballistic missile defense program. That testing could lead to an order for the radars.
Crucial for pronouns and indicators for intrasentential coreference

40 Recall errors - matching
Mostly ORGANIZATIONs.
Problems:
Abbreviations: Federal Communication Commission, FCC
Hyphenated names: Ziff-Davis Publishing, Ziff
Foreign names: Taiwan President Lee Teng-hui, President Lee
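The three problem cases above can be made concrete with a loose alias test. The rules below (acronym of capitalised words, token subsequence after splitting hyphens) are illustrative assumptions and not the NE-matching component of the system under discussion.

```python
def acronym(name: str) -> str:
    """Initials of the capitalised words, e.g. for organisation names."""
    return "".join(word[0] for word in name.split() if word[0].isupper())


def is_subsequence(short_tokens: list, full_tokens: list) -> bool:
    """True if short_tokens occur in full_tokens in the same order."""
    remaining = iter(full_tokens)
    return all(token in remaining for token in short_tokens)


def is_alias(full: str, short: str) -> bool:
    """Loose alias test: acronym match, or short name embedded in the full name."""
    if acronym(full) == short:
        return True
    full_tokens = full.replace("-", " ").split()    # "Ziff-Davis" -> Ziff, Davis
    short_tokens = short.replace("-", " ").split()
    if not short_tokens:
        return False
    return is_subsequence(short_tokens, full_tokens)


print(is_alias("Federal Communication Commission", "FCC"))
print(is_alias("Ziff-Davis Publishing", "Ziff"))
print(is_alias("Taiwan President Lee Teng-hui", "President Lee"))
```

Rules this permissive recover the listed recall errors but will also link distinct entities that share name parts, which is the precision problem illustrated by the Loral examples elsewhere in the deck.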

41 Recall errors - syntax
Apposition, copula
Problems:
Parsing mistakes
Missing constructions: ..the venture will become synonymous with JSkyB
P/R trade-off: ..Kevlar, a synthetic fiber, and Nomex..
Quantitative constructions: ..More than quadruple the three-month daily average of 88,700 shares

42 Precision errors
                    Errors   %
MUC                 30       7.4
Markables           76       18.6
Pronouns            78       19.1
NE-matching         20       4.9
Syntax              22       5.4
Nominal anaphora    182      44.6
total               408      100

43 Precision errors - matching
Finer NE analysis could help, but mostly too difficult even for humans:
Loral
Loral Space and Communications Corp
Loral Space
Space Systems Loral

44 Anaphoricity
Some markables are not anaphors. We can tell that by looking at them, without any sophisticated coreference resolution.
Poesio & Vieira, Ng & Cardie – try to identify Discourse New entities automatically
Not used for this talk
