Statistical modelling of MT output corpora for Information Extraction.

Overview
Using MT output for IE
– Requirements and evaluation of usability
– S-score: measuring the degree of word significance for a text by contrasting text and corpus usage
Experiment set-up and MT evaluation metrics using differences in S-scores for MT evaluation
Results of MT evaluation for IE
– Comparison of MT systems
– Correlations with human evaluation measures of MT
– Issues of MT architecture and evaluation scores
Conclusions & future work

Using MT for IE
Requirements for human use and for automatic processing are different:
– fluency is less important than adequacy
– stylistic errors are less important than factual errors, e.g.:
MT: * Bill Fisher ~ 'to send a bill to a fisher'
Frequency issues:
– low-frequency words carry the most important information (and require accurate disambiguation)
– some IE tasks use statistical models (which are expected to differ for MT output)

Frequency issues… disambiguation Examples:

Frequency issues: statistical modelling for IE
Research on adaptive IE: automatic template acquisition by statistical means
– find sentences containing statistically significant words
– build templates around such sentences
Template element fillers (e.g., NEs) often appear among statistically significant words
The distribution of word frequencies is expected to be different for MT: we check whether this is the case
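The sentence-selection step above can be sketched as follows. This is a minimal illustration, not the adaptive IE system itself: the `s_scores` table is a hypothetical pre-computed word-to-significance mapping, and the threshold of 1 anticipates the S > 1 cut-off used later in the talk.

```python
# Sketch: pick sentences that contain statistically significant words,
# as in adaptive IE template acquisition. s_scores is a hypothetical
# pre-computed mapping from word to its significance (S-score).
def significant_sentences(sentences, s_scores, threshold=1.0):
    picked = []
    for sent in sentences:
        words = sent.lower().split()
        # Keep a sentence if any of its words is significant enough
        if any(s_scores.get(w, 0.0) > threshold for w in words):
            picked.append(sent)
    return picked

s_scores = {"urba-gracco": 3.2, "confrontation": 1.8, "the": 0.1}
sents = ["The judge investigated the Urba-Gracco affair.",
         "It was a sunny day."]
print(significant_sentences(sents, s_scores))
```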

Measuring statistical significance
– S_word[text]: the score of statistical significance for a particular word in a particular text
– P_word[text]: the relative frequency of the word in the text
– P_word[rest-corp]: the relative frequency of the same word in the rest of the corpus, i.e., excluding this text
– N_word[txt-not-found]: the proportion of texts in the corpus where this word is not found (the number of texts where it is not found, divided by the number of texts in the corpus)
– P_word[all-corp]: the relative frequency of the word in the whole corpus, including this particular text
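The slide lists the ingredients of the S-score but the formula itself (originally shown as an image) is missing here. A plausible reconstruction combines the four components as log((P_text − P_rest) / P_all + N_not_found); that exact combination is an assumption inferred from the listed definitions, not quoted from the slide.

```python
import math

def s_score(freq_in_text, text_len, freq_in_rest, rest_len,
            texts_without_word, total_texts):
    # Relative frequencies, following the slide's definitions:
    p_text = freq_in_text / text_len                                # P_word[text]
    p_rest = freq_in_rest / rest_len                                # P_word[rest-corp]
    p_all = (freq_in_text + freq_in_rest) / (text_len + rest_len)   # P_word[all-corp]
    n_not_found = texts_without_word / total_texts                  # N_word[txt-not-found]
    # Assumed combination of the four components (hedged reconstruction):
    return math.log((p_text - p_rest) / p_all + n_not_found)
```

For a word concentrated in one text (5 of 100 tokens there, 5 of 9,900 tokens elsewhere, absent from 90 of 100 texts) this yields a large positive score, matching the intuition that such words are significant for that text.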

Intuitive appeal of significance scores
Selecting words potentially important for IE:
In the Marseille Facet of the Urba-Gracco Affair, Messrs. Emmanuelli, Laignel, Pezet, and Sanmarco Confronted by the Former Officials of the SP Research Department
On Wednesday, February 9, the presiding judge of the Court of Criminal Appeals of Lyon, Henri Blondet, charged with investigating the Marseille facet of the Urba-Gracco affair, proceeded with an extensive confrontation among several Socialist deputies and former directors of Urba-Gracco. Ten persons, including Henri Emmanuelli and Andre Laignel, former treasurers of the SP, Michel Pezet, and Philippe Sanmarco, former deputies (SP) from the Bouches-du-Rhône, took part in a hearing which lasted more than seven hours.

...Intuitive appeal of significance scores Ordering words:

Metric for usability of MT for IE
Suggestion: measuring differences in statistical significance between a human translation and MT output makes it possible to estimate the amount of prospective problems
Question: do any human evaluation measures of MT correlate with differences in S-scores for different MT systems?

Experiment setup
Available: 100 texts developed for the DARPA 94 MT evaluation exercise:
– French originals
– 2 different human translations (reference and expert)
– 5 translations by MT systems (French into English):
knowledge-based: Systran; Reverso; Metal; Globalink
IBM statistical approach to MT: Candide
DARPA evaluation scores available for each system and for the human expert translation:
– Informativeness; Adequacy; Fluency
Calculating distances of combined S-scores between the human reference translation and the other translations (MT and the expert translation)

The distance scores
Based on comparing sets of words with S-score > 1:
– words significant in both texts, but with different statistical significance scores
– words not present in the reference translation (overgenerated in MT)
– words not present in MT, but present in the reference translation (undergenerated in MT)
Computing distance scores:
– o-score for «avoiding overgeneration» (~ Precision)
– u-score for «avoiding undergeneration» (~ Recall)
– u&o combined score (calculated as F-measure)
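A simplified set-based sketch of these three scores follows. Note this is an approximation: the talk's actual scores also weight words by how much their S-score changed between reference and MT (the formulas on the next slides were shown as images), whereas this version treats significant words as plain sets.

```python
def distance_scores(ref_sig, mt_sig):
    """ref_sig, mt_sig: sets of words with S-score > 1 in the reference
    translation and in the MT output, respectively."""
    overlap = ref_sig & mt_sig
    # o-score: how well MT avoids overgenerating significant words (~ Precision)
    o = len(overlap) / len(mt_sig) if mt_sig else 0.0
    # u-score: how well MT avoids undergenerating significant words (~ Recall)
    u = len(overlap) / len(ref_sig) if ref_sig else 0.0
    # combined u&o score, calculated as an F-measure
    f = 2 * o * u / (o + u) if o + u else 0.0
    return o, u, f

print(distance_scores({"judge", "affair", "deputies"},
                      {"judge", "affair", "fisher"}))
```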

Computing distance scores...
Words that changed their significance
Overgeneration score:
Undergeneration score:

… Computing distance scores
Scores for avoiding over- and under-generation
Making scores compatible across texts (the number of significant words may be different):

The resulting distance scores

DARPA Adequacy and scores

o-score & DARPA 94 Adequacy

DARPA Fluency and scores

u&o-score and DARPA 94 Fluency

Results and correlation of scores
The human expert translation scores higher than MT
The statistical MT system «Candide» is characteristically different
Strong positive correlation found for:
– o-score & DARPA adequacy
Weak positive correlation found for:
– u&o score & DARPA fluency
No correlation was found between the u-score (high for statistical MT) and human MT evaluation measures
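The correlations reported above can be computed with a standard Pearson coefficient over per-system scores, for example between o-scores and DARPA adequacy. The figures below are made-up illustrative values, not the DARPA 94 data.

```python
def pearson(xs, ys):
    # Pearson correlation coefficient between two equal-length score lists
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical per-system scores (one value per MT system):
o_scores = [0.81, 0.74, 0.70, 0.66, 0.52]
adequacy = [0.92, 0.80, 0.78, 0.71, 0.60]
print(round(pearson(o_scores, adequacy), 3))
```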

Conclusions
The word-significance measure S is also useful in other areas (e.g., distinguishing lexical and morphological differences)
The threshold S > 1 distinguishes content and function words across different languages (checked for English, French and Russian)
Statistical modelling showed substantial differences between human translation and MT output corpora
Measures of contrastive frequencies for words in a particular text versus the rest of the corpus correlate with human evaluation of MT (adequacy scores)

Future work
Statistical modelling of example-based MT
Investigating the actual performance of IE systems on different tasks using MT of different quality (with different "usability for IE" scores) and its correlation with the proposed MT evaluation measures
Establishing formal properties for intuitive judgements about translation quality (translation equivalence, adequacy and fluency in human translation and MT)