Towards the Use of Linguistic Information in Automatic MT Evaluation Metrics
Thesis Project – Elisabet Comelles
Supervisors: Irene Castellon and Victoria Arranz

Outline
– Introduction
– State of the Art
– Discussion of MT Evaluation Metrics
– Hypothesis & Objective
– Methodology & Schedule

Introduction
– Quick access to multilingual information
– Need for fast translation
– Sharp increase in the number of MT systems
– Need to evaluate those MT systems
– Evaluation needs to be quick and reliable

Introduction
– The current, most widely used evaluation metrics show problems
– New approaches to evaluation using linguistic information:
  – Syntactic information
  – Semantic information
– Our scenario:
  – Comparison between already existing systems
  – Translation direction to test: English-Spanish

State of the Art
– MT research is closely linked to MT evaluation
– Purpose of the evaluation methods:
  – Error analysis
  – System comparison
– Chronologically:
  1. Human MT evaluation
  2. Automatic MT evaluation

State of the Art: Types of MT Evaluation
– Focused on context:
  – Context-based evaluation (FEMTI)
    – Evaluates the suitability of the MT technology and the MT system for the user's purpose
    – Parameters of analysis: functionality, reliability, usability, efficiency, maintainability, portability, cost, etc.
– Focused on quantity & quality:
  – Human evaluation and automatic evaluation

State of the Art: Types of MT Evaluation
– Human evaluation – several approaches:
  – Fidelity (ALPAC report)
  – Intelligibility (ALPAC report)
  – Comprehensive evaluation of informativeness (ARPA)
  – Quality panel evaluation
  – Adequacy and fluency (semantics and syntax)
  – Preferred translation
  – Required post-editing

State of the Art: Types of MT Evaluation
– Human evaluation:
  – Advantage: human evaluators can assess the overall quality of the system
  – Disadvantages:
    – Time-consuming
    – Expensive
    – Subjective

State of the Art: Types of MT Evaluation
– Automatic evaluation – approaches:
  – Based on lexical matching
  – Based on syntax
  – Based on semantics

State of the Art: Types of MT Evaluation
– Based on lexical matching:
  – Dominant approach to automatic MT evaluation
  – Measures lexical similarity between the MT output and reference translations
  – Types:
    – Edit-distance measures (WER)
    – Precision-oriented measures (BLEU)
    – Recall-oriented measures (ROUGE)
    – Measures balancing precision & recall (GTM)
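As a rough, hypothetical illustration of this family, the Python sketch below implements the core ideas behind two of these measures: clipped n-gram precision (the heart of BLEU) and a word-level edit distance in the spirit of WER. It is a simplified toy, not the official implementation of either metric.

```python
# Minimal sketch (simplified, illustrative only) of two lexical-matching ideas:
# clipped n-gram precision (as in BLEU) and word error rate (an edit-distance measure).
from collections import Counter

def ngram_precision(candidate, reference, n=1):
    """Clipped n-gram precision of a candidate against a single reference."""
    cand = [tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1)]
    ref = [tuple(reference[i:i + n]) for i in range(len(reference) - n + 1)]
    cand_counts, ref_counts = Counter(cand), Counter(ref)
    # Clip each candidate n-gram count by its count in the reference.
    overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
    return overlap / max(len(cand), 1)

def word_error_rate(candidate, reference):
    """Levenshtein distance over words, normalised by reference length."""
    d = [[0] * (len(candidate) + 1) for _ in range(len(reference) + 1)]
    for i in range(len(reference) + 1):
        d[i][0] = i
    for j in range(len(candidate) + 1):
        d[0][j] = j
    for i in range(1, len(reference) + 1):
        for j in range(1, len(candidate) + 1):
            cost = 0 if reference[i - 1] == candidate[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    return d[len(reference)][len(candidate)] / max(len(reference), 1)

hyp = "the cat is on the mat".split()
ref = "there is a cat on the mat".split()
print(ngram_precision(hyp, ref, n=1), word_error_rate(hyp, ref))
```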

State of the Art: Types of MT Evaluation
– Based on syntax:
  – Recently developed
  – Focused on the syntax of the output sentence
  – Types:
    – Constituency parsing
    – Dependency parsing
    – Combination of both analyses (Liu & Gildea 2005)
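To make the dependency-based idea concrete, here is a small sketch (not Liu & Gildea's actual metric) that scores the overlap of (head, relation, dependent) triples between an MT output and a reference. It assumes spaCy and its en_core_web_sm model are installed.

```python
# Illustrative dependency-overlap sketch; assumes spaCy + en_core_web_sm are available.
import spacy

nlp = spacy.load("en_core_web_sm")

def dep_triples(sentence):
    """Set of (head lemma, dependency relation, dependent lemma) triples."""
    doc = nlp(sentence)
    return {(tok.head.lemma_, tok.dep_, tok.lemma_) for tok in doc}

def dep_overlap(hypothesis, reference):
    """F-measure over matching dependency triples."""
    hyp, ref = dep_triples(hypothesis), dep_triples(reference)
    if not hyp or not ref:
        return 0.0
    precision = len(hyp & ref) / len(hyp)
    recall = len(hyp & ref) / len(ref)
    return 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0

print(dep_overlap("The cat sat on the mat.", "A cat was sitting on the mat."))
```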

State of the Art: Types of MT Evaluation
– Based on semantics:
  – Recently developed
  – Focused on the meaning of the output sentence
  – Types:
    – Named entities: quality over NEs (NEE)
    – Semantic roles: similarities over semantic roles (SR)
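The toy sketch below illustrates the semantic-role idea only: given frames produced by some external SRL tool (the frames here are hand-written examples, not real tool output), it measures how many (predicate, role, filler) triples of the reference are recovered in the MT output.

```python
# Toy semantic-role overlap; the frame dictionaries are invented for illustration.
def role_overlap(hyp_frames, ref_frames):
    """Proportion of reference (predicate, role, filler) triples recovered in the hypothesis."""
    hyp = {(p, r, f) for p, roles in hyp_frames.items() for r, f in roles.items()}
    ref = {(p, r, f) for p, roles in ref_frames.items() for r, f in roles.items()}
    return len(hyp & ref) / len(ref) if ref else 0.0

ref_frames = {"give": {"A0": "john", "A1": "book", "A2": "mary"}}
hyp_frames = {"give": {"A0": "john", "A1": "book"}}
print(role_overlap(hyp_frames, ref_frames))  # 2 of 3 reference roles recovered
```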

Discussion of MT Evaluation Metrics
– Human evaluation:
  – Advantages: allows evaluating the overall quality of the system
  – Disadvantages:
    – Time-consuming
    – Expensive
    – Subjective

Discussion of MT Evaluation Metrics
– Automatic evaluation:
  – Advantages:
    – Fast
    – Inexpensive
    – Objective
    – Updatable
  – Disadvantages?

Discussion of MT Evaluation Metrics
– Automatic metrics based on lexical matching:
  – A great advance for MT research over the last decade
  – Widely accepted & used by the SMT research community
  – BLEU is the most widely used automatic metric
  – Criticized by those not developing SMT systems
  – Usually depend on reference translations
  – Only take lexical similarity into account & disregard syntax
  – Biased

Discussion of MT Evaluation Metrics
– Automatic metrics based on syntax:
  – A clear improvement
  – Work at sentence level
  – Focused only on syntax: what about meaning?
– Automatic metrics based on semantics:
  – A clear improvement
  – Cover only NEs & semantic roles
  – NEs are not very relevant
  – Need further development
  – Focused only on meaning: what about syntax?

Discussion of MT Evaluation Metrics
– Discussion of automatic metrics:
  – Each metric focuses on a partial aspect of quality, which leads to:
    – Strongly biased evaluations
    – Unfair comparisons between systems
    – Overtuning of the system
  – Need for an integration of metrics:
    – Parametric vs. non-parametric combination
  – Evaluation of the quality of a metric combination:
    – Human likeness
    – Human acceptability
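As a hedged illustration of meta-evaluation by human acceptability, the sketch below correlates a simple uniform (non-parametric) combination of per-sentence metric scores with human adequacy ratings, using Pearson and Spearman correlation. All the numbers are invented for the example.

```python
# Meta-evaluation sketch: correlate a combined metric score with human judgements.
# The per-sentence scores and human ratings below are invented for illustration.
import numpy as np
from scipy.stats import pearsonr, spearmanr

lexical   = np.array([0.42, 0.55, 0.31, 0.70, 0.48])  # e.g. a BLEU-like score
syntactic = np.array([0.50, 0.61, 0.40, 0.66, 0.52])  # e.g. dependency overlap
semantic  = np.array([0.38, 0.58, 0.35, 0.72, 0.44])  # e.g. semantic-role overlap
human     = np.array([3.0, 4.0, 2.0, 5.0, 3.5])       # human adequacy ratings

# Uniform (non-parametric) combination of the three metric families.
combined = (lexical + syntactic + semantic) / 3

print("Pearson: ", pearsonr(combined, human)[0])
print("Spearman:", spearmanr(combined, human)[0])
```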

Hypothesis & Objective
– Hypothesis: adding new linguistic information will improve the performance of automatic metrics
– Main objective: to propose a new automatic evaluation metric based on linguistic information

Hypothesis & Objective
– Secondary objectives:
  – Explore linguistic information:
    – Syntactic information: POS, shallow parsing, chunking, full parsing, dependency parsing, constituency parsing, etc.
    – Semantic information: semantic roles, semantic features, WordNet, FrameNet, lexical semantics, etc.
  – Look for linguistic resources suitable for computational processing
  – Look for publicly available linguistic resources
  – Explore the appropriate way to combine this information
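As a small example of the kind of publicly available semantic resource mentioned above, the sketch below uses WordNet (through NLTK, assuming the WordNet data has been downloaded with nltk.download("wordnet")) to accept synonym matches that strict lexical metrics would miss.

```python
# Sketch of a WordNet-based relaxation of lexical matching; requires NLTK's WordNet data.
from nltk.corpus import wordnet as wn

def are_synonyms(word_a, word_b):
    """True if the two words share at least one WordNet synset."""
    return bool(set(wn.synsets(word_a)) & set(wn.synsets(word_b)))

print(are_synonyms("car", "automobile"))  # True: they share the synset car.n.01
print(are_synonyms("car", "banana"))      # False
```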

Methodology & Schedule
– Four stages:
  – Stage 1 (years 1 & 2): bibliographic research and analysis:
    – Detailed exploration and analysis of automatic evaluation metrics
    – Detailed exploration, analysis and selection of the adequate linguistic information
    – Exploration of the feasibility and availability of the linguistic resources needed
  – Stage 2 (years 1 & 2): selection of the evaluation corpus

Methodology & Schedule –Stage 3 (year 3): Experiments on how to combine this linguistic information and the automatic evaluation metrics Evaluation of our metric combination based on either likeness or acceptability. –Stage 4 (year 4): Analysis & discussion of the results obtained Summary of the findings and reflection on the results obtained Proposal of a new evaluation metric