MT Evaluation: The DARPA Measures and MT Proficiency Scale



The DARPA Series ( )
The DARPA MT Program
– Radical approaches to MT
– Heterogeneity w.r.t. language, theory, maturity level
Evaluation
– Must accommodate the heterogeneity, yet
– have some basis for measuring progress (and best generic approaches)
– seems to necessitate black-box
Evolution
– toward "core" translation engines
– toward higher validity

The DARPA MTE Method
3 Evaluation Approaches
– avoids > 1 occurrence of a text
– avoids > 1 occurrence of a system
– avoids repetition of sequences
Adequacy
– Determine whether content is conveyed
– Subjects respond to fragments on 1-5 scale
Fluency
– Determine how "English-like"
– Subjects respond to sentences on 1-5 scale
Informativeness
– ability to gather essential information
– Subjects answer multiple choice questions
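A minimal sketch of how judgments of this kind could be turned into per-system scores. The slides do not state how the raw responses were aggregated, so the rescaling of the 1-5 ratings to a 0-1 range and the function names below are assumptions made for illustration only.

```python
from statistics import mean


def normalized_scale_score(ratings, low=1, high=5):
    """Average 1-5 judgments (adequacy or fluency) and rescale to 0-1.

    The 0-1 rescaling is an assumption for illustration, not a step
    taken from the slides.
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r < low or r > high for r in ratings):
        raise ValueError("rating outside the scale")
    return (mean(ratings) - low) / (high - low)


def informativeness_score(answers, answer_key):
    """Fraction of multiple-choice questions answered correctly."""
    if len(answers) != len(answer_key) or not answer_key:
        raise ValueError("answers and key must be non-empty and equal in length")
    correct = sum(a == k for a, k in zip(answers, answer_key))
    return correct / len(answer_key)


# Hypothetical judgments for one system output.
fluency = normalized_scale_score([3, 4])
informativeness = informativeness_score(["a", "b", "c", "d"], ["a", "b", "c", "a"])
```

Rescaling puts the two 1-5 measures and the multiple-choice measure on the same range, which makes them easier to compare side by side.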

The DARPA Method

Fluency Sample
5 = Excellent
4 = Good
3 = Fair
2 = Poor
1 = Very Poor

French to English Results (1994)

Correlation of measures (DARPA series)
High correlation between Adequacy and Informativeness and between Adequacy and Fluency.
.8860
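A sketch of how a correlation between two measures could be computed from per-system scores. The slides do not say which correlation statistic was used; Pearson's r is assumed here, and the function name is hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of cross-deviations and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)
```

Called with one list of adequacy scores and one list of fluency scores (one entry per system), it returns a value between -1 and 1.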

The MT Proficiency Scale
The development of the measure involves four principal steps:
– Identifying text-handling tasks: what text-handling tasks do users perform with translated material as input?
– Analyzing translation problems: what linguistic and non-linguistic translation problems occur in the corpus?
– Discovering task tolerance order: how good must a translation be to be useful for a particular task?
– Developing source language patterns: which patterns correspond to diagnostic target phenomena?

Task-Oriented Exercises
TEXT-HANDLING TASKS
– Publication quality output
– Gisting
– Extraction
  – Deep extraction
  – Intermediate extraction
  – Shallow extraction
– Triage
– Detection
– Filtering
For each task in the text-handling task inventory, an exercise is developed that is close to a participating analyst’s task.
The analyst’s ability to perform the exercise with each MT sample is scored and reported, using a metric appropriate to the task.

User exercise -- shallow extraction
Persons, Organizations, Locations, Dates, Times, Money/Percent
2050L
To the herding player taking part in the rice five rings to entrust to is the approval after the choumon is a general meeting It is the Nancy kerigan player attack case of the United States of America woman skating, and the tenure herding player that the zenpu plural was arrested is a room with 12th and America Olympic committee (USOC), and it aitedorutte Do to entrust to, and for it drops suit that was causing, it concurred with to cause to appear Do player in rirehanmeru winter season five rings. The Harding player is the expectation that appears to five rings technical program of departure and 23th to Norway in the 15th. The suit, the USOC, is the thing that called for compensation for damages of the 20 million dollars (date 2 billion 170 million yen) in the USOC with the decision suspending of situation that settled five rings appearing suspension of herding player. The opened of American Oregon state Portland city court of justice de, 11th, and oral argument is being being called for a consulting each other settlement with the bar foreign from judge, and both proxy negotiates with, and it reached in agreement. The Patrick Galilee judge that is being in charge of the action is setting "it establishes a choumon of mangaichi and USOC, and the state of affairs that is appearing to be interpreted with unjust to herding player if happens, and it is diminishing five rings team Do player, and power that pays appearance of restoring order, courthouse, does saving".
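One way an extraction exercise like this could be scored, as a sketch only. The slides say each exercise uses "a metric appropriate to the task" without naming it; precision and recall against an answer key are assumed here, and the function name and the item lists are hypothetical.

```python
def extraction_scores(found_items, key_items):
    """Return (precision, recall) of an analyst's extracted items.

    found_items: items the analyst pulled from the MT sample.
    key_items:   items in the answer key (assumed to be built from
                 the source text or a reference translation).
    """
    found = set(found_items)
    key = set(key_items)
    correct = found & key
    precision = len(correct) / len(found) if found else 0.0
    recall = len(correct) / len(key) if key else 0.0
    return precision, recall


# Hypothetical items for one category (for example, Organizations).
precision, recall = extraction_scores(
    {"ORG-A", "ORG-B", "ORG-C"},
    {"ORG-A", "ORG-B", "ORG-D", "ORG-E"},
)
```

Scoring each category (Persons, Organizations, and so on) separately would show which kinds of information survive translation well enough for the task.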

Task Tolerance Levels

Can Intelligibility predict Fidelity?
[Figure: Fidelity and Intelligibility. Zero intelligibility = zero fidelity (source language, random dots). Optimal fidelity = optimal intelligibility (authored in the target language). Plotted: MT fidelity / intelligibility; human translation fidelity / intelligibility.]