Statistical vs. Neural Machine Translation: a Comparison of MTH and DeepL at Swiss Post's Language Service
Lise Volkart, Pierrette Bouillon, Sabrina Girletti

Statistical vs. Neural Machine Translation: a Comparison of MTH and DeepL at Swiss Post's Language Service
Lise Volkart, Pierrette Bouillon, Sabrina Girletti
University of Geneva, Translation Technology Department (TIM)
{lise.volkart | pierrette.bouillon | sabrina.girletti}@unige.ch
AsLing, Translating and the Computer 40, London, 15-16 November 2018

Introduction
Context:
- Microsoft Translator Hub (MTH), trained with 288,211 segments and 76 terms from Swiss Post data
- DeepL, a generic neural machine translation system
- Language pair: German-to-French
- Test set: Swiss Post's annual report
Research questions:
- Can a generic neural system compete with a customised statistical MT system?
- Is BLEU a suitable metric for the evaluation of NMT?

Comparison of MTH and DeepL
Three types of evaluation:
- Automatic evaluation (BLEU)
- Human evaluation I: post-editing productivity test
- Human evaluation II: comparative evaluation of the post-edited output

Automatic evaluation
Results (corpus of 1,718 segments): very similar scores; BLEU is slightly higher for DeepL.

System   BLEU
DeepL    25.23
MTH      23.46
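For reference, a minimal sketch of how corpus-level BLEU scores like these can be computed with the sacrebleu library; the segments below are invented placeholders, not the Swiss Post data:

```python
import sacrebleu  # pip install sacrebleu

# Invented one-segment example; the real test corpus had 1,718 segments.
references = ["La Poste a publié son rapport annuel hier ."]
outputs = {
    "DeepL": ["La Poste a publié son rapport annuel hier ."],
    "MTH":   ["La Poste avait publié son rapport annuel ."],
}

for system, hypotheses in outputs.items():
    # corpus_bleu takes the hypotheses and a list of reference streams
    bleu = sacrebleu.corpus_bleu(hypotheses, [references])
    print(f"{system}: BLEU = {bleu.score:.2f}")
```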

Human evaluation I
Post-editing productivity test:
- 2 participants (one in-house translator, one freelancer)
- 250 segments
- Full post-editing
- Measures: post-editing time and HTER (Human-targeted Translation Edit Rate), the edit distance between the raw MT output and its post-edited version
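Since the post-edited version serves as the targeted reference, HTER can be obtained by computing TER between the raw MT output and its post-edit. A minimal sketch using sacrebleu's TER implementation, with invented segments:

```python
from sacrebleu.metrics import TER  # pip install sacrebleu

# Invented example; in the study each participant post-edited 250 segments.
mt_output   = ["La Poste a publié son rapport annuel ."]
post_edited = ["La Poste a publié son rapport annuel hier ."]

# HTER: TER computed against the post-edited version of the same output.
ter = TER()
result = ter.corpus_score(mt_output, [post_edited])
print(f"HTER = {result.score:.1f}")  # lower = less post-editing effort
```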

Human evaluation I (continued)

Human evaluation I (continued)
Results (continued):
- Post-editing time: 53.6% faster for DeepL
- HTER: 75.1% lower for DeepL

Human evaluation II
Comparative evaluation of the post-edited output:
- Goal: to verify that lower PE time and lower HTER do not come with lower final quality
- 3 evaluators (MA translation students)
- Post-edited output from MTH vs. DeepL
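The presentation does not detail how the three evaluators' judgements were aggregated; one plausible scheme is a per-segment majority vote, sketched below with invented labels:

```python
from collections import Counter

# One label per evaluator for each segment: "MTH", "DeepL", or "tie".
judgements = [
    ["DeepL", "DeepL", "tie"],   # segment 1
    ["MTH", "DeepL", "DeepL"],   # segment 2
    ["tie", "tie", "MTH"],       # segment 3
]

totals = Counter()
for labels in judgements:
    # A segment's winner is whichever label at least 2 of 3 judges chose.
    label, count = Counter(labels).most_common(1)[0]
    totals[label if count >= 2 else "no majority"] += 1

print(dict(totals))  # {'DeepL': 2, 'tie': 1}
```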

Human evaluation II (continued)
Results

BLEU score's reliability for NMT evaluation
Motivation:
- Low correlation between automatic and human evaluations
- Previous studies: BLEU tends to underestimate the quality of NMT
Methodology:
- Calculate the underestimation rate (Shterionov et al., 2017): the number of segments rated better by the human evaluation but receiving a lower BLEU score, divided by the total number of segments rated better by the human evaluation (see the sketch below)
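The underestimation rate as defined above is straightforward to compute once each segment carries a human judgement and two sentence-level BLEU scores; a minimal sketch with invented toy data:

```python
def underestimation_rate(segments):
    """segments: (human_prefers_nmt, nmt_bleu, smt_bleu) per segment."""
    preferred = [s for s in segments if s[0]]               # better per humans
    if not preferred:
        return 0.0
    underestimated = [s for s in preferred if s[1] < s[2]]  # ...yet lower BLEU
    return len(underestimated) / len(preferred)

# Invented data: (human prefers NMT?, NMT sentence BLEU, SMT sentence BLEU)
sample = [(True, 20.0, 30.0), (True, 40.0, 10.0), (False, 15.0, 25.0)]
print(f"underestimation rate = {underestimation_rate(sample):.0%}")  # 50%
```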

BLEU score's reliability for NMT evaluation
Results

Summary of the results
- DeepL obtains a slightly better BLEU score than MTH
- DeepL's output requires less post-editing effort
- Final quality seems to be better with DeepL
- BLEU seems to underestimate the quality of DeepL's output