

An Investigation of Machine Translation Evaluation Metrics in Cross-lingual Question Answering
Kyoshiro Sugiyama, Masahiro Mizukami, Graham Neubig, Koichiro Yoshino, Sakriani Sakti, Tomoki Toda, Satoshi Nakamura
AHC-Lab., NAIST, Japan

Question answering (QA)
One of the techniques for information retrieval: the input is a question and the output is an answer, retrieved from an information source. Example: "Where is the capital of Japan?" → "Tokyo."

QA using knowledge bases
The question sentence is converted into a query, e.g. "Where is the capital of Japan?" → Location.City.Tokyo → "Tokyo." Queries have low ambiguity, but the knowledge base imposes a linguistic restriction → cross-lingual QA is necessary.

Cross-lingual QA (CLQA)
The question sentence (in any language, e.g. 日本の首都はどこ? "Where is the capital of Japan?") differs in language from the information source; the system must still map it to a query (Location.City.Tokyo) and return a response (東京 "Tokyo"). Creating such a mapping directly for each language has high cost and is not re-usable in other languages.

CLQA using machine translation
Machine translation (MT) can be used to perform CLQA: translate the question (日本の首都はどこ? → "Where is the capital of Japan?"), feed it to an existing QA system, and translate the answer back ("Tokyo" → 東京). This is easy, low-cost and usable for many languages, but QA accuracy depends on MT quality.
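The translate-then-QA pipeline above can be sketched as follows (a minimal sketch; `translate`, `qa` and `back_translate` are hypothetical stand-ins for an MT front-end and a monolingual QA system, not the components used in this work):

```python
def clqa(question, translate, qa, back_translate):
    """Cross-lingual QA via machine translation:
    translate the question, answer it monolingually,
    then translate the answer back."""
    translated_question = translate(question)  # e.g. Japanese -> English
    answer = qa(translated_question)           # existing English QA system
    return back_translate(answer)              # e.g. English -> Japanese

# Toy usage with dictionary-based stand-ins for MT and QA:
mt = {"日本の首都はどこ?": "Where is the capital of Japan?"}.get
qa_en = {"Where is the capital of Japan?": "Tokyo"}.get
mt_back = {"Tokyo": "東京"}.get
print(clqa("日本の首都はどこ?", mt, qa_en, mt_back))  # -> 東京
```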

Purpose of our work
To clarify how translation affects QA accuracy:
Which MT metrics are suitable for the CLQA task? → Creation of QA data sets using various translation systems; evaluation of translation quality and QA accuracy.
What kinds of translation results influence QA accuracy? → Case study (manual analysis of the QA results).

QA system
The SEMPRE framework [Berant et al., 2013], with three steps of query generation:
Alignment: convert entities in the question sentence into logical forms.
Bridging: generate predicates compatible with neighboring predicates.
Scoring: evaluate candidates using a scoring function.
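As an illustration of the scoring step (a hedged sketch, not SEMPRE's actual implementation; the candidate logical forms, feature names and weights below are invented for the example), candidates can be ranked with a simple linear scoring function:

```python
def score(features, weights):
    # Linear score over a candidate's feature vector,
    # as in log-linear semantic-parsing models.
    return sum(weights.get(name, 0.0) * value for name, value in features.items())

def best_candidate(candidates, weights):
    # Pick the highest-scoring candidate logical form.
    return max(candidates, key=lambda c: score(c["features"], weights))

# Toy candidates for "Where is the capital of Japan?" (hypothetical features):
candidates = [
    {"form": "location.country.capital(en.japan)",
     "features": {"alignment_match": 1.0, "bridging_used": 0.0}},
    {"form": "location.country.currency(en.japan)",
     "features": {"alignment_match": 0.0, "bridging_used": 1.0}},
]
weights = {"alignment_match": 2.0, "bridging_used": 0.5}
print(best_candidate(candidates, weights)["form"])  # -> location.country.capital(en.japan)
```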

Data set creation
The Free917 data set (training: 512 pairs; development: 129 pairs; test: 276 pairs). The original English test questions form the OR set; they were manually translated into Japanese (the JA set) and then translated back into English to create the HT, GT, YT, Mo and Tra sets.

Translation methods
Manual translation ("HT" set): professional human translators.
Commercial MT systems: Google Translate ("GT" set) and Yahoo! Translate ("YT" set).
Moses ("Mo" set): a phrase-based MT system.
Travatar ("Tra" set): a tree-to-string MT system.

Experiments
Evaluation of the translation quality of the created data sets, using the questions in the OR set as the reference.
QA accuracy evaluation on the created data sets, using the same model throughout.
→ Investigation of the correlation between the two.

Metrics for evaluation of translation quality
BLEU+1: evaluates local n-grams.
1-WER: strictly evaluates the whole word order.
RIBES: evaluates rank correlation of word order.
NIST: evaluates local word order and the correctness of infrequent words.
Acceptability: human evaluation.
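For concreteness, sentence-level BLEU+1 (BLEU with add-one smoothing on the higher-order n-gram counts) and 1-WER can be computed roughly as follows. This is a sketch under the usual definitions, assuming whitespace-tokenized input; it is not the evaluation code used in this work:

```python
import math
from collections import Counter

def ngram_counts(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu_plus_one(hyp, ref, max_n=4):
    """Sentence-level BLEU with add-one smoothing for n >= 2 (Lin & Och, 2004)."""
    hyp, ref = hyp.split(), ref.split()
    log_prec = 0.0
    for n in range(1, max_n + 1):
        h, r = ngram_counts(hyp, n), ngram_counts(ref, n)
        match = sum(min(c, r[g]) for g, c in h.items())
        total = max(len(hyp) - n + 1, 0)
        if n == 1:
            if match == 0:
                return 0.0
            log_prec += math.log(match / total)
        else:
            log_prec += math.log((match + 1) / (total + 1))
    brevity = min(1.0, math.exp(1 - len(ref) / len(hyp)))
    return brevity * math.exp(log_prec / max_n)

def one_minus_wer(hyp, ref):
    """1 - word error rate: word-level edit distance normalized by reference length."""
    h, r = hyp.split(), ref.split()
    prev = list(range(len(h) + 1))
    for i, rw in enumerate(r, 1):
        cur = [i] + [0] * len(h)
        for j, hw in enumerate(h, 1):
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (rw != hw))
        prev = cur
    return 1.0 - prev[len(h)] / len(r)

print(bleu_plus_one("where is the capital of japan", "where is the capital of japan"))  # -> 1.0
print(one_minus_wer("what is the capital of japan", "where is the capital of japan"))
```

Note how the two metrics differ in spirit: BLEU+1 rewards local n-gram overlap, while 1-WER penalizes every deviation from the reference word order.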

Translation quality [results chart]

QA accuracy [results chart]

Translation quality and QA accuracy [scatter plot]

Translation quality and QA accuracy (continued) [scatter plot]

Sentence-level analysis
47% of the questions in the OR set are not answered correctly → these questions might be difficult to answer even with a correct translation result.
→ Divide the questions into two groups:
Correct group (141 × 5 = 705 questions): translated from the 141 questions answered correctly in the OR set.
Incorrect group (123 × 5 = 615 questions): translated from the remaining 123 questions in the OR set.

Sentence-level correlation
[Table of sentence-level correlation coefficients for BLEU+1, 1-WER, RIBES, NIST and Acceptability]

Sentence-level correlation
[Table of sentence-level correlation coefficients for BLEU+1, 1-WER, RIBES, NIST and Acceptability]
Very little correlation overall; NIST has the highest correlation → importance of content words. If the reference cannot be answered correctly, the sentences are not suitable, even as negative samples.
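The sentence-level comparison amounts to correlating each metric's per-sentence score with QA correctness (a 1/0 outcome). A minimal Pearson-correlation sketch of that computation (the toy scores below are invented, not the paper's data):

```python
import math

def pearson(xs, ys):
    # Pearson correlation coefficient between two equal-length score lists.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Toy example: metric scores vs. binary QA correctness
# (with a 1/0 outcome this is the point-biserial correlation).
metric_scores = [0.9, 0.8, 0.4, 0.3, 0.7, 0.2]
qa_correct = [1, 1, 0, 0, 1, 0]
print(round(pearson(metric_scores, qa_correct), 3))
```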

Sample 1 [example slide]

Sample 2 [example slide]: lack of the question-type word.

Sample 3 [example slide]: all questions were answered correctly even though they are grammatically incorrect.

Conclusion
The NIST score has the highest correlation with QA accuracy; NIST is sensitive to changes in content words.
If the reference cannot be answered correctly, there is very little correlation between translation quality and QA accuracy → answerable references should be used.
Three factors cause changes in QA results: content words, question types and syntax.

Sentence-level correlation: BLEU+1 [appendix scatter plot]

Sentence-level correlation: 1-WER [appendix scatter plot]

Sentence-level correlation: RIBES [appendix scatter plot]

Sentence-level correlation: NIST [appendix scatter plot]

Sentence-level correlation: Acceptability [appendix scatter plot]

Sample 2 [appendix example slide]

Sample 4 [appendix example slide]

Sample 4 [appendix example slide]: these questions were answered incorrectly even though they are grammatically correct → correct grammar is not so important.