
1 An Investigation of Machine Translation Evaluation Metrics in Cross-lingual Question Answering. Kyoshiro Sugiyama, Masahiro Mizukami, Graham Neubig, Koichiro Yoshino, Sakriani Sakti, Tomoki Toda, Satoshi Nakamura (AHC-Lab., NAIST, Japan)

2 Question answering (QA). One of the techniques for information retrieval: the input is a question and the output is an answer, retrieved from an information source. Example: "Where is the capital of Japan?" → retrieval → "Tokyo."

3 QA using knowledge bases. The question sentence is converted into a query, which has low ambiguity. However, a knowledge base is restricted to one language, so cross-lingual QA is necessary. Example: "Where is the capital of Japan?" → query Location.City.Tokyo → response "Tokyo."
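The knowledge-base lookup described above can be sketched in Python; the query format, conversion rule, and KB contents here are hypothetical stand-ins for the real knowledge base:

```python
# Toy knowledge base keyed by structured queries (format is illustrative).
knowledge_base = {
    "Location.Country.Japan.capital": "Tokyo",
    "Location.Country.France.capital": "Paris",
}

def question_to_query(question: str) -> str:
    # Toy conversion of a question sentence into a low-ambiguity KB query.
    for country in ("Japan", "France"):
        if "capital" in question and country in question:
            return f"Location.Country.{country}.capital"
    raise ValueError("cannot convert question to query")

def kb_answer(question: str) -> str:
    # Execute the query against the knowledge base and return the response.
    return knowledge_base[question_to_query(question)]
```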

4 Cross-lingual QA (CLQA). In CLQA there is a linguistic difference between the question sentence (in any language) and the information source. Example: 日本の首都はどこ? ("Where is the capital of Japan?") → query Location.City.Tokyo → response 東京 ("Tokyo"). Creating a direct mapping from each language to the knowledge base has high cost and is not re-usable in other languages.

5 CLQA using machine translation. Machine translation (MT) can be used to perform CLQA: it is easy, low-cost and usable in many languages, but QA accuracy depends on MT quality. Pipeline: 日本の首都はどこ? → MT → "Where is the capital of Japan?" → existing QA system → "Tokyo" → MT → 東京.
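A minimal sketch of this MT-based pipeline, with toy lookup tables standing in for the MT systems and the existing English QA system:

```python
# Toy stand-ins for the two MT directions and the English QA system.
MT_JA_EN = {"日本の首都はどこ?": "Where is the capital of Japan?"}
MT_EN_JA = {"Tokyo": "東京"}
QA_EN = {"Where is the capital of Japan?": "Tokyo"}

def clqa(question_ja: str) -> str:
    question_en = MT_JA_EN[question_ja]  # MT: source language -> English
    answer_en = QA_EN[question_en]       # existing English QA system
    return MT_EN_JA[answer_en]           # MT: English -> source language
```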

6 Purpose of our work. To clarify how translation affects QA accuracy. Which MT metrics are suitable for the CLQA task? → Creation of QA datasets using various translation systems, and evaluation of translation quality and QA accuracy. What kinds of translation results influence QA accuracy? → Case study (manual analysis of the QA results).

7 QA system. We use the SEMPRE framework [Berant et al., 2013]. Query generation has three steps: Alignment (convert entities in the question sentence into "logical forms"), Bridging (generate predicates compatible with neighboring predicates) and Scoring (evaluate candidates using a scoring function).
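The final scoring step can be sketched as a linear scoring function over candidate logical forms; the feature names, weights, and candidates below are illustrative, not SEMPRE's actual model:

```python
# Linear scoring: each candidate logical form has a feature vector, and the
# candidate with the highest weighted feature sum is selected.
def score(features: dict, weights: dict) -> float:
    return sum(weights.get(k, 0.0) * v for k, v in features.items())

# Hypothetical candidates and weights for illustration only.
candidates = [
    {"form": "Location.City.Tokyo", "features": {"alignment": 1.0, "bridging": 0.5}},
    {"form": "Location.City.Kyoto", "features": {"alignment": 0.2, "bridging": 0.5}},
]
weights = {"alignment": 2.0, "bridging": 1.0}
best = max(candidates, key=lambda c: score(c["features"], weights))
```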

8 Dataset creation. Free917 (training: 512 pairs, dev.: 129 pairs, test: 276 pairs). The test questions form the original English set (OR set); they are manually translated into Japanese (JA set) and then translated back into English to create the HT, GT, YT, Mo and Tra sets.

9 Translation methods. Manual translation ("HT" set): professional human translators. Commercial MT systems: Google Translate ("GT" set) and Yahoo! Translate ("YT" set). Moses ("Mo" set): phrase-based MT system. Travatar ("Tra" set): tree-to-string MT system.

10 Experiments. Evaluation of the translation quality of the created datasets, using the questions in the OR set as the reference. Evaluation of QA accuracy on the created datasets, using the same QA model. Then, investigation of the correlation between translation quality and QA accuracy.
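The correlation analysis can be sketched with a small Pearson-correlation helper; the score values below are made-up placeholders, not the experimental results:

```python
# Pearson correlation coefficient between two equal-length score lists.
def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical per-dataset values: one translation-quality score and one
# QA accuracy per created set (e.g. HT, GT, YT, Mo, Tra).
quality = [0.85, 0.60, 0.55, 0.45, 0.50]
accuracy = [0.48, 0.40, 0.36, 0.30, 0.34]
r = pearson(quality, accuracy)
```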

11 Metrics for evaluating translation quality. BLEU+1: evaluates local n-grams. 1-WER: evaluates whole word order strictly. RIBES: evaluates rank correlation of word order. NIST: evaluates local word order and the correctness of infrequent words. Acceptability: human evaluation.
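As one concrete example, the 1-WER metric listed above can be sketched as one minus the word-level edit distance between hypothesis and reference, normalized by the reference length (the reference is assumed non-empty):

```python
# 1-WER: word error rate is the Levenshtein distance over word sequences
# divided by the reference length; subtracting from 1 turns it into a
# similarity where exact word order gives 1.0.
def one_minus_wer(hyp: str, ref: str) -> float:
    h, r = hyp.split(), ref.split()
    # Dynamic-programming edit distance over words.
    d = [[0] * (len(r) + 1) for _ in range(len(h) + 1)]
    for i in range(len(h) + 1):
        d[i][0] = i
    for j in range(len(r) + 1):
        d[0][j] = j
    for i in range(1, len(h) + 1):
        for j in range(1, len(r) + 1):
            cost = 0 if h[i - 1] == r[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution/match
    return 1.0 - d[len(h)][len(r)] / len(r)
```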

12 Translation quality

13 QA accuracy

14 Translation quality and QA accuracy

15 Translation quality and QA accuracy

16 Sentence-level analysis. 47% of the OR-set questions are not answered correctly; these questions might be difficult to answer even with a correct translation. We therefore divide the questions into two groups. Correct group (141 × 5 = 705 questions): translated from the 141 questions answered correctly in the OR set. Incorrect group (123 × 5 = 615 questions): translated from the remaining 123 questions in the OR set.
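The grouping step can be sketched as follows; the data layout (each translated question carrying the id of its OR-set source question) is an assumption for illustration:

```python
# Split translated questions into two groups: a translated question joins
# the "correct" group iff its OR-set source question was answered correctly.
def split_groups(translated, correct_source_ids):
    correct, incorrect = [], []
    for q in translated:
        (correct if q["source_id"] in correct_source_ids else incorrect).append(q)
    return correct, incorrect

# Toy example data (field names are hypothetical).
translated = [
    {"source_id": 1, "text": "where is the capital of japan"},
    {"source_id": 2, "text": "who wrote hamlet"},
]
correct_group, incorrect_group = split_groups(translated, correct_source_ids={1})
```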

17 Sentence-level correlation

Metric         Correct group   Incorrect group
BLEU+1         0.900           0.007
1-WER          0.690           0.092
RIBES          0.418           0.311
NIST           0.942           0.210
Acceptability  0.890           0.547

18 Sentence-level correlation

Metric         Correct group   Incorrect group
BLEU+1         0.900           0.007
1-WER          0.690           0.092
RIBES          0.418           0.311
NIST           0.942           0.210
Acceptability  0.890           0.547

In the incorrect group there is very little correlation. NIST has the highest correlation → importance of content words. If the reference cannot be answered correctly, the sentences are not suitable, even as negative samples.

19 Sample 1

20 Sample 2. Lack of the question-type word.

21 Sample 3. All questions were answered correctly even though they are grammatically incorrect.

22 Conclusion. The NIST score has the highest correlation with QA accuracy; NIST is sensitive to changes in content words. If the reference cannot be answered correctly, there is very little correlation between translation quality and QA accuracy → answerable references should be used. Three factors cause changes in QA results: content words, question types and syntax.

23 Sentence-level correlation: BLEU+1

24 Sentence-level correlation: 1-WER

25 Sentence-level correlation: RIBES

26 Sentence-level correlation: NIST

27 Sentence-level correlation: Acceptability

28 Sample 2

29 Sample 4

30 Sample 4. They were answered incorrectly even though they are grammatically correct → correct grammar is not so important.

