
Slide 1: Evaluation Protocol and Tools for Question-Answering on Speech Transcripts
LREC 2010, Malta, May 20, 2010 - ELDA
N. Moreau, O. Hamon, D. Mostefa - ELDA/ELRA, Paris, France
S. Rosset, O. Galibert, L. Lamel - LIMSI, Paris, France
J. Turmo, P. R. Comas - UPC, Barcelona, Spain
P. Rosso, D. Buscaldi - UPV, Valencia, Spain
Contact: moreau@elda.org

Slide 2: Outline
- What is QAST?
- QAST evaluations
- Evaluation data and tasks
- QASTLE evaluation interface
- Overview of main results
- Conclusions and perspectives

Slide 3: What is QAST?
- QAST stands for Question-Answering on Speech Transcripts
- 4 QAST evaluation campaigns (2006, 2007, 2008, 2009)
- Organized by UPC, UPV, LIMSI and ELDA
- Goals:
  - Development of robust QA for speech
  - Measure the loss due to ASR inaccuracies
  - Measure the loss at different ASR word error rates
  - Measure the loss when using spontaneous oral questions (in 2009)

Slide 4: QAST evaluations
- 2006: CHIL (EN); manual transcripts + 1 ASR output; written questions
- 2007: CHIL (EN), AMI (EN); manual transcripts + 1 ASR output + word graphs; written questions
- 2008: CHIL (EN), AMI (EN), ESTER (FR), EPPS (EN), EPPS (ES); manual transcripts + 3 ASR outputs; written questions
- 2009: ESTER (FR), EPPS (EN), EPPS (ES); manual transcripts + 3 ASR outputs; written and oral questions

Slide 5: QAST Data Sets

Corpus | Language | Description | Transcripts | WER | Campaigns
CHIL | English | 25 lectures (~25h) | Manual | - | 2006, 2007, 2008
CHIL | English | 25 lectures (~25h) | ASR | 20% | 2006, 2007, 2008
AMI | English | 168 meetings (~100h) | Manual | - | 2007, 2008
AMI | English | 168 meetings (~100h) | ASR | 38% | 2007, 2008
ESTER | French | 18 BN shows (~10h) | Manual | - | 2008, 2009
ESTER | French | 18 BN shows (~10h) | ASR | 11.9% / 23.9% / 35.4% | 2008, 2009
EPPS | English | 6 sessions (~3h) | Manual | - | 2008, 2009
EPPS | English | 6 sessions (~3h) | ASR | 10.6% / 14.0% / 24.1% | 2008, 2009
EPPS | Spanish | 6 sessions (~3h) | Manual | - | 2008, 2009
EPPS | Spanish | 6 sessions (~3h) | ASR | 11.5% / 12.7% / 13.7% | 2008, 2009

Slide 6: Questions and Evaluation Tasks
Different evaluation tasks:
- QA on manual transcriptions
- QA on automatic transcriptions (ASR)
- QA using written questions
- QA using transcriptions of oral questions
Question sets created each year for each dataset:
- 100 questions for training + 50 questions for testing
- Question types: factual and definitional
- New in 2009: spontaneous oral questions

Slide 7: Creation of oral questions
- People were presented with short text excerpts taken from the corpus
- After reading each excerpt, they had to ask a few 'spontaneous' questions
- The oral questions were recorded
- The oral questions were manually transcribed (including speech disfluencies)
- A canonical "written" version was created for each question
Example:
  Oral: "When did the bombing of Fallujah t() take euh took place?"
  Written: "When did the bombing of Fallujah take place?"

Slide 8: Submissions
- Up to 5 ranked answers per question
- Answers for 'manual transcriptions' tasks: Answer_string + Doc_ID
- Answers for 'automatic transcriptions' tasks: Answer_string + Doc_ID + Time_start + Time_end (the time slot of the answer)
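A submission entry can be modeled as a small record. The field names below mirror the slide (Answer_string, Doc_ID, Time_start, Time_end), but the class and the checking helper are a hypothetical sketch, not the official QAST submission format.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class RankedAnswer:
    """One of up to five ranked answers to a single question."""
    rank: int                            # 1 (best) .. 5
    answer_string: str
    doc_id: str
    time_start: Optional[float] = None   # required only for ASR tasks
    time_end: Optional[float] = None

def check_submission(answers: List[RankedAnswer], asr_task: bool) -> bool:
    """Return True if the answer list obeys the constraints on the slide."""
    if not 1 <= len(answers) <= 5:
        return False
    for a in answers:
        if asr_task and (a.time_start is None or a.time_end is None):
            return False  # ASR tasks must also give the answer's time slot
    return True
```

For a manual-transcription task the time fields may be omitted; for an ASR task the checker rejects answers without a time slot.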

Slide 9: Assessments
- Four possible judgments: Correct / Incorrect / Inexact / Unsupported
- QA on manual transcriptions: manual assessment with the QASTLE interface
- QA on automatic (ASR) transcriptions: automatic assessment (script) + manual check with QASTLE
- 2 metrics:
  - Mean Reciprocal Rank (MRR): measures how well the right answers are ranked on average
  - Accuracy: fraction of questions with a correct answer ranked in first position
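The two metrics can be sketched as follows. Representing the judgments as, per question, a list of correctness booleans in rank order is an assumed encoding for illustration.

```python
def mrr(judgments):
    """Mean Reciprocal Rank: average of 1/rank of the first correct
    answer per question (0 when no answer is correct)."""
    total = 0.0
    for ranked in judgments:            # one boolean list per question
        for rank, correct in enumerate(ranked, start=1):
            if correct:
                total += 1.0 / rank
                break                    # only the first correct answer counts
    return total / len(judgments)

def accuracy(judgments):
    """Fraction of questions whose top-ranked answer is correct."""
    return sum(1 for ranked in judgments if ranked and ranked[0]) / len(judgments)
```

For example, with three questions judged `[[False, True], [True], [False, False]]`, MRR is (1/2 + 1 + 0) / 3 = 0.5 and accuracy is 1/3.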

Slide 10: QASTLE interface (screenshot)

Slide 11: Semi-automatic assessments
- Automatic script to assess QA on ASR transcriptions
- The script compares the time slot boundaries of:
  - the reference time slot (created beforehand)
  - the hypothesis time slot (submitted answer)
- The overlap is compared to a predefined threshold:
  - overlap > threshold => answer is CORRECT
  - 0 < overlap <= threshold => answer is INEXACT
  - no overlap => answer is INCORRECT
- 2nd pass: manual check with QASTLE
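The comparison above can be sketched like this. Interpreting the threshold as a minimum fraction of the reference time slot that must be covered is an assumption, since the slide does not define the threshold exactly.

```python
def judge_time_slot(ref, hyp, threshold=0.5):
    """Judge a submitted (hypothesis) time slot against the reference slot.

    ref, hyp: (start, end) pairs in seconds.
    threshold: assumed minimum fraction of the reference slot that must
    be covered for a CORRECT judgment (0.5 is an illustrative value).
    """
    overlap = min(ref[1], hyp[1]) - max(ref[0], hyp[0])
    if overlap <= 0:
        return "INCORRECT"              # no overlap at all
    if overlap / (ref[1] - ref[0]) > threshold:
        return "CORRECT"                # enough of the reference is covered
    return "INEXACT"                    # some overlap, but below the threshold
```

For instance, a hypothesis slot (2.0, 8.0) against a reference (0.0, 10.0) covers 60% of the reference and would be judged CORRECT with this threshold, while (9.0, 20.0) covers only 10% and would be INEXACT.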

Slide 12: Best results (Accuracy)

Corpus | Transcripts (WER) | 2007 | 2008 | 2009 written Q. | 2009 oral Q.
CHIL | Manual | 0.51 | 0.41 | - | -
CHIL | ASR (20.0%) | 0.36 | 0.31 | - | -
AMI | Manual | 0.25 | 0.33 | - | -
AMI | ASR (38.0%) | 0.21 | 0.18 | - | -
ESTER | Manual | - | 0.45 | 0.28 | 0.26
ESTER | ASR (11.9%) | - | 0.41 | 0.26 | 0.25
ESTER | ASR (23.9%) | - | 0.25 | 0.21 | 0.21
ESTER | ASR (35.4%) | - | 0.21 | 0.20 | 0.20
EPPS-EN | Manual | - | 0.34 | 0.36 | 0.36
EPPS-EN | ASR (10.6%) | - | 0.30 | 0.27 | 0.26
EPPS-EN | ASR (14.0%) | - | 0.20 | 0.25 | 0.25
EPPS-EN | ASR (24.1%) | - | 0.19 | 0.23 | 0.24
EPPS-ES | Manual | - | 0.31 | 0.28 | 0.28
EPPS-ES | ASR (11.5%) | - | 0.24 | 0.29 | 0.29
EPPS-ES | ASR (12.7%) | - | 0.20 | 0.27 | 0.25
EPPS-ES | ASR (13.7%) | - | 0.23 | 0.22 | 0.22

Slide 13: Conclusion & perspectives (1/2)
- We presented evaluation campaigns of QA on speech data
- Evaluations were done for several languages and on different data types (seminars, meetings, broadcast news, parliamentary speeches)
- New methodology for semi-automatic evaluation of QA on ASR transcriptions
- The QASTLE interface is free to download

Slide 14: Conclusion & perspectives (2/2)
- Future evaluation campaigns:
  - multilingual / cross-lingual QA
  - oral questions with ASR transcription of the questions
- The QAST 2007-2009 evaluation package will soon be available through the ELRA Catalogue of language resources

Slide 15: Thank you for your attention
- QAST: http://www.lsi.upc.edu/~qast/2009
- QASTLE: http://elda.org/qastle/
- ELRA Catalogue of Language Resources: http://catalog.elra.info/

