
1 CLEF – Cross Language Evaluation Forum
Question Answering at CLEF 2003 (http://clef-qa.itc.it)
The Multiple Language Question Answering Track at CLEF 2003
Bernardo Magnini*, Simone Romagnoli*, Alessandro Vallin*
Jesús Herrera**, Anselmo Peñas**, Víctor Peinado**, Felisa Verdejo**
Maarten de Rijke***
* ITC-irst, Centro per la Ricerca Scientifica e Tecnologica, Trento - Italy {magnini,romagnoli,vallin}@itc.it
** UNED, Spanish Distance Learning University, Madrid - Spain {jesus.herrera,anselmo,victor,felisa}@lsi.uned.es
*** Language and Inference Technology Group, ILLC, University of Amsterdam - The Netherlands mdr@science.uva.nl

2 Outline
- Overview of the Question Answering track at CLEF 2003
- Report on the organization of the QA tasks
- Present and discuss the participants' results
- Perspectives for future QA campaigns

3 Question Answering
QA: find the answer to an open-domain question in a large collection of documents.
- INPUT: questions (instead of keyword-based queries)
- OUTPUT: answers (instead of documents)
QA track at TREC: mostly fact-based questions.
  Question: Who invented the electric light?
  Answer: Edison
Scientific community: NLP and IR; the AQUAINT program in the USA; QA as an application scenario.

4 Multilingual QA
Purposes:
- Answers may be found in languages different from the language of the question
- Interest in QA systems for languages other than English
- Force the QA community to design real multilingual systems
- Check/improve the portability of the technologies implemented in current English QA systems
- Creation of reusable resources and benchmarks for further multilingual QA evaluation

5 QA at CLEF 2003 - Organization
- "QA@CLEF" web site: http://clef-qa.itc.it
- CLEF QA mailing list: clef-qa@itc.it
- Guidelines for the track (following the model of TREC 2001)

6 Tasks at CLEF 2003
Each task: 200 questions posed against a target corpus; systems return either exact answers or 50-byte answer strings.

7 QA Tasks at CLEF 2003
Organizations responsible for question-set creation and assessment, for the monolingual tasks and for the bilingual tasks against English (all bilingual runs were assessed by NIST):
- Italian: ITC-irst; NIST
- Dutch: U. Amsterdam, ITC-irst; U. Amsterdam, NIST
- Spanish: UNED, ITC-irst; UNED, NIST
- French: ITC-irst, U. Montreal; NIST
- German: ITC-irst, DFKI; NIST

8 Tasks at CLEF 2003
The same table as above, annotated with the number of participating groups per task:
- Monolingual: Italian 1, Dutch 1, Spanish 1
- Bilingual against English: Italian 1, Dutch 0, Spanish 1, French 3, German 1

9 Bilingual against English
Workflow: source-language (e.g. Italian) questions are obtained by question extraction, translated into English, and submitted to the QA system, which searches the English text collection; the returned English answers are then assessed.
Effort estimates: 1 person-month for 200 questions (question extraction); 2 person-days for 200 questions (translation); 4 person-days for one run of 600 answers (assessment).

10 Document Collections
Corpora licensed by CLEF in 2002:
Monolingual tasks:
- Dutch: Algemeen Dagblad and NRC Handelsblad (years 1994 and 1995)
- Italian: La Stampa and SDA press agency (1994)
- Spanish: EFE press agency (1994)
Bilingual tasks:
- English: Los Angeles Times (1994)

11 Creating the Test Collection
Starting from the CLEF topics, the three coordinating groups (ILLC, ITC-irst, UNED) each produced 150 question/answer pairs in their own language (Dutch, Italian, Spanish). The questions were translated into English and shared among the groups, so that each group received 300 questions originally posed in the other two languages (Ita+Spa, Dut+Spa, Ita+Dut) and searched for answers in its own corpus, with English added as a new target language. Merging the data produced the monolingual test sets, the English question sets (150 Dutch/English, 150 Italian/English, 150 Spanish/English), and the DISEQuA corpus.

12 Questions
200 fact-based questions for each task:
- queries related to events that occurred in 1994 and/or 1995, i.e. the years covered by the target corpora;
- coverage of different categories of questions: date, location, measure, person, object, organization, other;
- questions were not guaranteed to have an answer in the corpora: 10% of the test sets required the answer string "NIL".

13 Questions
200 fact-based questions for each task:
- queries related to events that occurred in 1994 and/or 1995, i.e. the years covered by the target corpora;
- coverage of different categories of questions (date, location, measure, person, object, organization, other);
- questions were not guaranteed to have an answer in the corpora: 10% of the test sets required the answer string "NIL".
Not included in this year's test sets:
- definition questions ("Who/What is X?")
- yes/no questions
- list questions

14 Answers
Participants were allowed to submit up to three answers per question and up to two runs:
- answers had to be either exact (i.e. contain just the minimal information) or 50-byte strings;
- answers had to be supported by a document;
- answers had to be ranked by confidence.
Answers were judged by human assessors according to four categories:
- CORRECT (R)
- UNSUPPORTED (U)
- INEXACT (X)
- INCORRECT (W)

15 Judging the Answers
Question: What museum is directed by Henry Hopkins?
    W 1 irstex031bi 1 3253 LA011694-0094 Modern Art
    U 1 irstex031bi 2 1776 LA011694-0094 UCLA
    X 1 irstex031bi 3 1251 LA042294-0050 Cultural Center
Comment: the second answer was correct, but the retrieved document did not support it; the third response missed part of the name and was judged inexact.
Question: Where did the Purussaurus live before becoming extinct?
    W 2 irstex031bi 1 9 NIL
Comment: the system erroneously "believed" that the question had no answer in the corpus, or could not find one.
Question: When did Shapour Bakhtiar die?
    R 3 irstex031bi 1 484 LA012594-0239 1991
    W 3 irstex031bi 2 106 LA012594-0239 Monday
Comment: for questions that asked for the date of an event, the year alone was often regarded as sufficient.
Question: Who is John J. Famalaro accused of having killed?
    W 4 irstex031bi 1 154 LA072294-0071 Clark
    R 4 irstex031bi 2 117 LA072594-0055 Huber
    W 4 irstex031bi 3 110 LA072594-0055 Department
Comment: the second answer, which returned the victim's last name, was considered correct, since no other people named "Huber" were mentioned in the retrieved document.
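The response lines above follow a fixed column layout. As a rough illustration only, the following minimal Python sketch parses such a line, assuming the fields are: assessor judgment, question number, run tag, answer rank, a numeric system confidence score, document id, and the answer string. The meaning of the numeric field, and the shorter layout of NIL responses, are assumptions made for this sketch, not part of an official run-format specification.

    from typing import NamedTuple, Optional

    # Rough parse of a judged response line as shown above. Field semantics are assumed;
    # NIL responses, which claim that the question has no answer in the collection,
    # carry no document id.

    class JudgedAnswer(NamedTuple):
        judgment: str            # R, U, X or W
        question: int
        run: str
        rank: int
        confidence: int
        docid: Optional[str]
        answer: str

    def parse_line(line: str) -> JudgedAnswer:
        parts = line.split(maxsplit=6)
        if len(parts) == 6 and parts[5] == "NIL":
            return JudgedAnswer(parts[0], int(parts[1]), parts[2],
                                int(parts[3]), int(parts[4]), None, "NIL")
        return JudgedAnswer(parts[0], int(parts[1]), parts[2],
                            int(parts[3]), int(parts[4]), parts[5], parts[6])

    print(parse_line("W 1 irstex031bi 1 3253 LA011694-0094 Modern Art"))
    print(parse_line("W 2 irstex031bi 1 9 NIL"))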

16 Evaluation Measures
The score for each question was the reciprocal of the rank of the first answer found to be correct; if no correct answer was returned, the score was 0. The total score, the Mean Reciprocal Rank (MRR), was the mean score over all questions. In STRICT evaluation only correct (R) answers scored points; in LENIENT evaluation unsupported (U) answers were considered correct as well.
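As a minimal sketch (not the official CLEF scoring script), the strict and lenient MRR scores described above can be computed from the assessor judgments roughly as follows; per-question answer lists are assumed to be in rank order, with judgments drawn from {R, U, X, W}:

    # Minimal sketch of strict/lenient MRR scoring (not the official CLEF evaluation script).
    # `judged` maps each question id to its answers in rank order; each answer carries one
    # of the assessor judgments: R (correct), U (unsupported), X (inexact), W (incorrect).

    def mean_reciprocal_rank(judged, lenient=False):
        accepted = {"R", "U"} if lenient else {"R"}
        total = 0.0
        for answers in judged.values():
            for rank, judgment in enumerate(answers, start=1):
                if judgment in accepted:
                    total += 1.0 / rank
                    break  # only the first accepted answer scores
        return total / len(judged)

    # Example with two questions and up to three ranked answers each.
    judged = {
        1: ["W", "U", "X"],   # strict: 0, lenient: 1/2
        3: ["R", "W"],        # strict and lenient: 1/1
    }
    print(mean_reciprocal_rank(judged))                 # strict MRR = 0.50
    print(mean_reciprocal_rank(judged, lenient=True))   # lenient MRR = 0.75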

17 Participants
- DLSI-UA (University of Alicante, Spain): Monolingual Spanish, runs alicex031ms and alicex032ms
- UVA (University of Amsterdam, The Netherlands): Monolingual Dutch, runs uamsex031md and uamsex032md
- ITC-irst (Italy): Monolingual Italian, runs irstex031mi and irstst032mi; Bilingual Italian, runs irstex031bi and irstex032bi
- ISI (University of Southern California, USA): Bilingual Spanish, runs isixex031bs and isixex032bs; Bilingual Dutch (no runs listed)
- DFKI (Germany): Bilingual German, run dfkist031bg
- CS-CMU (Carnegie Mellon University, USA): Bilingual French, runs lumoex031bf and lumoex032bf
- DLTG (University of Limerick, Ireland): Bilingual French, runs dltgex031bf and dltgex032bf
- RALI (University of Montreal, Canada): Bilingual French, runs udemst031bf and udemex032bf

18 Participants in past QA tracks
Comparison between the number and place of origin of the participants in the past TREC QA tracks and in this year's CLEF QA track:
- TREC-8: United States 13, Canada 3, Europe 3, Asia 1; 20 groups in total, 46 submitted runs
- TREC-9: United States 14, Europe 7, Asia 6; 27 groups in total, 75 submitted runs
- TREC-10: United States 19, Europe 8, Asia 8; 35 groups in total, 67 submitted runs
- TREC-11: United States 16, Europe 10, Asia 6; 32 groups in total, 67 submitted runs
- CLEF 2003: North America 3, Europe 5; 8 groups in total, 17 submitted runs

19 Performances at TREC-QA
Evaluation metric: Mean Reciprocal Rank (MRR) = mean over the questions of 1 / (rank of the first correct answer).
- TREC-8: best result 66%, average over all runs 25%
- TREC-9: best result 58%, average over all runs 24%
- TREC-10: best result 67%, average over all runs 23%

20 Results - EXACT ANSWERS RUNS, MONOLINGUAL TASKS
For each run: MRR (strict / lenient), number of questions with at least one right answer (strict / lenient), NIL questions (returned / correctly returned).
- DLSI-UA, Monolingual Spanish, alicex031ms: MRR .307 / .320; right answers 80 / 87; NIL 21 / 5
- DLSI-UA, Monolingual Spanish, alicex032ms: MRR .296 / .317; right answers 70 / 77; NIL 21 / 5
- ITC-irst, Monolingual Italian, irstex031mi: MRR .422 / .442; right answers 97 / 101; NIL 4 / 2
- UVA, Monolingual Dutch, uamsex031md: MRR .298 / .317; right answers 78 / 82; NIL 200 / 17
- UVA, Monolingual Dutch, uamsex032md: MRR .305 / .335; right answers 82 / 89; NIL 200 / 17

21 Results - EXACT ANSWERS RUNS MONOLINGUAL TASKS

22 Results - EXACT ANSWERS RUNS, CROSS-LANGUAGE TASKS
For each run: MRR (strict / lenient), number of questions with at least one right answer (strict / lenient), NIL questions (returned / correctly returned).
- ISI, Bilingual Spanish, isixex031bs: MRR .302 / .328; right answers 69 / 77; NIL 4 / 0
- ISI, Bilingual Spanish, isixex032bs: MRR .271 / .307; right answers 68 / 78; NIL 4 / 0
- ITC-irst, Bilingual Italian, irstex031bi: MRR .322 / .334; right answers 77 / 81; NIL 49 / 6
- ITC-irst, Bilingual Italian, irstex032bi: MRR .393 / .400; right answers 90 / 92; NIL 28 / 5
- CS-CMU, Bilingual French, lumoex031bf: MRR .153 / .170; right answers 38 / 42; NIL 92 / 8
- CS-CMU, Bilingual French, lumoex032bf: MRR .131 / .149; right answers 31 / 35; NIL 91 / 7
- DLTG, Bilingual French, dltgex031bf: MRR .115 / .120; right answers 23 / 24; NIL 119 / 10
- DLTG, Bilingual French, dltgex032bf: MRR .110 / .115; right answers 22 / 23; NIL 119 / 10
- RALI, Bilingual French, udemex032bf: MRR .140 / .160; right answers 38 / 42; NIL 3 / 1

23 Results - EXACT ANSWERS RUNS CROSS-LANGUAGE TASKS

24 Results - 50 BYTES ANSWERS RUNS, MONOLINGUAL TASKS
For each run: MRR (strict / lenient), number of questions with at least one right answer (strict / lenient), NIL questions (returned / correctly returned).
- ITC-irst, Monolingual Italian, irstst032mi: MRR .449 / .471; right answers 99 / 104; NIL 5 / 2

25 Results - 50 BYTES ANSWERS RUNS, CROSS-LANGUAGE TASKS
For each run: MRR (strict / lenient), number of questions with at least one right answer (strict / lenient), NIL questions (returned / correctly returned).
- DFKI, Bilingual German, dfkist031bg: MRR .098 / .103; right answers 29 / 30; NIL 18 / 0
- RALI, Bilingual French, udemst031bf: MRR .213 / .220; right answers 56 / 58; NIL 4 / 1

26 Average Results in Different Tasks

27 Approaches in CL QA
Two main approaches are used in cross-language QA systems:
1. Translation of the question into the target language (i.e. the language of the document collection), followed by question processing and answer extraction.
2. Question processing in the source language to extract information (such as keywords, question focus, expected answer type, etc.), then translation and expansion of the extracted data, followed by answer extraction.

28 Approaches in CL QA
Two main approaches are used in cross-language QA systems:
1. Translation of the question into the target language (i.e. the language of the document collection), followed by question processing and answer extraction.
2. Preliminary question processing in the source language to extract information (such as keywords, question focus, expected answer type, etc.), then translation and expansion of the extracted data, followed by answer extraction.
Participating systems, grouped by approach: ITC-irst, RALI, DFKI / ISI, CS-CMU, Limerick.
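To make the contrast concrete, here is a minimal, purely illustrative Python sketch of the two pipelines. All helpers (translate, analyze_question, expand_terms, search_and_extract) are hypothetical stubs introduced for this sketch, not components of any of the participating systems.

    # Purely illustrative sketch of the two cross-language QA strategies.
    # Every helper below is a hypothetical stub, not a component of any CLEF 2003 system.

    def translate(text, src, tgt):
        # Stand-in for machine translation or bilingual dictionary lookup.
        return text

    def analyze_question(question):
        # Question processing: keywords, question focus, expected answer type.
        words = question.lower().rstrip("?").split()
        keywords = [w for w in words if w not in {"who", "what", "when", "where", "the", "did"}]
        expected_type = "PERSON" if words and words[0] == "who" else "OTHER"
        return {"keywords": keywords, "expected_type": expected_type}

    def expand_terms(keywords):
        # Stand-in for translation-dictionary or synonym expansion of the keywords.
        return keywords

    def search_and_extract(keywords, expected_type, collection):
        # Stand-in for document retrieval followed by answer extraction.
        return next((doc for doc in collection if all(k in doc.lower() for k in keywords)), None)

    def approach_1(question_src, collection_en):
        # 1) Translate the whole question into the target language, then process it there.
        question_en = translate(question_src, src="xx", tgt="en")
        analysis = analyze_question(question_en)
        return search_and_extract(analysis["keywords"], analysis["expected_type"], collection_en)

    def approach_2(question_src, collection_en):
        # 2) Process the question in the source language, then translate and expand
        #    only the extracted data (keywords, focus, expected answer type).
        analysis = analyze_question(question_src)
        keywords_en = expand_terms([translate(k, src="xx", tgt="en") for k in analysis["keywords"]])
        return search_and_extract(keywords_en, analysis["expected_type"], collection_en)

    # Tiny demo; with the identity "translation" we simply pass an English question through.
    docs = ["Thomas Edison invented the electric light in 1879."]
    print(approach_1("Who invented the electric light?", docs))

The practical trade-off the two approaches embody: approach 1 depends on the quality of full-question translation, while approach 2 only translates short extracted items and can expand them with alternative translations.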

29 Conclusions
- A pilot evaluation campaign for multiple-language Question Answering systems has been carried out.
- Five European languages were considered: three monolingual tasks and five bilingual tasks against an English collection were set up.
- Considering the differences between the tasks, the results are comparable with those of QA at TREC.
- A corpus of 450 questions, each in four languages and with at least one known answer in the respective text collection, has been built.
- This year's experience was very positive: we intend to continue with QA at CLEF 2004.

30 Perspectives for Future QA Campaigns
Organizational issues:
- Promote larger participation
- Collaboration with NIST
Financial issues:
- Find a sponsor: ELRA, the new CELCT center, ...
Tasks (to be discussed):
- Update to TREC 2003: definition questions, list questions
- Consider only "exact answers": the 50-byte format did not meet with much favor
- Introduce new languages: in the cross-language task this is easy to do
- New steps toward multilinguality: English questions against other-language collections; a small set of full cross-language tasks (e.g. Italian/Spanish)

31 Creation of the Question Set
1. Find 200 questions for each language (Dutch, Italian, Spanish), based on CLEF-2002 topics, with at least one answer in the respective corpus.
2. Translate each question into English, and from English into the other two languages.
3. Find answers in the corpora of the other languages (e.g. a Dutch question was translated and processed in the Italian text collection).
4. The result is a corpus of 450 questions, each in four languages, with at least one known answer in the respective text collection.
5. Questions with at least one answer in all the corpora were selected for the final question set.
More details in the paper and in the poster.
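The outcome of this procedure is, in effect, a small parallel question corpus. As an illustration only, one DISEQuA-style entry could be modelled roughly as follows; the field names and the example translations are invented for this sketch and are not taken from the actual corpus format:

    from dataclasses import dataclass, field

    # Hypothetical model of one entry of the parallel question set: the same question
    # in the four languages, plus the answers known to occur in each text collection.
    # Field names and example content are illustrative only.

    @dataclass
    class QuestionEntry:
        dutch: str
        italian: str
        spanish: str
        english: str
        # language code -> answer strings known to occur in that language's collection
        known_answers: dict[str, list[str]] = field(default_factory=dict)

    entry = QuestionEntry(
        dutch="Wie heeft het elektrische licht uitgevonden?",
        italian="Chi ha inventato la luce elettrica?",
        spanish="¿Quién inventó la luz eléctrica?",
        english="Who invented the electric light?",
        known_answers={"en": ["Edison"]},
    )
    print(entry.english, "->", entry.known_answers)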

