
1 Alicante, September 22, 2006 QA@CLEF 2006 Workshop Overview of the Multilingual Question Answering Track Danilo Giampiccolo

2 Outline
• Tasks
• Test set preparation
• Participants
• Evaluation
• Results
• Final considerations
• Future perspectives

3 QA 2006: Organizing Committee
• ITC-irst (Bernardo Magnini): main coordinator
• CELCT (D. Giampiccolo, P. Forner): general coordination, Italian
• DFKI (B. Sacaleanu): German
• ELDA/ELRA (C. Ayache): French
• Linguateca (P. Rocha): Portuguese
• UNED (A. Peñas): Spanish
• U. Amsterdam (Valentin Jijkoun): Dutch
• U. Limerick (R. Sutcliffe): English
• Bulgarian Academy of Sciences (P. Osenova): Bulgarian
Source languages only:
♦ University of Indonesia, Depok (M. Adriani): Indonesian
♦ IASI, Romania (D. Cristea): Romanian
♦ Wrocław University of Technology (J. Pietraszko): Polish

4 QA@CLEF-06: Tasks
• Main task:
  ♦ Monolingual: the language of the question (source language) and the language of the news collection (target language) are the same
  ♦ Cross-lingual: the questions were formulated in a language different from that of the news collection
• One pilot task:
  ♦ WiQA: coordinated by Maarten de Rijke
• Two exercises:
  ♦ Answer Validation Exercise (AVE): coordinated by Anselmo Peñas
  ♦ Real Time: a “time-constrained” QA exercise coordinated by the University of Alicante (Fernando Llopis)

5 Data set: Question format
200 questions of three kinds:
• FACTOID (loc, mea, org, oth, per, tim; ca. 150): What party did Hitler belong to?
• DEFINITION (ca. 40): Who is Josef Paul Kleihues?
  ♦ reduced in number (-25%)
  ♦ two new categories added:
    – Object: What is a router?
    – Other: What is a tsunami?
• LIST (ca. 10): Name works by Tolstoy
• Temporally restricted (ca. 40): by date, by period, by event
• NIL (ca. 20): questions that do not have any known answer in the target document collection
• NEW! Input format: the question type (F, D, L) is not indicated
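The slide shows only the question categories and the fact that the F/D/L type is no longer revealed to systems, not the actual distribution file format. The following is a minimal, hypothetical Python sketch of what a 2006-style test-set entry could look like under that assumption; the class and field names are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class TestQuestion:
    """One entry of a hypothetical QA@CLEF 2006 test set.

    The 2006 input format no longer marks questions as Factoid,
    Definition or List, so no type field is exposed to participants.
    """
    qid: str          # question identifier, e.g. "0001"
    source_lang: str  # language the question is asked in, e.g. "EN"
    target_lang: str  # language of the news collection, e.g. "EN"
    text: str         # the question string itself

# Sample entries mirroring the questions quoted on the slide.
questions = [
    TestQuestion("0001", "EN", "EN", "What party did Hitler belong to?"),
    TestQuestion("0002", "EN", "EN", "Who is Josef Paul Kleihues?"),
    TestQuestion("0003", "EN", "EN", "Name works by Tolstoy."),
]
```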

6 Data set: run format (NEW!)
• Multiple answers: from one to ten exact answers per question
  ♦ exact = neither more nor less than the information required
  ♦ each answer has to be supported by:
    – a docid
    – one to ten text snippets justifying the answer (substrings of the specified document giving the actual context)
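A minimal sketch of the run-format constraints just described (one to ten exact answers per question, each supported by a docid and one to ten snippets). The Python classes and field names are assumptions for illustration, not the official submission syntax.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Answer:
    """One answer in a submitted run: the exact answer string plus its support."""
    exact: str                 # neither more nor less than the information required
    docid: str                 # identifier of the supporting document
    snippets: List[str] = field(default_factory=list)  # substrings of that document

    def is_valid(self) -> bool:
        # Each answer must be justified by one to ten text snippets.
        return 1 <= len(self.snippets) <= 10

@dataclass
class RunEntry:
    """All answers returned for one question (one to ten are allowed)."""
    qid: str
    answers: List[Answer] = field(default_factory=list)

    def is_valid(self) -> bool:
        return 1 <= len(self.answers) <= 10 and all(a.is_valid() for a in self.answers)
```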

7 Activated tasks (at least one registered participant)
Source languages (S): BG, DE, EN, ES, FR, IN, IT, NL, PT, PL, RO
Target languages (T): BG, DE, EN, ES, FR, IT, NL, PT
• 11 source languages (10 in 2005)
• 8 target languages (9 in 2005)
• No Finnish task / new languages: Polish and Romanian

8 Activated tasks
            Monolingual   Cross-lingual   Total
CLEF 2003        3              5            8
CLEF 2004        6             13           19
CLEF 2005        8             15           23
CLEF 2006        7             17           24
• NEW! Questions were not translated into all the languages
  ♦ Gold Standard: questions in multiple languages only for tasks where there was at least one registered participant
• More interest in cross-linguality
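The monolingual/cross-lingual split in the table follows directly from whether the source and target languages coincide. Below is a small illustrative sketch of that bookkeeping; the activated pairs used here are made up, not the real 2006 activation matrix.

```python
from typing import List, Tuple

def split_tasks(tasks: List[Tuple[str, str]]) -> Tuple[int, int]:
    """Count monolingual (source == target) and cross-lingual activated tasks."""
    mono = sum(1 for src, tgt in tasks if src == tgt)
    return mono, len(tasks) - mono

# Illustrative (source, target) pairs only; the real 2006 task list is not given here.
activated = [("IT", "IT"), ("PT", "PT"), ("RO", "EN"), ("IN", "EN"), ("DE", "EN")]
mono, cross = split_tasks(activated)
print(mono, cross)  # 2 3
```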

9 Participants
            America  Europe  Asia  Total
CLEF 2003      3        5     -      8
CLEF 2004      1       17     -     18 (+125%)
CLEF 2005      1       22     1     24 (+33%)
CLEF 2006      4       24     2     30 (+25%)

            Registered  Newcomers  Veterans  Absent veterans
CLEF 2004       22          13         5            3
CLEF 2005       27           9        15            4
CLEF 2006       36          10        20            4

10 List of participants
SYNAPSE: Synapse Développement (France)
Ling-Comp: U. Rome "La Sapienza" (Italy)
Alicante: U. Alicante, Informatica (Spain)
Hagen: U. Hagen, Informatics (Germany)
Daedalus: Daedalus Consortium (Spain)
Jaen: U. Jaén, Intelligent Systems (Spain)
ISLA: U. Amsterdam (Netherlands)
INAOE: Inst. of Astrophysics, Optics & Electronics (Mexico)
DEPOK: U. Indonesia, Computer Science (Indonesia)
DFKI: DFKI, Language Technology (Germany)
FURUI: Furui Lab., Tokyo Inst. of Technology (Japan)
Linguateca: Linguateca-Sintef (Norway)
LIC2M-CEA: Centre CEA Saclay (France)
LINA: U. Nantes, LINA (France)
Priberam: Priberam Informática (Portugal)
U.Porto: U. Porto, AI (Portugal)
U.Groningen: U. Groningen, Letters (Netherlands)
Lab.Inf. D'Avignon: Lab. Informatique d'Avignon (France)
U.Sao Paulo: U. São Paulo, Math (Brazil)
Vanguard: Vanguard Engineering (Mexico)
LCC: Language Computer Corp. (USA)
UAIC: U. "Al. I. Cuza" Iasi (Romania)
Wroclaw U.: Wrocław U. of Technology (Poland)
RFIA-UPV: Univ. Politècnica de València (Spain)
LIMSI: CNRS Lab, Orsay Cedex (France)
U.Stuttgart: U. Stuttgart, NLP (Germany)
ITC: ITC-irst (Italy)
JRC-ISPRA: Institute for the Protection and the Security of the Citizen (Italy)
BTB: BulTreeBank Project, Sofia (Bulgaria)
dltg: University of Limerick (Ireland)
[The original slide highlights which of these are industrial companies.]

11 Submitted runs
            Total runs       Monolingual   Cross-lingual
CLEF 2003    17                   6             11
CLEF 2004    48 (+182%)          20             28
CLEF 2005    67 (+39.5%)         43             24
CLEF 2006    77 (+13%)           42             35

12 Number of answers and snippets per question
[Charts: number of runs by answers returned per question (1 answer / between 2 and 5 answers / more than 5 answers), and number of snippets per answer (1 / 2 / 3 / more than 4 snippets)]

13 Evaluation
• As in previous campaigns:
  ♦ runs manually judged by native speakers
  ♦ each answer judged as Right, Wrong, ineXact or Unsupported
  ♦ up to two runs per participating group
• Evaluation measures:
  ♦ Accuracy (for F, D): main evaluation score, calculated for the FIRST answer only
    – excessive workload: some groups could manually assess only one answer (the first one) per question
    – 1 answer: Spanish and English
    – 3 answers: French
    – 5 answers: Dutch
    – all answers: Italian, German, Portuguese
  ♦ P@N for List questions
• NEW! Additional evaluation measures:
  ♦ K1 measure
  ♦ Confidence Weighted Score (CWS)
  ♦ Mean Reciprocal Rank (MRR)
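Of the measures listed above, accuracy over the first answer and Mean Reciprocal Rank have standard definitions; the sketch below shows how they could be computed from per-answer assessor judgements. The R/W/X/U codes follow the slide, while the function names and data layout are assumptions for illustration (K1 and CWS are omitted because their exact 2006 definitions are not given here).

```python
from typing import Dict, List

# judgements[qid] holds the assessor labels for the answers to that question,
# in the order they were returned: R = Right, W = Wrong, X = ineXact,
# U = Unsupported. Only "R" counts as correct.

def accuracy(judgements: Dict[str, List[str]]) -> float:
    """Fraction of questions whose FIRST answer was judged Right
    (the main evaluation score)."""
    right = sum(1 for labels in judgements.values() if labels and labels[0] == "R")
    return right / len(judgements)

def mean_reciprocal_rank(judgements: Dict[str, List[str]]) -> float:
    """Average over questions of 1/rank of the first Right answer (0 if none)."""
    total = 0.0
    for labels in judgements.values():
        for rank, label in enumerate(labels, start=1):
            if label == "R":
                total += 1.0 / rank
                break
    return total / len(judgements)

# Two toy questions: the first answered correctly at rank 1, the second at rank 3.
example = {"q1": ["R", "W"], "q2": ["W", "X", "R"]}
print(accuracy(example))              # 0.5
print(mean_reciprocal_rank(example))  # (1.0 + 1/3) / 2 ≈ 0.67
```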

14 Question Overlapping among Languages 2005-2006

15 Results: Best and Average scores
[Chart of best and average scores per task; the value 49.47 is marked with an asterisk]
* This result is still under validation.

16 Best results in 2004-2005-2006
[Chart of best results per year; the value 22.63 is marked with an asterisk]
* This result is still under validation.

17 Participants in 2004-2005-2006: compared best results

18 List questions
• Best: 0.8333 (Priberam, monolingual PT)
• Average: 0.138
Problems:
• Wrong classification of List questions in the Gold Standard
  ♦ “Mention a Chinese writer” is not a List question!
• Definition of List questions:
  ♦ “closed” List questions ask for a finite number of answers
    Q: What are the names of the two lovers from Verona separated by family issues in one of Shakespeare’s plays?
    A: Romeo and Juliet.
  ♦ “open” List questions require a list of items as answer
    Q: Name books by Jules Verne.
    A: Around the World in 80 Days.
    A: Twenty Thousand Leagues Under the Sea.
    A: Journey to the Centre of the Earth.
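P@N (slide 13) is the score used for List questions. One common reading is precision over the first N returned items against the known answers; the exact 2006 definition and the manual matching procedure are not given here, so the sketch below, with its naive exact string matching, is only illustrative.

```python
from typing import List, Set

def precision_at_n(returned: List[str], gold: Set[str], n: int) -> float:
    """Precision of the first n returned list items against the known answers.

    Real CLEF assessment was manual; exact string matching is used here
    only to keep the example self-contained."""
    top = returned[:n]
    if not top:
        return 0.0
    return sum(1 for item in top if item in gold) / len(top)

# Example based on the slide's "open" list question about Jules Verne.
gold = {
    "Around the World in 80 Days",
    "Twenty Thousand Leagues Under the Sea",
    "Journey to the Centre of the Earth",
}
returned = ["Around the World in 80 Days", "Hamlet", "Journey to the Centre of the Earth"]
print(precision_at_n(returned, gold, 3))  # 2/3 ≈ 0.67
```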

19 Final considerations
• Increasing interest in multilingual QA:
  ♦ more participants (30, +25%)
  ♦ two new languages as source (Romanian and Polish)
  ♦ more activated tasks (24; there were 23 in 2005)
  ♦ more submitted runs (77, +13%)
  ♦ more cross-lingual tasks (35, +31.5%)
• Gold Standard: questions not translated into all languages
  ♦ no possibility of activating tasks at the last minute
  ♦ useful as a reusable resource: available in the near future

20 Final considerations: 2006 main task innovations
• Multiple answers: good response, but limited capacity for assessing large numbers of answers; feedback from participants is welcome
• Supporting snippets: faster evaluation; feedback from participants is welcome
• “F/D/L” labels not given in the input format: positive, as apparently there was no real impact on performance
• List questions

21 Future perspectives: main task
For discussion:
• Romanian as target
• Very hard questions (implying reasoning and multiple-document answers)
• Allow collaboration among different systems
• Partial automated evaluation (right answers)

