Alicante, September 22, 2006. CLEF 2006 Workshop: Overview of the Multilingual Question Answering Track. Danilo Giampiccolo.


Alicante, September 22, 2006, CLEF 2006 Workshop. Overview of the Multilingual Question Answering Track. Danilo Giampiccolo

Outline
 Tasks
 Test set preparation
 Participants
 Evaluation
 Results
 Final considerations
 Future perspectives

QA 2006: Organizing Committee
 ITC-irst (Bernardo Magnini): main coordinator
 CELCT (D. Giampiccolo, P. Forner): general coordination, Italian
 DFKI (B. Sacaleanu): German
 ELDA/ELRA (C. Ayache): French
 Linguateca (P. Rocha): Portuguese
 UNED (A. Peñas): Spanish
 U. Amsterdam (V. Jijkoun): Dutch
 U. Limerick (R. Sutcliffe): English
 Bulgarian Academy of Sciences (P. Osenova): Bulgarian
♦ Source languages only:
♦ Depok University of Indonesia (M. Adriani): Indonesian
♦ IASI, Romania (D. Cristea): Romanian
♦ Wrocław University of Technology (J. Pietraszko): Polish

Tasks
 Main task:
♦ Monolingual: the language of the question (source language) and the language of the news collection (target language) are the same
♦ Cross-lingual: the questions are formulated in a language different from that of the news collection
 One pilot task:
♦ WiQA, coordinated by Maarten de Rijke
 Two exercises:
♦ Answer Validation Exercise (AVE), coordinated by Anselmo Peñas
♦ Real Time, a "time-constrained" QA exercise coordinated by the University of Alicante (Fernando Llopis)

Data set: question format (NEW!)
200 questions of three kinds:
 FACTOID (loc, mea, org, oth, per, tim; ca. 150): What party did Hitler belong to?
 DEFINITION (ca. 40): Who is Josef Paul Kleihues?
♦ reduced in number (-25%)
♦ two new categories added:
– Object: What is a router?
– Other: What is a tsunami?
 LIST (ca. 10): Name works by Tolstoy.
♦ Temporally restricted (ca. 40): by date, by period, by event
♦ NIL (ca. 20): questions that do not have any known answer in the target document collection
 Input format: the question type (F, D, L) is not indicated (NEW!)
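To make the question typology above concrete, here is a minimal illustrative sketch in Python; the class and field names are assumptions for illustration, not the official test set schema.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class QuestionKind(Enum):
    FACTOID = "F"      # ca. 150 questions (loc, mea, org, oth, per, tim)
    DEFINITION = "D"   # ca. 40 questions, now including Object and Other
    LIST = "L"         # ca. 10 questions

@dataclass
class Question:
    qid: str
    text: str
    kind: QuestionKind                      # known to the organizers, NOT given in the input format
    temporal_restriction: Optional[str] = None  # by date, by period, or by event (ca. 40 questions)
    is_nil: bool = False                    # ca. 20 questions with no known answer in the collection
```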

Alicante, September, 22, Workshop  Multiple answers: from one to ten exact answers per question ♦exact = neither more nor less than the information required ♦each answer has to be supported by – docid – one to ten text snippets justifying the answer (substrings of the specified document giving the actual context) NEW! Data set: run format NEW!

Activated Tasks (at least one registered participant)
[Source × target language matrix: source languages BG, DE, EN, ES, FR, IN, IT, NL, PT, PL, RO; target languages BG, DE, EN, ES, FR, IT, NL, PT]
 11 source languages (10 in 2005)
 8 target languages (9 in 2005)
 No Finnish task / new languages: Polish and Romanian

Activated Tasks
[Table: number of monolingual, cross-lingual, and total activated tasks per campaign, CLEF 2003 to CLEF 2006; per-year figures not shown]
More interest in cross-linguality.
 Questions were not translated into all the languages (NEW!)
 Gold Standard: questions in multiple languages only for tasks where there was at least one registered participant

Participants
[Table: registered participants per campaign (CLEF 2003 to CLEF 2006), broken down by continent (America, Europe, Asia) and by newcomers, veterans, and absent veterans. Totals grew by +125% in 2004, +33% in 2005, and +25% in 2006, reaching 30 registered participants in 2006.]

List of participants
SYNAPSE: Synapse Développement (France)
Ling-Comp: U. Rome "La Sapienza" (Italy)
Alicante: U. Alicante, Informatica (Spain)
Hagen: U. Hagen, Informatics (Germany)
Daedalus: Daedalus Consortium (Spain)
Jaen: U. Jaén, Intelligent Systems (Spain)
ISLA: U. Amsterdam (Netherlands)
INAOE: Inst. Astrophysics, Optics & Electronics (Mexico)
DEPOK: U. Indonesia, Comp. Sci. (Indonesia)
DFKI: DFKI, Lang. Tech. (Germany)
FURUI Lab.: Tokyo Inst. of Technology (Japan)
Linguateca: Linguateca-Sintef (Norway)
LIC2M-CEA: Centre CEA Saclay (France)
LINA: U. Nantes, LINA (France)
Priberam: Priberam Informatica (Portugal)
U. Porto: U. Porto, AI (Portugal)
U. Groningen: U. Groningen, Letters (Netherlands)
Lab. Inf. d'Avignon: Lab. Inf. d'Avignon (France)
U. Sao Paulo: U. São Paulo, Math (Brazil)
Vanguard: Vanguard Engineering (Mexico)
LCC: Language Computer Corp. (USA)
UAIC: U. "Al. I. Cuza" Iasi (Romania)
Wroclaw U.: Wroclaw U. of Tech. (Poland)
RFIA-UPV: Univ. Politècnica de València (Spain)
LIMSI: CNRS Lab, Orsay Cedex (France)
U. Stuttgart: U. Stuttgart, NLP (Germany)
ITC: ITC-irst (Italy)
JRC-ISPRA: Institute for the Protection and the Security of the Citizen (Italy)
BTB: BulTreeBank Project (Sofia, Bulgaria)
dltg: University of Limerick (Ireland)
(Industrial companies were highlighted on the slide.)

Submitted runs
CLEF 2003: [figures not shown]
CLEF 2004: 20 monolingual + 28 cross-lingual = 48 runs (+182%)
CLEF 2005: 43 monolingual + 24 cross-lingual = 67 runs (+39.5%)
CLEF 2006: 42 monolingual + 35 cross-lingual = 77 runs (+13%)

Number of answers and snippets per question
[Charts: number of runs by number of answers returned (1 answer / between 2 and 5 answers / more than 5 answers), and number of snippets per answer (1 / 2 / 3 / 4 or more); figures not shown]

Evaluation
 As in previous campaigns:
♦ runs manually judged by native speakers
♦ each answer judged as Right, Wrong, ineXact, or Unsupported
♦ up to two runs for each participating group
 Evaluation measures:
♦ Accuracy (for F and D questions): main evaluation score, calculated on the FIRST answer only
– excessive workload: some groups could manually assess only one answer (the first one) per question
– 1 answer assessed: Spanish and English
– 3 answers: French
– 5 answers: Dutch
– all answers: Italian, German, Portuguese
♦ List questions: evaluated with a separate measure
♦ Additional evaluation measures (NEW!):
– K1 measure
– Confidence Weighted Score (CWS)
– Mean Reciprocal Rank (MRR)
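A rough sketch of how these scores can be computed, using the standard formulations of first-answer accuracy, Mean Reciprocal Rank, and the Confidence Weighted Score (the K1 measure is omitted here). The variable names are illustrative, not part of the track's tooling: `judgements` is a per-question list of assessment labels ('R', 'W', 'X', 'U') for the ranked answers; `first_answer_correct` and `confidences` hold, per question, whether the first answer was Right and the system's self-reported confidence.

```python
def accuracy(judgements):
    """Accuracy over the first returned answer: fraction of questions whose
    top-ranked answer was judged Right ('R')."""
    return sum(1 for labels in judgements if labels and labels[0] == 'R') / len(judgements)

def mrr(judgements):
    """Mean Reciprocal Rank: average of 1/rank of the first Right answer
    (0 contribution if no Right answer was returned for the question)."""
    total = 0.0
    for labels in judgements:
        for rank, label in enumerate(labels, start=1):
            if label == 'R':
                total += 1.0 / rank
                break
    return total / len(judgements)

def cws(first_answer_correct, confidences):
    """Confidence Weighted Score: questions are sorted by the system's
    self-reported confidence; CWS = (1/n) * sum over positions i of
    (number of correct answers within the first i questions) / i."""
    order = sorted(range(len(confidences)), key=lambda q: confidences[q], reverse=True)
    n = len(order)
    correct_so_far = 0
    score = 0.0
    for i, q in enumerate(order, start=1):
        if first_answer_correct[q]:
            correct_so_far += 1
        score += correct_so_far / i
    return score / n
```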

Question Overlap among Languages [chart]

Results: Best and Average scores
[Chart of best and average scores per task; a value of 49.47 is marked with an asterisk]
* This result is still under validation.

Best results in …
[Chart; a value ending in ,63 is marked with an asterisk]
* This result is still under validation.

Participants in 2005 and 2006: compared best results [chart]

List questions
 Best: (Priberam, monolingual PT)
 Average:
Problems:
 Wrong classification of List questions in the Gold Standard
♦ "Mention a Chinese writer" is not a List question!
 Definition of List questions:
♦ "closed" List questions ask for a finite number of answers
Q: What are the names of the two lovers from Verona separated by family issues in one of Shakespeare's plays?
A: Romeo and Juliet.
♦ "open" List questions require a list of items as answer
Q: Name books by Jules Verne.
A: Around the World in 80 Days.
A: Twenty Thousand Leagues Under the Sea.
A: Journey to the Centre of the Earth.

Final considerations
– Increasing interest in multilingual QA:
 More participants (30, +25%)
 Two new source languages (Romanian and Polish)
 More activated tasks (24; there were 23 in 2005)
 More submitted runs (77, +13%)
 More cross-lingual tasks (35, +31.5%)
– Gold Standard: questions not translated into all languages:
 No possibility of activating tasks at the last minute
 Useful as a reusable resource: available in the near future

Final considerations: 2006 main task innovations
– Multiple answers: good response; limited capacity of assessing large numbers of answers; feedback from participants is welcome
– Supporting snippets: faster evaluation; feedback from participants is welcome
– "F/D/L" labels not given in the input format: positive, as apparently there was no real impact on the results
– List questions

Future perspective: main task
 For discussion:
 Romanian as target language
 Very hard questions (implying reasoning and answers from multiple documents)
 Allow collaboration among different systems
 Partial automated evaluation of right answers (see the sketch below)
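On the last point, one possible shape of a partial automated evaluation (purely illustrative, not a procedure adopted by the track) is to pre-judge answers that match a pool of already validated right answers and leave everything else to human assessors. The function and argument names below are assumptions.

```python
import re
from typing import Dict, List, Optional

def auto_judge(question_id: str,
               answer_text: str,
               known_right: Dict[str, List[str]]) -> Optional[str]:
    """Return 'R' if the answer matches a previously validated right answer
    for this question (after simple normalisation); otherwise return None,
    meaning the answer still needs a human assessor."""
    normalised = " ".join(answer_text.lower().split())
    for pattern in known_right.get(question_id, []):
        if re.fullmatch(pattern, normalised):
            return 'R'
    return None

# Example use: answers matching the pool are judged automatically,
# the rest are queued for manual assessment.
pool = {"q042": [r"romeo and juliet"]}
print(auto_judge("q042", "Romeo and Juliet", pool))   # 'R'
print(auto_judge("q042", "Hamlet and Ophelia", pool)) # None
```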