Spanish Question Answering Evaluation
Anselmo Peñas, Felisa Verdejo and Jesús Herrera
UNED NLP Group, Distance Learning University of Spain
CICLing 2004, Seoul

Question Answering task
Give an answer to a question
– Approach: find (search) an answer in a document collection
– A document must support the answer
– Example: "Where is Seoul?"
  – South Korea (correct)
  – Korea (responsive?)
  – Asia (non-responsive)
  – Population of South Korea (inexact)
  – Oranges of China (incorrect)

QA system architecture
Question → question analysis (answer type / structure, key terms) → passage retrieval over the pre-processed / indexed document collection → answer extraction → answer validation / scoring → answer
Opportunity for natural language techniques
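The slide describes a generic pipeline rather than any particular system. Below is a minimal toy sketch of that flow in Python; the function names, the naive keyword heuristics and the document id "EFE-001" are illustrative assumptions, not part of any actual CLEF system.

```python
# Toy sketch of the stages named on the slide. The function names, the naive
# keyword heuristics and the document id "EFE-001" are illustrative only.

def analyze_question(question):
    """Question analysis: expected answer type plus key terms for retrieval."""
    answer_type = "LOCATION" if question.lower().startswith("where") else "OTHER"
    key_terms = [w.strip("?") for w in question.split()[2:]]  # e.g. ["Seoul"]
    return answer_type, key_terms

def retrieve_passages(key_terms, documents):
    """Passage retrieval over the (pre-processed / indexed) collection."""
    return [(doc_id, text) for doc_id, text in documents.items()
            if any(t.lower() in text.lower() for t in key_terms)]

def extract_and_validate(passages, candidates):
    """Answer extraction and validation: the answer must be supported by a document."""
    for doc_id, text in passages:
        for candidate in candidates:
            if candidate.lower() in text.lower():
                return candidate, doc_id    # answer + supporting document id
    return "NIL", None

documents = {"EFE-001": "Seoul is the capital of South Korea."}
_, terms = analyze_question("Where is Seoul?")
passages = retrieve_passages(terms, documents)
print(extract_and_validate(passages, ["South Korea", "Asia"]))  # ('South Korea', 'EFE-001')
```

Each stage in this flow (analysis, retrieval, extraction, validation) is a point where natural language techniques can be applied, which is the opportunity the slide highlights.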

Overview
– Evaluation forums: objectives
– QA evaluation methodology
– The challenge of multilingualism
– QA at CLEF 2003
– QA at CLEF 2004
– Conclusion

Evaluation Forums: Objectives
– Stimulate research
– Establish shared working lines
– Generate resources for evaluation and for training
– Compare different approaches and obtain evidence
– Serve as a meeting point for collaboration and exchange (CLEF, TREC, NTCIR)

QA Evaluation Methodology
– Test suite production:
  – Document collection (hundreds of thousands of documents)
  – Questions (hundreds)
– Systems answer (answer + document id) within a limited time
– Judgment of answers by human assessors: Correct, Inexact, Unsupported, Incorrect
– Measuring system behavior:
  – % of questions correctly answered
  – % of NIL questions correctly detected
  – Precision, Recall, F, MRR (Mean Reciprocal Rank), confidence-weighted score, ...
– Results comparison
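As an informal illustration of two of the measures listed above (not the official CLEF assessment code), accuracy over judged answers and Mean Reciprocal Rank over ranked answer lists can be computed as follows; the judgment labels and the input layout are assumptions made for the example.

```python
# Informal illustration of two measures listed on the slide; the judgment
# labels ("correct", "inexact", ...) and the input layout are assumed here.

def accuracy(judgments):
    """Fraction of questions whose answer was judged correct."""
    return sum(1 for j in judgments if j == "correct") / len(judgments)

def mean_reciprocal_rank(ranked_judgments):
    """MRR: average over questions of 1/rank of the first correct answer (0 if none)."""
    total = 0.0
    for answers in ranked_judgments:
        for rank, j in enumerate(answers, start=1):
            if j == "correct":
                total += 1.0 / rank
                break
    return total / len(ranked_judgments)

# Three questions, up to three judged answers each:
print(accuracy(["correct", "inexact", "incorrect"]))                      # 0.33...
print(mean_reciprocal_rank([["correct"], ["incorrect", "correct"], []]))  # (1 + 1/2 + 0) / 3 = 0.5
```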

QA Evaluation Methodology
Considerations on task definition (I)
– Quantitative evaluation constrains the type of questions
  – Questions must be evaluable in terms of correctness, completeness and exactness
  – e.g. "Which are the causes of the Iraq war?" is hard to assess this way
– Human resources available
  – Test suite generation
  – Assessment (# of questions, # of answers per question)
– Collection
  – Restricted vs. unrestricted domains (news vs. patents)
  – Multilingual QA: comparable collections available

QA Evaluation Methodology
Considerations on task definition (II)
– Research direction: "Do it better" versus "How to get better results?"
  – Systems are tuned according to the evaluation task, e.g. the evaluation measure or external resources (the web)
– Roadmap versus state of the art
  – What should systems do in the future? (Burger et al.)
  – When is it realistic to incorporate new features into the evaluation? Type of questions, temporal restrictions, confidence in the answer, encyclopedic knowledge and inference, different sources and languages, consistency between different answers, ...

The challenge of multilingualism
May I continue this talk in Spanish?
Then multilingualism still remains a challenge...

The challenge of multilingualism
– Feasible with the current QA state of the art? It is a challenge for systems, but also a challenge from the evaluation point of view
– What is a possible roadmap towards fully multilingual systems?
  – QA at CLEF (Cross-Language Evaluation Forum)
  – Monolingual → Bilingual → Multilingual systems
– What tasks can be proposed given the current state of the art?
  – Monolingual other than English? Bilingual involving English?
  – Any bilingual pair? Fully multilingual?
– Which new resources are needed for the evaluation?
  – Comparable corpus? Unrestricted domain?
  – Parallel corpus? Domain specific? Size?
  – Human resources: answers in any language make assessment by native speakers difficult

The challenge of multilingualism (cont.)
– How to ensure that fully multilingual systems receive a better evaluation?
  – Some answers in just one language? How?
    » Hard pre-assessment?
    » Different languages for different domains?
    » Different languages for different dates or localities?
    » Parallel collections, extracting a controlled subset of documents different for each language?
  – How to balance the type and difficulty of questions across all languages?
– Example: 250 questions split over Spanish, Italian, Dutch, German and French (50 per language); if each question is answerable in only one language, that leaves roughly 10 questions per combination. Ouch!

The challenge of multilingualism
– Fortunately (or unfortunately), with the current state of the art it is not realistic to plan such an evaluation...
– Very few systems are able to deal with several target languages... yet
– While we try to answer these questions, planning a separate evaluation for each target language seems more realistic
– This is the option followed by QA at CLEF in the short term

Overview
– Evaluation forums: objectives
– QA evaluation methodology
– The challenge of multilingualism
– QA at CLEF 2003
– QA at CLEF 2004
– Conclusion

QA at CLEF: groups
– ITC-irst, Centro per la Ricerca Scientifica e Tecnologica, Trento, Italy
– UNED, Universidad Nacional de Educación a Distancia, Madrid, Spain
– ILLC, Language and Inference Technology Group, U. of Amsterdam
– DFKI, Deutsches Forschungszentrum für Künstliche Intelligenz, Saarbrücken, Germany
– ELDA/ELRA, Evaluations and Language Resources Distribution Agency, Paris, France
– Linguateca, Oslo (Norway), Braga, Lisbon & Porto (Portugal)
– BulTreeBank Project, CLPP, Bulgarian Academy of Sciences, Sofia, Bulgaria
– University of Limerick, Ireland
– ISTI-CNR, Istituto di Scienza e Tecnologie dell'Informazione "A. Faedo", Pisa, Italy
– NIST, National Institute of Standards and Technology, Gaithersburg, USA

QA at CLEF 2003
– Task: 200 factoid questions, up to 3 answers per question
  – Exact answer / answer in a 50-byte-long string
– Document collection [Spanish]: >200,000 news articles (EFE, 1994)
– Questions: DISEQuA corpus (available on the web) (Magnini et al. 2003)
  – Coordinated work between ITC-IRST (Italian), UNED (Spanish) and U. Amsterdam (Dutch)
  – 450 questions and answers translated into English, Spanish, Italian and Dutch
  – 200 questions taken from the DISEQuA corpus (20 NIL)
– Assessment: Incorrect, Unsupported, Non-exact, Correct

Multilingual pool of questions
– Coordination between several groups
– Each group produces 100 questions with a known answer in its target language: Spanish (100), Italian (100), Dutch (100), German (100), French (100)
– The questions are translated into English, forming an English pool (500)
– The English pool is translated into the rest of the languages, giving a multilingual pool (500 × 6): Spanish, Italian, Dutch, German, French, English
– Final questions are selected from the pool, for each target language, after pre-assessment

QA at CLEF 2003

QA at CLEF 2004: tasks
– Six main tasks (one per target language)
– Source languages (questions): English, Spanish, French, German, Italian, Portuguese, Dutch
– Target languages (answers & docs.): Spanish, French, German, Italian, Dutch, Portuguese? Bulgarian? ... Korean?
– e.g. Spanish target: EFE collection, 1086 Mb (453,045 docs)

QA at CLEF 2004: questions
– Factual: person, object, measure, organization, ...
– Definition: person, organization
– How-to
– 1 answer per question (without manual intervention)
– Up to two runs
– Exact answers
– Assessment: correct, inexact, unsupported, incorrect
– Evaluation:
  – Fraction of correct answers
  – Measures based on the systems' self-scoring
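"Measures based on the systems' self-scoring" presumably refers to confidence-weighted scoring of the kind used at TREC, where answers are ranked by the system's own confidence and early correct answers weigh more. A minimal sketch under that assumption:

```python
# Sketch of a confidence-weighted score (CWS): answers are sorted by the
# system's self-reported confidence, and correct answers ranked early count
# more than correct answers ranked late.

def confidence_weighted_score(judgments):
    """judgments: booleans (answer correct?) sorted by decreasing self-confidence."""
    total, correct_so_far = 0.0, 0
    for i, is_correct in enumerate(judgments, start=1):
        correct_so_far += is_correct
        total += correct_so_far / i
    return total / len(judgments)

# A system that ranks its correct answers first scores higher:
print(confidence_weighted_score([True, True, False, False]))  # ~0.79
print(confidence_weighted_score([False, False, True, True]))  # ~0.21
```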

QA at CLEF 2004 Schedule
– Registration opens: January 15
– Corpora release: February
– Trial data: March
– Test sets release: May 10
– Submission of runs: May 17
– Release of results: from July 15
– Papers: August
– CLEF Workshop: September

Conclusion
Information and resources:
– Cross-Language Evaluation Forum
– DISEQuA Corpus: Dutch, Italian, Spanish, English
– Spanish QA at CLEF