Answer Validation Exercise (AVE) - QA subtrack at the Cross-Language Evaluation Forum 2007. UNED (coord.): Anselmo Peñas, Álvaro Rodrigo, Valentín Sama, Felisa Verdejo.

Presentation transcript:

Answer Validation Exercise (AVE) - QA subtrack at the Cross-Language Evaluation Forum 2007. UNED (coord.): Anselmo Peñas, Álvaro Rodrigo, Valentín Sama, Felisa Verdejo. Thanks to the main task organizing committee.

What? The Answer Validation Exercise: validate the correctness of the answers given by the participants at CLEF QA 2007.

AVE 2006: an RTE exercise. If the text semantically entails the hypothesis, then the answer is expected to be correct.

[Diagram: for each question, the QA system returns an exact answer together with a supporting snippet and document ID. The question and the exact answer are rewritten into affirmative form to produce the hypothesis; the supporting snippet serves as the text.]
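
The 2006 setup thus reduces answer validation to RTE: question and exact answer become an affirmative-form hypothesis, and the supporting snippet becomes the text. The Python sketch below is illustrative only; build_hypothesis and its rewrite rules are hypothetical stand-ins, not the organizers' actual per-language generator.

# Minimal sketch of AVE 2006-style hypothesis generation: the question is
# rewritten into affirmative form with the candidate answer plugged in.
# The template rules are illustrative assumptions for English only.

def build_hypothesis(question: str, answer: str) -> str:
    """Turn (question, answer) into an affirmative sentence."""
    q = question.strip().rstrip("?").strip()
    if q.lower().startswith(("what is ", "who is ")):
        # "What is Zanussi?" + "an Italian producer of home appliances"
        # -> "Zanussi is an Italian producer of home appliances."
        subject = q.split(" ", 2)[2]
        return f"{subject} is {answer}."
    # Fallback for other question types: simple concatenation.
    return f"{q}: {answer}."

text = "Zanussi was an Italian producer of home appliances."
hypothesis = build_hypothesis("What is Zanussi?",
                              "an Italian producer of home appliances")
# An RTE system then checks whether `text` entails `hypothesis`.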

Answer Validation Exercise

[Diagram: a Question Answering system, treated as a black box, takes a question and returns a candidate answer with its supporting text. An Automatic Hypothesis Generation step combines the question and the candidate answer into a hypothesis, and a Textual Entailment decision over the (supporting text, hypothesis) pair outputs either "answer is correct" or "answer is not correct or not enough evidence". AVE 2006 covered only the entailment step; AVE 2007 covers the whole answer-validation black box, including hypothesis generation.]
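
A minimal sketch of that black box, reusing build_hypothesis from the previous sketch. The word-overlap heuristic in entails is only a placeholder standing in for a real RTE system, not how participants decided entailment.

def entails(text: str, hypothesis: str, threshold: float = 0.8) -> bool:
    """Crude proxy: enough of the hypothesis words appear in the text."""
    t = set(text.lower().split())
    h = set(hypothesis.lower().split())
    return bool(h) and len(h & t) / len(h) >= threshold

def validate(question: str, answer: str, supporting_text: str) -> str:
    # build_hypothesis is the template-based generator sketched earlier.
    hypothesis = build_hypothesis(question, answer)
    if entails(supporting_text, hypothesis):
        return "answer is correct"
    return "answer is not correct or not enough evidence"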

Answer Validation Exercise

- AVE 2006: not possible to quantify the potential gain that AV modules give to QA systems.
- Change in the AVE 2007 methodology: answers are grouped by question; systems must validate all of them, but select only one (see the sketch below).
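
A minimal sketch of the 2007 protocol under that change, assuming each AV system produces a confidence score per answer; the field names and the 0.5 threshold are illustrative assumptions.

from collections import defaultdict

# Every answer in a question group gets a VALIDATED/REJECTED decision,
# and among the validated ones exactly one per group is marked selected.

def judge_and_select(answers):
    """answers: list of dicts with keys 'q_id', 'a_id' and 'score'."""
    groups = defaultdict(list)
    for a in answers:
        a["decision"] = "VALIDATED" if a["score"] >= 0.5 else "REJECTED"
        a["selected"] = False
        groups[a["q_id"]].append(a)
    for group in groups.values():
        validated = [a for a in group if a["decision"] == "VALIDATED"]
        if validated:
            # Select the single most confident validated answer.
            max(validated, key=lambda a: a["score"])["selected"] = True
    return answers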

AVE 2007 Collections

Example question group (English):

Question: What is Zanussi?

- Candidate answer: "was an Italian producer of home appliances"
  Supporting snippet: "Zanussi. For the Polish film director, see Krzysztof Zanussi. For the hot-air balloon, see Zanussi (balloon). Zanussi was an Italian producer of home appliances that in 1984 was bought ..."

- Candidate answer: "who had also been in Cassibile since August 31"
  Supporting snippet: "Only after the signing had taken place was Giuseppe Castellano informed of the additional clauses that had been presented by general Ronald Campbell to another Italian general, Zanussi, who had also been in Cassibile since August ..."

- Candidate answer: "(1985)"
  Supporting snippet: "3 Out of 5 Live (1985) What Is This?"

Collections

- Remove duplicated answers inside the same question group.
- Discard NIL answers, void answers, and answers with too long a supporting snippet.
- This processing led to a reduction in the number of answers to be validated (a sketch of the clean-up follows below).
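
A sketch of that clean-up step over one question group; the snippet-length cut-off is a hypothetical value, since the slides do not state the exact limit.

# Drop void/NIL answers, overly long snippets and duplicates.
MAX_SNIPPET_CHARS = 700  # hypothetical cut-off

def clean_group(answers):
    """answers: list of (answer_text, supporting_snippet) pairs."""
    seen = set()
    kept = []
    for text, snippet in answers:
        norm = " ".join(text.lower().split())
        if not norm or norm == "nil":          # void and NIL answers
            continue
        if len(snippet) > MAX_SNIPPET_CHARS:   # too-long supporting snippet
            continue
        if norm in seen:                       # duplicate within the group
            continue
        seen.add(norm)
        kept.append((text, snippet))
    return kept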

Collections (# answers to validate). Available for CLEF participants at nlp.uned.es/QA/ave/

[Table: testing and development collections for English, Spanish, German, French, Italian, Dutch and Portuguese; Bulgarian has a development set only (70 answers) and Romanian a testing set only (127 answers). The other per-language counts did not survive in this transcript.]

Evaluation

- The collections are not balanced.
- Approach: detect whether there is enough evidence to accept an answer.
- Measures: precision, recall and F over ACCEPTED answers (computed as in the sketch below).
- Baseline system: accept all answers.
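
These are the standard definitions restricted to ACCEPTED answers; a short sketch, assuming gold correctness labels for every answer:

# Precision, recall and F over ACCEPTED answers.
# gold: answer id -> True if the answer is actually correct.
# pred: answer id -> True if the system ACCEPTED the answer.

def ave_measures(gold, pred):
    accepted = [a for a, flag in pred.items() if flag]
    correct = [a for a, ok in gold.items() if ok]
    hits = [a for a in accepted if gold.get(a, False)]
    precision = len(hits) / len(accepted) if accepted else 0.0
    recall = len(hits) / len(correct) if correct else 0.0
    f = (2 * precision * recall / (precision + recall)
         if precision + recall else 0.0)
    return precision, recall, f

# The accept-all baseline sets pred[a] = True for every answer: recall is
# then 1.0 and precision equals the proportion of correct answers, which is
# why the unbalanced collections make this baseline informative.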

Evaluation: precision, recall and F measure over correct answers for English.

[Table: two runs each from DFKI (ltqa), U. Alicante (ofe) and the Text-Mess Project (Text-Mess), plus Iasi (adiftene) and UNED (rodrigo), together with two accept-all baseline rows ("% VALIDATED"). The numeric F, precision and recall values did not survive in this transcript.]

Comparing AV systems' performance with QA systems (German)

[Table with columns Group, System, Type, QA accuracy and % of perfect selection; rows in the original order: Perfect selection (QA), FUH iglockner_2 (AV), FUH iglockner_1 (AV), DFKI dfki071dede (QA), FUH fuha071dede (QA), Random (AV), DFKI dfki071ende (QA), FUH fuha072dede (QA), DFKI dfki071ptde (QA). The numeric values did not survive in this transcript.]
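
One consistent reading of the two measures, which this sketch assumes rather than quotes from the overview: an AV system acts as an answer selector, its QA accuracy is the fraction of questions whose selected answer is correct, and "perfect selection" is the upper bound of always picking a correct answer whenever the group contains one; "% of perfect selection" is then the ratio of the two.

# Sketch of the comparison measures under the assumed definitions above.
# gold: (q_id, a_id) -> True if that answer is correct.
# selections: q_id -> the a_id selected by the AV system (or None).

def qa_accuracy(selections, gold):
    questions = {q for q, _ in gold}
    hits = sum(1 for q, a in selections.items()
               if a is not None and gold.get((q, a), False))
    return hits / len(questions)

def perfect_selection_accuracy(gold):
    # Upper bound: a correct answer is chosen whenever one exists.
    questions = {q for q, _ in gold}
    answerable = {q for (q, a), ok in gold.items() if ok}
    return len(answerable) / len(questions)

def pct_of_perfect_selection(selections, gold):
    return 100.0 * qa_accuracy(selections, gold) / perfect_selection_accuracy(gold)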

Techniques reported at AVE 2007

10 reports; all describe an RTE approach. Techniques, with the number of systems reporting each:

- Hypothesis generation: 6
- WordNet: 3
- Chunking: 3
- n-grams, longest common subsequences: 5
- Phrase transformations: 2
- NER: 5
- Numeric expressions: 6
- Temporal expressions: 4
- Coreference resolution: 2
- Dependency analysis: 3
- Syntactic similarity: 4
- Functions (sub, obj, etc.): 3
- Syntactic transformations: 1
- Word-sense disambiguation: 2
- Semantic parsing: 4
- Semantic role labeling: 2
- First-order logic representation: 3
- Theorem prover: 3
- Semantic similarity: 2

Conclusion

- Evaluation in a real environment: real systems' outputs become the AVE input.
- Developed methodologies: build collections from QA responses; evaluate in chain with a QA track; compare results with QA systems.
- New testing collections for the QA and RTE communities, in 7 languages, not only English.

Conclusion

- 9 groups, 16 systems, 4 languages.
- All systems based on textual entailment.
- 5 out of 9 groups also participated in QA: introduction of RTE techniques into QA, more NLP, more machine learning.
- Systems based on syntactic or semantic analysis perform automatic hypothesis generation, combining the question and the answer, in some cases directly into a logic form.