
Slide 1: CLEF 2012, Rome. QA4MRE, Question Answering for Machine Reading Evaluation
Anselmo Peñas (UNED, Spain), Eduard Hovy (USC-ISI, USA), Pamela Forner (CELCT, Italy), Álvaro Rodrigo (UNED, Spain), Richard Sutcliffe (U. Limerick, Ireland), Roser Morante (U. Antwerp, Belgium), Walter Daelemans (U. Antwerp, Belgium), Caroline Sporleder (U. Saarland, Germany), Corina Forascu (UAIC, Romania), Yassine Benajiba (Philips, USA), Petya Osenova (Bulgarian Academy of Sciences)

Slide 2: Question Answering Track at CLEF: QA tasks over the years
Main tasks: Multiple Language QA Main Task, ResPubliQA, QA4MRE
Also along the years: Temporal restrictions and lists, Answer Validation Exercise (AVE), GikiCLEF, Negation and Modality, Real Time QA over Speech Transcriptions (QAST), Biomedical, WiQA, WSD QA

Slide 3: Portrayal
Over the years we have learnt that the pipeline architecture is one of the main limitations for improving QA technology, so we bet on a reformulation.
[Diagram: classic QA pipeline: Question -> Question analysis x Passage Retrieval x Answer Extraction x Answer Ranking -> Answer]

Slide 4: Hypothesis generation + validation
[Diagram: Question -> searching the space of candidate answers with hypothesis generation functions + answer validation functions -> Answer] (sketched below)
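The reformulation above can be pictured as scoring every (question, candidate answer) hypothesis against the reading material and keeping the best-validated one. The following is a minimal sketch of that idea, not the organizers' or any participant's system: the word-overlap validator and the abstention threshold are illustrative assumptions.

```python
# Minimal sketch of the "hypothesis generation + validation" reformulation.
# Not the campaign's implementation: the lexical-overlap validator and the
# abstention threshold below are illustrative assumptions only.
from typing import Optional, Sequence


def validate(question: str, candidate: str, document: str) -> float:
    """Toy answer-validation function: score a (question, candidate) hypothesis
    against the reading document by simple word overlap."""
    hypothesis = set((question + " " + candidate).lower().split())
    text = set(document.lower().split())
    return len(hypothesis & text) / max(len(hypothesis), 1)


def answer(question: str, candidates: Sequence[str], document: str,
           threshold: float = 0.3) -> Optional[str]:
    """Score every candidate hypothesis and return the best one,
    abstaining (None) when no candidate is validated confidently enough."""
    scored = [(validate(question, c, document), c) for c in candidates]
    best_score, best_candidate = max(scored)
    return best_candidate if best_score >= threshold else None
```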

Slide 5: We focus on validation: is the candidate answer correct?
QA4MRE setting: multiple-choice reading comprehension tests.
Measure progress in two reading abilities:
- Answer questions about a single text
- Capture knowledge from text collections

Slide 6: ... and knowledge. Why capture knowledge from text collections?
- We need knowledge to understand language
- The ability to make inferences about a text is correlated with the amount of knowledge considered
- Texts always omit information we need to recover, to build the complete story behind the document and be sure about the answer

Slide 7: Text as a source of knowledge
Text collection (background collection): a set of documents that contextualize the one under reading (20,000–…,000 docs.). We can imagine this being done on the fly by the machine (retrieval).
- Big and diverse enough to acquire knowledge
- Define a scalable strategy: topic by topic
- One reference collection per topic

Slide 8: Background collections
They must serve to acquire:
- General facts (with categorization and relevant relations)
- Abstractions (such as …)
This is sensitive to occurrence in texts, and thus also to the way we create the collection.
Key: retrieve all relevant documents and only them (classical IR).
Interdependence with topic definition: the topic is defined by the set of queries that produce the collection.

Slide 9: Example: Biomedical Alzheimer's Disease Literature Corpus
Search PubMed about Alzheimer's disease. Query:
(((((("Alzheimer Disease"[Mesh] OR "Alzheimer's disease antigen"[Supplementary Concept] OR "APP protein, human"[Supplementary Concept] OR "PSEN2 protein, human"[Supplementary Concept] OR "PSEN1 protein, human"[Supplementary Concept]) OR "Amyloid beta-Peptides"[Mesh]) OR "donepezil"[Supplementary Concept]) OR ("gamma-secretase activating protein, human"[Supplementary Concept] OR "gamma-secretase activating protein, mouse"[Supplementary Concept])) OR "amyloid beta-protein (1-42)"[Supplementary Concept]) OR "Presenilins"[Mesh]) OR "Neurofibrillary Tangles"[Mesh] OR "Alzheimer's disease"[All Fields] OR "Alzheimer's Disease"[All Fields] OR "Alzheimer s disease"[All Fields] OR "Alzheimers disease"[All Fields] OR "Alzheimer's dementia"[All Fields] OR "Alzheimer dementia"[All Fields] OR "Alzheimer-type dementia"[All Fields] NOT "non-Alzheimer"[All Fields] NOT ("non-AD"[All Fields] AND "dementia"[All Fields]) AND (hasabstract[text] AND English[lang])
Result: 66,222 abstracts (see the retrieval sketch below)
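For readers who want to reproduce this kind of corpus construction programmatically, the sketch below uses Biopython's Entrez wrapper around the PubMed E-utilities. It is an assumption that this route matches how the organizers built the corpus; the e-mail address is a placeholder and the query is an abridged version of the one on the slide.

```python
# A sketch of how such a collection could be retrieved programmatically,
# using Biopython's Entrez E-utilities wrapper. The query below is an
# abridged version of the slide's full PubMed query.
from Bio import Entrez

Entrez.email = "you@example.org"  # NCBI requires a contact address (placeholder)

query = ('"Alzheimer Disease"[Mesh] OR "Alzheimers disease"[All Fields]'
         ' AND (hasabstract[text] AND English[lang])')

# Find matching PubMed records
search = Entrez.read(Entrez.esearch(db="pubmed", term=query, retmax=100))
print("Total hits:", search["Count"])

# Fetch the abstracts of the first batch of hits as plain text
handle = Entrez.efetch(db="pubmed", id=search["IdList"],
                       rettype="abstract", retmode="text")
print(handle.read()[:1000])
```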

Slide 10: Questions (Main Task)
Distribution of question types:
- 27 PURPOSE
- 30 METHOD
- 36 CAUSAL
- 36 FACTOID
- 31 WHICH-IS-TRUE
Distribution of answer types:
- 75 REQUIRE NO EXTRA KNOWLEDGE
- 46 REQUIRE BACKGROUND KNOWLEDGE
- 21 REQUIRE INFERENCE
- 20 REQUIRE GATHERING INFORMATION FROM DIFFERENT SENTENCES

Slide 11: Questions (Biomedical Task)
Question types:
1. Experimental evidence/qualifier
2. Protein-protein interaction
3. Gene synonymy relation
4. Organism source relation
5. Regulatory relation
6. Increase (higher expression)
7. Decrease (reduction)
8. Inhibition
Answer types:
- Simple: the answer is found almost verbatim in the paper
- Medium: the answer is rephrased
- Complex: requires combining pieces of evidence and inference
Questions involve a predefined set of entity types.

Slide 12: Main Task
16 test documents, 160 questions, 800 candidate answers.
4 topics:
1. AIDS
2. Music and Society
3. Climate Change
4. Alzheimer
(popular-science sources: blogs, web, news, ...)
4 reading tests per topic: a document + 10 questions, 5 choices per question (a data-layout sketch follows).
6 languages: English, German, Spanish, Italian, Romanian, Arabic (new).
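To make the test layout concrete, here is a small, hypothetical data-structure sketch for one reading test (document plus 10 questions with 5 candidate answers each). The field names are illustrative and do not reflect the campaign's official distribution format.

```python
# Illustrative data structures for one reading test in the main task
# (document + 10 questions, 5 candidate answers each); field names are
# assumptions, not the official test-set schema.
from dataclasses import dataclass
from typing import List


@dataclass
class Question:
    text: str
    candidates: List[str]      # exactly 5 candidate answers
    correct_index: int         # position of the right answer (gold data only)


@dataclass
class ReadingTest:
    topic: str                 # e.g. "Alzheimer"
    language: str              # e.g. "en"
    document: str              # the single text the questions are about
    questions: List[Question]  # 10 questions per test
```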

Slide 13: Biomedical Task
Same setting, but scientific language, focused on one disease: Alzheimer's.
Alzheimer's Disease Literature Corpus (ADLC):
- 66,222 abstracts from PubMed
- 9,500 full articles
Most of them processed with:
- Dependency parser GDep (Sagae and Tsujii, 2007)
- UMLS-based NE tagger (CLiPS)
- ABNER NE tagger (Settles, 2005)

Slide 14: Task on Modality and Negation
Given an event in the text, decide whether it is:
1. Asserted (NONE: no negation and no speculation)
2. Negated (NEG: negation and no speculation)
3. Speculated and negated (NEGMOD)
4. Speculated and not negated (MOD)
[Decision-tree diagram: "Is the event presented as certain?" and "Did it happen? / Is it negated?" branch into the four labels NONE, NEG, MOD, NEGMOD] (this decision is encoded in the sketch below)
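The four labels follow directly from two binary decisions (is the event negated? is it speculated?), which can be encoded in a few lines. This is only a restatement of the labelling scheme; the two flags are assumed to come from an upstream negation/modality detector.

```python
# A direct encoding of the four-way decision from the slide. The two boolean
# flags (whether the event is negated / speculated) are assumed to come from
# an upstream negation and modality detector.
def modality_label(is_negated: bool, is_speculated: bool) -> str:
    """Map negation/speculation flags to the task labels."""
    if not is_speculated:
        return "NEG" if is_negated else "NONE"   # asserted events
    return "NEGMOD" if is_negated else "MOD"     # speculated events


assert modality_label(False, False) == "NONE"    # no negation, no speculation
assert modality_label(True, False) == "NEG"      # negated, not speculated
assert modality_label(True, True) == "NEGMOD"    # speculated and negated
assert modality_label(False, True) == "MOD"      # speculated, not negated
```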

Slide 15: Participation
[Table: registered groups, participant groups, and submitted runs for the Main, Biomedical, and Modality and Negation tasks, with totals]
Total: ~100% increase.

Slide 16: Evaluation and results
- QA-perspective evaluation: over all questions (random baseline 0.2)
- Reading-perspective evaluation: aggregating results test by test (a test is passed if the score is > 0.5)
[Charts: best systems on the Main and Biomedical tasks; tests passed by the best systems: 12/16 and 6/16 (Main), 3/4 (Biomedical)] (see the sketch below)
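A compact way to see the two perspectives: compute a score over all questions (random guessing over 5 options gives 0.2), and separately aggregate test by test, counting a test as passed when its score exceeds 0.5. The sketch below uses plain accuracy; the campaign's c@1 measure, which additionally rewards leaving questions unanswered, is omitted for brevity, and the data layout is an assumption.

```python
# Sketch of the two evaluation perspectives, using plain accuracy.
# Assumed data layout: results[test_id] is a list of booleans, one per
# question, True when the system answered that question correctly.
from typing import Dict, List, Tuple


def qa_perspective(results: Dict[str, List[bool]]) -> float:
    """Score over all questions; random guessing over 5 options gives 0.2."""
    answers = [ok for test in results.values() for ok in test]
    return sum(answers) / len(answers)


def reading_perspective(results: Dict[str, List[bool]],
                        pass_mark: float = 0.5) -> Tuple[int, int]:
    """Aggregate test by test: a reading test is passed if its score is > 0.5.
    Returns (tests passed, total tests)."""
    passed = {t: (sum(qs) / len(qs)) > pass_mark for t, qs in results.items()}
    return sum(passed.values()), len(passed)
```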

Slide 17: More details during the workshop
Monday 17th Sep.
- 17:… – …:00 Poster Session
Tuesday 18th Sep.
- 10:40 – 12:40 Invited Talk + Overviews
- 14:10 – 16:10 Reports from participants (Main + Bio)
- 16:40 – 17:15 Reports from participants (Mod & Neg)
- 17:15 – 18:10 Breakout session
Thanks!