
1 CLEF 2012, Rome
QA4MRE: Question Answering for Machine Reading Evaluation
Anselmo Peñas (UNED, Spain), Eduard Hovy (USC-ISI, USA), Pamela Forner (CELCT, Italy), Álvaro Rodrigo (UNED, Spain), Richard Sutcliffe (U. Limerick, Ireland), Roser Morante (U. Antwerp, Belgium), Walter Daelemans (U. Antwerp, Belgium), Caroline Sporleder (U. Saarland, Germany), Corina Forascu (UAIC, Romania), Yassine Benajiba (Philips, USA), Petya Osenova (Bulgarian Academy of Sciences)

2 Question Answering Track at CLEF (2003-2012)
QA tasks organized over the years:
- Multiple Language QA Main Task
- ResPubliQA
- QA4MRE
- Temporal restrictions and lists
- Answer Validation Exercise (AVE)
- GikiCLEF
- Negation and Modality
- Real Time QA over Speech Transcriptions (QAST)
- Biomedical
- WiQA
- WSD QA

3 Portrayal
Over the years we learnt that the pipeline architecture is one of the main limitations to improving QA technology, so we bet on a reformulation.
Traditional pipeline: Question → Question analysis → Passage Retrieval → Answer Extraction → Answer Ranking → Answer.
The per-stage scores multiply along the pipeline (e.g., 1.0 × 0.8 × 0.8 = 0.64), as in the sketch below.
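For illustration only (not part of the original slides), a minimal sketch of how errors compound in a strict pipeline; the stage names follow the slide, but the 0.8 accuracies are assumed example values.

# Illustrative sketch: the end-to-end accuracy of a strict pipeline is at
# best the product of the per-stage accuracies, so errors compound.
stages = [
    ("question analysis", 1.0),   # assumed example values
    ("passage retrieval", 0.8),
    ("answer extraction", 0.8),
]

end_to_end = 1.0
for name, accuracy in stages:
    end_to_end *= accuracy

print(f"Upper bound on end-to-end accuracy: {end_to_end:.2f}")  # -> 0.64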

4 Hypothesis generation + validation
Question → search the space of candidate answers, using hypothesis generation functions plus answer validation functions → Answer (see the sketch below).
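A minimal sketch of the generate-and-validate idea, not the code of any participating system; generate_hypotheses() and validate() are hypothetical functions standing in for the "hypothesis generation functions" and "answer validation functions" named on the slide.

# Illustrative sketch: keep the candidate whose answer hypothesis validates best.
def answer(question, candidate_answers, generate_hypotheses, validate):
    """Search the space of candidates with generation + validation functions."""
    best_candidate, best_score = None, float("-inf")
    for candidate in candidate_answers:
        for hypothesis in generate_hypotheses(question, candidate):
            score = validate(hypothesis)  # answer validation function
            if score > best_score:
                best_candidate, best_score = candidate, score
    return best_candidate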

5 We focus on validation…
Is the candidate answer correct?
QA4MRE setting: multiple-choice reading comprehension tests.
Measure progress in two reading abilities:
- Answer questions about a single text
- Capture knowledge from text collections

6 …and knowledge
Why capture knowledge from text collections?
- We need knowledge to understand language
- The ability to make inferences about texts is correlated with the amount of knowledge considered
- Texts always omit information that we need to recover, both to build the complete story behind the document and to be sure about the answer

7 Text as a source of knowledge
Background collection: a set of documents that contextualizes the one under reading (20,000-100,000 docs.). We can imagine this retrieval being done on the fly by the machine.
The collection must be big and diverse enough to acquire knowledge from.
Define a scalable strategy: topic by topic, with one reference collection per topic.

8 Background collections
They must serve to acquire:
- General facts (with categorization and relevant relations)
- Abstractions (such as …)
This is sensitive to what occurs in the texts, and thus also to the way we create the collection.
Key: retrieve all relevant documents, and only them (classical IR).
Interdependence with the topic definition: the topic is defined by the set of queries that produce the collection.

9 Example: Biomedical
Alzheimer's Disease Literature Corpus: search PubMed about Alzheimer's disease.
Query:
(((((("Alzheimer Disease"[Mesh] OR "Alzheimer's disease antigen"[Supplementary Concept] OR "APP protein, human"[Supplementary Concept] OR "PSEN2 protein, human"[Supplementary Concept] OR "PSEN1 protein, human"[Supplementary Concept]) OR "Amyloid beta-Peptides"[Mesh]) OR "donepezil"[Supplementary Concept]) OR ("gamma-secretase activating protein, human"[Supplementary Concept] OR "gamma-secretase activating protein, mouse"[Supplementary Concept])) OR "amyloid beta-protein (1-42)"[Supplementary Concept]) OR "Presenilins"[Mesh]) OR "Neurofibrillary Tangles"[Mesh] OR "Alzheimer's disease"[All Fields] OR "Alzheimer's Disease"[All Fields] OR "Alzheimer s disease"[All Fields] OR "Alzheimers disease"[All Fields] OR "Alzheimer's dementia"[All Fields] OR "Alzheimer dementia"[All Fields] OR "Alzheimer-type dementia"[All Fields] NOT "non-Alzheimer"[All Fields] NOT ("non-AD"[All Fields] AND "dementia"[All Fields]) AND (hasabstract[text] AND English[lang])
Result: 66,222 abstracts
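As a hedged illustration of how such a PubMed query could be run programmatically, here is a sketch using Biopython's Entrez client; the slide does not say how the organizers actually harvested the corpus, the query below is abridged, and the email address and retrieval limits are placeholders.

# Illustrative sketch only (assumption: Biopython's Entrez module is used).
from Bio import Entrez

Entrez.email = "you@example.org"   # NCBI asks for a contact address

# Abridged query; in practice the full query from the slide would be used.
query = '"Alzheimer Disease"[Mesh] OR "Alzheimers disease"[All Fields] AND (hasabstract[text] AND English[lang])'

# Get matching PubMed IDs (NCBI caps retmax, so large result sets need paging).
handle = Entrez.esearch(db="pubmed", term=query, retmax=10000)
id_list = Entrez.read(handle)["IdList"]
print(f"{len(id_list)} hits in this page")

# Fetch a first batch of abstracts as plain text.
handle = Entrez.efetch(db="pubmed", id=",".join(id_list[:200]),
                       rettype="abstract", retmode="text")
abstracts = handle.read()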

10 Questions (Main Task)
Distribution of question types:
- 27 PURPOSE
- 30 METHOD
- 36 CAUSAL
- 36 FACTOID
- 31 WHICH-IS-TRUE
Distribution of answer types:
- 75 REQUIRE NO EXTRA KNOWLEDGE
- 46 REQUIRE BACKGROUND KNOWLEDGE
- 21 REQUIRE INFERENCE
- 20 REQUIRE GATHERING INFORMATION FROM DIFFERENT SENTENCES

11 Questions (Biomedical Task)
Question types:
1. Experimental evidence/qualifier
2. Protein-protein interaction
3. Gene synonymy relation
4. Organism source relation
5. Regulatory relation
6. Increase (higher expression)
7. Decrease (reduction)
8. Inhibition
Answer types:
- Simple: the answer is found almost verbatim in the paper
- Medium: the answer is rephrased
- Complex: requires combining pieces of evidence and inference
The questions involve a predefined set of entity types.

12 Main Task
16 test documents, 160 questions, 800 candidate answers.
4 topics:
1. AIDS
2. Music and Society
3. Climate Change
4. Alzheimer (from popular, non-specialist sources: blogs, web, news, …)
4 reading tests per topic: one document + 10 questions, with 5 choices per question.
6 languages: English, German, Spanish, Italian, Romanian, Arabic (new).

13 Biomedical Task
Same setting, but scientific language and a focus on one disease: Alzheimer's.
Alzheimer's Disease Literature Corpus (ADLC):
- 66,222 abstracts from PubMed
- 9,500 full articles
Most of them processed with:
- GDep dependency parser (Sagae and Tsujii, 2007)
- UMLS-based NE tagger (CLiPS)
- ABNER NE tagger (Settles, 2005)

14 Task on Modality and Negation
Given an event in the text, decide whether it is:
1. Asserted (NONE: no negation and no speculation)
2. Negated (NEG: negation and no speculation)
3. Speculated and negated (NEGMOD)
4. Speculated and not negated (MOD)
Decision: first ask whether the event is presented as certain, then whether it is negated. Certain and not negated → NONE; certain and negated → NEG; speculated and not negated → MOD; speculated and negated → NEGMOD (see the sketch below).
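A minimal sketch of the label scheme, added here for illustration: the four labels follow directly from the two binary decisions described on the slide; the function name is hypothetical.

# Illustrative sketch: map (negated?, speculated?) to the task's label set.
def modality_negation_label(negated: bool, speculated: bool) -> str:
    if speculated:
        return "NEGMOD" if negated else "MOD"
    return "NEG" if negated else "NONE"

# Examples: an asserted event -> NONE; a speculated, negated event -> NEGMOD.
assert modality_negation_label(False, False) == "NONE"
assert modality_negation_label(True, True) == "NEGMOD"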

15 Participation
Task: registered groups / participant groups / submitted runs
- Main: 25 / 11 / 43
- Biomedical: 23 / 7 / 43
- Modality and Negation: 3 / 3 / 6
- Total: 51 / 21 / 92
~100% increase.

16 Evaluation and results
QA-perspective evaluation: c@1 over all questions (random baseline 0.2).
Reading-perspective evaluation: aggregating results test by test (a test is passed if c@1 > 0.5).
Best systems (QA perspective, c@1): Main 0.65 and 0.40; Biomedical 0.55 and 0.47.
Best systems (reading perspective, tests passed): Main 12/16 and 6/16; Biomedical 3/4.
A sketch of the c@1 measure follows.
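For readers unfamiliar with the measure, a short sketch of c@1 (Peñas and Rodrigo, 2011): systems may leave questions unanswered, and unanswered questions are credited in proportion to the accuracy achieved on the answered ones. The example numbers below are hypothetical.

# Sketch of the c@1 evaluation measure used to score QA4MRE systems.
def c_at_1(n_correct: int, n_unanswered: int, n_total: int) -> float:
    # Unanswered questions get partial credit equal to the overall accuracy.
    return (n_correct + n_unanswered * (n_correct / n_total)) / n_total

# Hypothetical example: 80 correct and 20 unanswered out of 160 questions.
print(c_at_1(80, 20, 160))  # 0.5625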

17 More details during the workshop
Monday 17th Sep.
17:00 - 18:00 Poster session
Tuesday 18th Sep.
10:40 - 12:40 Invited talk + overviews
14:10 - 16:10 Reports from participants (Main + Bio)
16:40 - 17:15 Reports from participants (Mod&Neg)
17:15 - 18:10 Breakout session
Thanks!

