RTE Planning Session Luisa Bentivogli, Peter Clark, Ido Dagan, Hoa Trang Dang, Danilo Giampiccolo.


Discussion items
- What’s done so far: RTE 1-7
- What’s next: what, where, when?
=> Open discussion and audience feedback

Where we have got to
- 7 years of RTE challenges (sponsored by PASCAL, finishing in 2011)
  - RTE 1-5: balanced data sets based on the output of NLP applications
  - RTE 6-7: moving toward more realistic scenarios
    - Main task: TE performed against a real corpus, focused on the SUM setting (after experimentation in the RTE-5 pilot)
    - KBP Validation experiment
=> A considerable number of datasets have been created in the 7 RTE campaigns

What next?
- SUM and IE (KBP) have already been investigated in RTE-6 and RTE-7
- Proposal: investigate the potential of RTE systems for another NLP application setting

What next?
- RTE-8 will not be at TAC 2012
  - Co-locate with a major conference to get wider engagement with the NLP community
  - NIST will continue to support the activities and contribute to the organization of challenges
- No RTE-8 in to allow the shift to an earlier time in the year
  - to prepare datasets for a new setting

Future directions for RTE: new NLP application scenarios
- QA appears to be the most natural direction
  - open domain, unsupervised setting
- Possible QA scenarios: Answer Validation
  1. QA4MRE scenario
  2. QA from Textbooks scenario
  3. AVE on traditional QA tracks data

Answer Validation
- Deciding whether an answer is correct or not according to a given text
- AV as a Textual Entailment problem:
  - H: question + answer (turned into a declarative sentence)
  - T: the text supporting the answer
  - T entails H = the answer is correct according to the supporting text

Answer Validation: An Example
AV input:
- Question: Which is the capital of Croatia?
- Answer: Zagreb
- Text: The capital of Croatia, Zagreb, has a population of around 700,000 citizens and it is known for …
RTE input:
1) T: Text (The capital of Croatia, Zagreb, has a population...)
   H: Q + A (Zagreb is the capital of Croatia)
   => H created manually or with automatic tools
2) Original AV triplet:
   => Requires automatic H generation
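The conversion from an AV triplet to a T-H pair can be sketched in a few lines. This is only an illustrative sketch: `THPair`, `make_hypothesis`, and `av_to_rte` are hypothetical names, and the single rewrite rule (covering questions of the form "Which is X?" or "What is X?") is an assumption chosen to fit the Croatia example, not the tool the organizers had in mind.

```python
import re
from typing import NamedTuple, Optional


class THPair(NamedTuple):
    """A Text-Hypothesis pair as used in RTE (hypothetical container)."""
    text: str
    hypothesis: str


def make_hypothesis(question: str, answer: str) -> Optional[str]:
    """Toy rule: 'Which is X?' or 'What is X?' plus answer A becomes 'A is X'.

    Returns None when the question does not fit this single pattern.
    """
    match = re.fullmatch(r"(?:Which|What) is (.+)\?", question.strip())
    if match is None:
        return None
    return f"{answer.strip()} is {match.group(1)}"


def av_to_rte(question: str, answer: str, text: str) -> Optional[THPair]:
    """Turn an AV triplet (question, answer, supporting text) into a T-H pair."""
    hypothesis = make_hypothesis(question, answer)
    if hypothesis is None:
        return None
    return THPair(text=text, hypothesis=hypothesis)


pair = av_to_rte(
    "Which is the capital of Croatia?",
    "Zagreb",
    "The capital of Croatia, Zagreb, has a population of around 700,000 citizens.",
)
```

A real system would need many more rewrite rules (or a parser-based generator); questions the rule does not cover simply yield `None` here.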

1. The QA4MRE scenario
- Focuses on the Validation step of the QA pipeline
  - Formulated as a multiple choice reading comprehension test
    - Questions about 1 given text
    - Candidate answers provided
  - + Reference collection of documents to allow systems to acquire the same background knowledge used to assist with answering some questions
- End of the roadmap: full QA setting

QA4MRE Reading Test
Text
Coal seam gas drilling in Australia's Surat Basin has been halted by flooding. Australia's Easternwell, being acquired by Transfield Services, has ceased drilling because of the flooding. The company is drilling coal seam gas wells for Australia's Santos Ltd. Santos said the impact was minimal.
Multiple Choice Test
According to the text… What company owns wells in Surat Basin?
1. Australia
2. Coal seam gas wells
3. Transfield Services
4. Santos Ltd.
5. Ausam Energy corporation

QA4MRE-based RTE task
T(ext)
Coal seam gas drilling in Australia's Surat Basin has been halted by flooding. Australia's Easternwell, being acquired by Transfield Services, has ceased drilling because of the flooding. The company is drilling coal seam gas wells for Australia's Santos Ltd. Santos said the impact was minimal.
Hs (Q + given A)
1. Australia owns wells in Surat Basin (NO ENTAILMENT)
2. Coal seam gas wells owns wells in Surat Basin (NO ENTAILMENT)
3. Transfield Services owns wells in Surat Basin (NO ENTAILMENT)
4. Santos Ltd. owns wells in Surat Basin (ENTAILMENT)
5. Ausam Energy Corporation owns wells in Surat Basin (NO ENTAILMENT)
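One way to picture how a reading test becomes an RTE task: each candidate answer is slotted into a declarative template to give one hypothesis, and the option whose hypothesis is judged most strongly entailed by the text is selected. The sketch below makes that concrete. The names `build_hypotheses` and `choose_answer`, the template string, and the `entailment_score` callback are all assumptions for illustration; any RTE system could be plugged in as the scorer.

```python
from typing import Callable, List


def build_hypotheses(template: str, options: List[str]) -> List[str]:
    """Fill each candidate answer into a declarative template, one H per option."""
    return [template.format(answer=option) for option in options]


def choose_answer(
    text: str,
    hypotheses: List[str],
    entailment_score: Callable[[str, str], float],
) -> int:
    """Return the index of the hypothesis the scorer rates as most entailed by text.

    entailment_score is a placeholder for an RTE system (hypothetical interface):
    it takes (text, hypothesis) and returns a higher value for stronger entailment.
    """
    scores = [entailment_score(text, h) for h in hypotheses]
    return max(range(len(scores)), key=lambda i: scores[i])


OPTIONS = [
    "Australia",
    "Coal seam gas wells",
    "Transfield Services",
    "Santos Ltd.",
    "Ausam Energy corporation",
]
HYPOTHESES = build_hypotheses("{answer} owns wells in Surat Basin", OPTIONS)
```

With five options per question, every question contributes five T-H pairs, of which the slide marks one as ENTAILMENT.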

- Interesting data:
  - questions are posed so that various kinds of textual inferences could be requested (lexical, syntactic, discourse)
- Available datasets:
  - 2011: up to 600 Hs (12 reading tests, 120 questions, 600 options)
  - The task will be proposed 2012
- When full QA setting => AV of QA4MRE systems

2. QA from a Textbook (e.g., Biology)
- Textbooks as a natural source of Q&A pairs:
  - T = a paragraph / chapter / book
  - Hs = revision/test questions from teachers and/or the end of the chapter:
    - True/false questions
    - Turn «find-a-value» questions into declarative sentences
- A natural and interesting challenge
  - established task, ready supply of data

QA from a Textbook (cont.): Example (Biology)
T(ext) (from a Biology textbook)
…Normally, the genetic material in the nucleus is in a loosely bundled coil called chromatin. At the onset of prophase, chromatin condenses together into a highly ordered structure called a chromosome. Since the genetic material has already been duplicated earlier in S phase, the replicated chromosomes have two sister chromatids, bound together at the centromere by the cohesin protein complex….
Hs
Which of the following statement(s) are true?
a. Genetic material is duplicated during prophase. (NO ENTAILMENT)
b. During prophase, chromosomes form from chromatin. (ENTAILMENT)
c. S phase follows prophase. (NO ENTAILMENT)
d. Chromatin is a form of genetic material. (ENTAILMENT)
e. Cohesin keeps the sister chromatid pairs connected with each other. (ENTAILMENT)

3. AVE on «traditional» QA data
- Answer Validation Exercise (Peñas et al., 2006)
  - Validating the correctness of answers given by QA systems, according to the supporting documents returned by the systems.
  - Like RTE 6-7 KBP Validation Task
- Data available from past QA campaigns (TREC & CLEF)

Pilot task: RTE on Specialized Datasets
- Possible pilot task using specialized datasets, where all T-H pairs contain one or more specific phenomena that affect inference:
  - Temporal expressions
  - Numerical expressions
=> Focus on temporal and quantitative reasoning
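Assembling such a specialized dataset could start from a simple filter that keeps only T-H pairs showing the target phenomena. The sketch below is a crude heuristic under stated assumptions: the function name `phenomena` and both regular expressions are hypothetical, the temporal pattern only covers four-digit years, month and weekday names, and a few deictic words, and it will produce false positives (for example the modal verb "may", or a year also counting as a number). A real pilot would more likely rely on a proper temporal tagger and manual checking.

```python
import re
from typing import Set

# Digits, optionally with thousands or decimal separators, e.g. 700,000 or 3.5
NUMERIC = re.compile(r"\b\d+(?:[.,]\d+)*\b")

# Very rough temporal cues: years 1900-2099, month and weekday names, deictic words.
# Known weakness: "may" and "march" also match as ordinary words.
TEMPORAL = re.compile(
    r"\b(?:(?:19|20)\d{2}|january|february|march|april|may|june|july|august|"
    r"september|october|november|december|monday|tuesday|wednesday|thursday|"
    r"friday|saturday|sunday|yesterday|today|tomorrow)\b",
    re.IGNORECASE,
)


def phenomena(text: str, hypothesis: str) -> Set[str]:
    """Return which target phenomena appear anywhere in a T-H pair."""
    combined = f"{text} {hypothesis}"
    found: Set[str] = set()
    if NUMERIC.search(combined):
        found.add("numerical")
    if TEMPORAL.search(combined):
        found.add("temporal")
    return found
```

A pair would be kept for the pilot whenever `phenomena(t, h)` is non-empty; pairs tagged with both labels could feed either sub-collection.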

TE-related initiatives for 2012:
- Task #6: Semantic Textual Similarity
- Task #8: Cross-Lingual Textual
2012:
- QA4MRE

POSSIBLE VENUES FOR RTE-8 IN 2013
- Semantics conferences are trying to join their efforts:
  - *Sem 2012
    - The first joint conference on lexical and computational semantics
    - Co-located with NAACL-HLT 2012
- PROPOSAL: co-locate RTE-8 with
  - Siglex NAACL-HLT or ACL (summer 2013)
  - IWCS (winter or spring 2013)

Thank you
See you all in 2013!