Question Answering for Machine Reading Evaluation: Evaluation Campaign at CLEF 2011. Anselmo Peñas (UNED, Spain), Eduard Hovy (USC-ISI, USA), Pamela Forner (CELCT, Italy), Richard Sutcliffe (U. Limerick, Ireland), Álvaro Rodrigo (UNED, Spain)
UNED nlp.uned.es Knowledge-understanding dependence: We “understand” because we “know”. Capturing the ‘knowledge’ expressed in texts and ‘understanding’ language depend on each other.
Control the variable of knowledge: The ability to make inferences about texts is correlated with the amount of knowledge considered. This variable has to be taken into account during evaluation; otherwise it is very difficult to compare methods. How can the variable of knowledge be controlled in a reading task?
Question Answering: Restricted-domain QA systems work either (1) on large knowledge bases (structured QA, not aiming for language understanding) or (2) on a domain-specific collection (information extraction rules). Open-domain QA systems work on open-domain collections, are based on retrieval and redundancy, and perform very limited inference. What's next in QA?
Recognizing Textual Entailment: The test is a pair of Text (evidence) and Hypothesis. The source of knowledge is free, so it is difficult to evaluate whether the best systems have better methods, better knowledge, or both. Advantages: cheap evaluation, 100% reusable, and the same framework for any level of complexity. What's next in RTE? Control the variable of knowledge.
Proposal: QA4MRE. Question Answering for Machine Reading Evaluation (QA4MRE) is a new task of the QA Track at CLEF 2011. General goal: measure progress in two reading abilities, (1) capturing knowledge from text collections and (2) answering questions about a single text.
Requirements: Don't fix the representation formalism; semantic representation beyond sentence level is part of the research agenda. Don't build systems tuned for specific domains, but general technologies able to self-adapt to new contexts or topics. Evaluate reading abilities: knowledge acquisition and answering questions about a single document. Control the role of knowledge.
Sources of knowledge: The text collection should be big and diverse enough to acquire the required knowledge. This is impossible for all possible topics, so define a scalable strategy: topic by topic. Several topics, each narrow enough to limit the knowledge needed (e.g. petroleum industry, European Football League, disarmament of the Irish Republican Army). One reference collection per topic (10,000-50,000 docs), made up of documents defining concepts about the topic (e.g. Wikipedia), news about the topic, and web pages, blogs, and opinions.
Reading test
Text: Coal seam gas drilling in Australia's Surat Basin has been halted by flooding. Australia's Easternwell, being acquired by Transfield Services, has ceased drilling because of the flooding. The company is drilling coal seam gas wells for Australia's Santos Ltd. Santos said the impact was minimal.
Multiple-choice test: According to the text, what company owns wells in Surat Basin?
a) Australia
b) Coal seam gas wells
c) Easternwell
d) Transfield Services
e) Santos Ltd.
f) Ausam Energy Corporation
g) Queensland
h) Chinchilla
Knowledge gaps: If Company A drills Well C for Company B, then Company B owns Well C (with probability P). Acquire this knowledge from the reference collection. Geographic knowledge is also needed: "is part of" relations linking Surat Basin, Queensland, and Australia.
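The gap-filling step on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the QA4MRE task definition: the `Fact` triple format, the relation names, the functions `infer_ownership` and `answer`, and the value 0.8 standing in for the unspecified probability P are all hypothetical.

```python
from typing import List, NamedTuple, Optional


class Fact(NamedTuple):
    subject: str
    relation: str
    obj: str
    confidence: float = 1.0


# Facts a reader could extract from the text on the "Reading test" slide.
DOCUMENT_FACTS = [
    Fact("Easternwell", "drills_wells_for", "Santos Ltd."),
    Fact("Easternwell", "drills_in", "Surat Basin"),
]

# Answer options from the same slide.
OPTIONS = [
    "Australia", "Coal seam gas wells", "Easternwell", "Transfield Services",
    "Santos Ltd.", "Ausam Energy Corporation", "Queensland", "Chinchilla",
]

# Assumed placeholder: the slide does not give a value for P.
P_OWNS = 0.8


def infer_ownership(facts: List[Fact], p: float = P_OWNS) -> List[Fact]:
    """Background rule: if A drills wells for B, and A drills in place X,
    then B owns wells in X, with confidence p."""
    inferred = []
    for f in facts:
        if f.relation != "drills_wells_for":
            continue
        for g in facts:
            if g.subject == f.subject and g.relation == "drills_in":
                inferred.append(
                    Fact(f.obj, "owns_wells_in", g.obj,
                         p * f.confidence * g.confidence)
                )
    return inferred


def answer(place: str, facts: List[Fact], options: List[str]) -> Optional[str]:
    """Return the option most confidently inferred to own wells in `place`,
    or None if the rule supports no option."""
    candidates = [
        f for f in infer_ownership(facts)
        if f.obj == place and f.subject in options
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda f: f.confidence).subject


print(answer("Surat Basin", DOCUMENT_FACTS, OPTIONS))  # prints: Santos Ltd.
```

In the task as described, the background rule would not be hand-written as it is here; it would have to be acquired from the reference collection for the topic.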
Runs: Type I uses no external sources of knowledge, only the given reference collection. Type II uses external sources; participants must specify which ones.
Schedule
Guidelines and samples of tests: 1st February
Release of topics and reference corpora: 1st April
Test set release: 1st June
Run submissions: 15th June
Results to the participants: 1st July
Submission of notebook papers: August - September
Web site:
Program Committee: Ken Barker (University of Texas at Austin, US); Johan Bos (Rijksuniversiteit Groningen, Netherlands); Peter Clark (Vulcan Inc., US); Ido Dagan (Bar-Ilan University, Israel); Bernardo Magnini (Fondazione Bruno Kessler, Italy); Dan Moldovan (University of Texas at Dallas, US); Emanuele Pianta (Fondazione Bruno Kessler and CELCT, Italy); John Prager (IBM, US); Dan Tufis (RACAI, Romania); Hoa Trang Dang (NIST, US)
Join the organization: The working group is open to collaboration on development collections, adding new languages, defining types of questions, writing tests about a topic, …