Strategies for Advanced Question Answering Sanda Harabagiu & Finley Lacatusu Language Computer Corporation HLT-NAACL2004 Workshop.

Abstract
– Combining multiple strategies that optimally resolve different classes of questions of varying complexity
– Enhancing the precision of question interpretation and answer extraction
– Question decomposition and answer fusion

Introduction
Our fundamental premise is that progress in Q/A cannot be achieved only by enhancing the processing components; it also requires generating the best strategy for processing each individual question. Thus we believe that Q/A systems capable of successfully processing complex questions should employ multiple strategies instead of the current pipeline approach.

Q/A Systems Capable of Processing Complex Questions
Pipeline architecture
– Question processing
– Passage retrieval
– Answer selection
Combining strategies for advanced Q/A
– Knowledge-based Q/A implementation
– Statistical noisy-channel algorithm for Q/A
– Pattern-based approach that learns from the Web
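The contrast between a single pipeline and a combination of strategies can be sketched in Python. This is a minimal illustration, not LCC's system: `Answer`, `combine_strategies`, `knowledge_based`, and `pattern_based` are hypothetical names, the two strategies are toy stand-ins (a dictionary lookup and a regular-expression pattern over a two-sentence corpus), and keeping the highest-scoring candidate is an assumed combination policy; the slides do not say how strategies are combined.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Answer:
    text: str
    score: float
    strategy: str


# Hypothetical interface: a strategy returns a scored answer, or None if it does not apply.
Strategy = Callable[[str], Optional[Answer]]


def combine_strategies(question: str, strategies: List[Strategy]) -> Optional[Answer]:
    """Run every strategy on the question and keep the best-scoring candidate (assumed policy)."""
    candidates = [a for a in (s(question) for s in strategies) if a is not None]
    return max(candidates, key=lambda a: a.score, default=None)


# Toy knowledge base standing in for a knowledge-based Q/A component.
KB = {"capital of france": "Paris"}


def knowledge_based(question: str) -> Optional[Answer]:
    key = question.lower().rstrip("?").replace("what is the ", "")
    return Answer(KB[key], 0.9, "knowledge-based") if key in KB else None


# Toy corpus standing in for text mined by a pattern-based component.
CORPUS = "Paris is the capital of France. Lyon is a city in France."


def pattern_based(question: str) -> Optional[Answer]:
    m = re.search(r"capital of (\w+)", question, re.IGNORECASE)
    if not m:
        return None
    hit = re.search(r"(\w+) is the capital of " + re.escape(m.group(1)), CORPUS, re.IGNORECASE)
    return Answer(hit.group(1), 0.6, "pattern-based") if hit else None


best = combine_strategies("What is the capital of France?", [knowledge_based, pattern_based])
```

Unlike a pipeline, each strategy here sees the original question independently, so a question that one strategy cannot handle may still be answered by another.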

Optimal Strategies for Advanced Q/A
– Question decomposition
– Answer fusion
– Feedback from interactive Q&A
– User background recognition

Example (1/2)
Complex question: How have thefts impacted on the safety of Russia’s nuclear navy, and has the theft problem been increased or decreased over time?
Decomposed questions:
– What specific instances of theft do we know about?
– What sort of items have been stolen?

Example (2/2)
Further decomposed questions:
– Who are the perpetrators of these thefts?
– Do thefts have an economic impact on the naval bases?
Concepts that need to be understood:
– What is meant by "nuclear navy"?
– What does "impact" mean?

Decomposition Criteria (1/4)
1. Decomposition along the constituents that a coordination joins:
– at the question-stem level: When and where did the thefts occur?
– at the predicate level: How does one define an increase or a decrease in the theft problem?
– at the argument level: To what degree do different thefts put nuclear or radioactive materials at risk?
– at the question level: What specific instances of theft do we know about, and what are the sources of this information?

Decomposition Criteria (2/4)
Question decomposition by identifying coordinations requires:
– disambiguating conjunctions, to tell when they separate distinct questions and when they merely coordinate constituents
– resolving references and ellipsis (anaphoric expressions) in the original question
– recognizing the relations between the resulting decomposed questions
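A toy heuristic for the two easiest coordination cases (question-stem level and question level) might look like the following. It assumes a conjunction introduces a separate question only when it is followed by a wh-word or an auxiliary verb; `decompose`, `WH_WORDS`, and `AUXILIARIES` are hypothetical, and the sketch performs none of the reference/ellipsis resolution or relation recognition listed above.

```python
import re
from typing import List

WH_WORDS = ("what", "when", "where", "who", "which", "why", "how")
# Auxiliaries that can open a yes/no question after a coordinating "and" (assumed list).
AUXILIARIES = ("is", "are", "was", "were", "do", "does", "did", "has", "have", "can")
SPLIT_AT = re.compile(
    r",\s*and\s+(?=(?:%s)\b)" % "|".join(WH_WORDS + AUXILIARIES), re.IGNORECASE
)


def decompose(question: str) -> List[str]:
    """Toy decomposition of coordinated questions at the question-stem and question levels."""
    q = question.strip().rstrip("?").strip()
    # Question-stem coordination: "When and where did X?" -> one question per wh-stem.
    m = re.match(r"(\w+) and (\w+) (.+)", q)
    if m and m.group(1).lower() in WH_WORDS and m.group(2).lower() in WH_WORDS:
        rest = m.group(3)
        return [f"{m.group(1).capitalize()} {rest}?", f"{m.group(2).capitalize()} {rest}?"]
    # Question-level coordination: ", and" + wh-word/auxiliary starts a separate question.
    parts = SPLIT_AT.split(q)
    if len(parts) > 1:
        return [p[0].upper() + p[1:] + "?" for p in parts]
    # Otherwise "and"/"or" only coordinates constituents, so the question stays whole.
    return [q + "?"]
```

For example, `decompose("When and where did the thefts occur?")` yields one question per wh-stem, while the predicate-level example above is returned unchanged.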

Decomposition Criteria (3/4)
2. Whether the question asks about:
– a complex relation
– a comparison with similar situations
– the elaboration of a state of affairs
This determines the decomposition into:
– definition questions
– specializations of the predicate concept
– examples

Decomposition Criteria (4/4)
3. Elaborations of the question's arguments:
– nested predicate-argument structures
– quantifications
– instantiations

Large Database of QA Pairs
The pairs need to be diverse in difficulty, where difficulty can be defined in terms of answer-type complexity, answer granularity, and ease of matching. The pairs should also be reliable, i.e., each question must be associated with a correct answer.

Our Solution
– Combine collection and generation from semi-structured resources, followed by expansion and validation.
– Generate the collection of QA pairs from Frequently Asked Questions (FAQ) files on various topics.
– Develop a dedicated harvesting algorithm to identify FAQs on the Web and extract the QA pairs.
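The harvesting step could start from something as simple as the sketch below, which assumes an FAQ file where each question is a line ending in "?" (optionally prefixed with "Q:") followed by its answer lines (optionally prefixed with "A:"). `harvest_faq` is a hypothetical name; dropping questions that have no answer is only a crude stand-in for the validation step, and real Web FAQs would need much more robust parsing.

```python
import re
from typing import List, Optional, Tuple


def harvest_faq(text: str) -> List[Tuple[str, str]]:
    """Toy FAQ harvester: a line ending in '?' opens a question; following lines form its answer."""
    pairs: List[Tuple[str, str]] = []
    question: Optional[str] = None
    answer_lines: List[str] = []
    for raw in text.splitlines():
        # Strip optional "Q:" / "A:" markers at the start of the line.
        line = re.sub(r"^\s*(Q|A)\s*[:.]\s*", "", raw).strip()
        if not line:
            continue
        if line.endswith("?"):
            # A new question closes the previous pair (kept only if it has an answer).
            if question and answer_lines:
                pairs.append((question, " ".join(answer_lines)))
            question, answer_lines = line, []
        elif question:
            answer_lines.append(line)
    if question and answer_lines:
        pairs.append((question, " ".join(answer_lines)))
    return pairs


faq = (
    "Q: What is a FAQ file?\n"
    "A: A list of frequently asked questions with answers.\n"
    "Q: Who writes FAQ files?\n"
    "A: Usually the maintainers of a site.\n"
    "They update it over time.\n"
    "Q: Is this question unanswered?\n"
)
pairs = harvest_faq(faq)
```

The last question in the sample has no answer, so it is not emitted as a pair.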

Answer Resolution Strategies

Overview
– Answer Fusion, Ranking and Reliability
– Bootstrapping Question Answering
– User Background
– Processing Negation in Question Answering
– Conclusions

Answer Fusion, Ranking and Reliability
– An open-domain, template-based answer formalization
– A probabilistic model
– A set of template-merging operators

Open-domain Template Representation

Detection of Template Relations
A novel matching approach based on template attributes supports relation detection for merging. The approach combines phrasal parsing, lemma normalization, and semantic approximation.
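To make attribute-based matching concrete, here is a minimal sketch assuming templates are flat attribute-value dictionaries. The suffix stripping and the `SYNONYMS` table are toy stand-ins for lemma normalization and semantic approximation, phrasal parsing is omitted, and `related`/`merge` are hypothetical names for one possible merging operator (attribute union), not the fusion operators defined in the paper.

```python
from typing import Dict, Optional

Template = Dict[str, str]

# Hypothetical synonym table standing in for semantic approximation.
SYNONYMS = {"theft": "steal", "stolen": "steal", "stole": "steal"}


def normalize(value: str) -> str:
    """Crude lemma normalization (lowercase, drop plural -s) followed by synonym mapping."""
    words = []
    for w in value.lower().split():
        if len(w) > 3 and w.endswith("s") and not w.endswith("ss"):
            w = w[:-1]
        words.append(SYNONYMS.get(w, w))
    return " ".join(words)


def related(a: Template, b: Template) -> bool:
    """Templates are related if they share at least one attribute and agree on all shared ones."""
    shared = a.keys() & b.keys()
    return bool(shared) and all(normalize(a[k]) == normalize(b[k]) for k in shared)


def merge(a: Template, b: Template) -> Optional[Template]:
    """Toy merging operator: union of attributes (b's wording wins) for related templates."""
    return {**a, **b} if related(a, b) else None


t1 = {"event": "theft", "location": "naval bases"}
t2 = {"event": "stolen", "location": "naval base", "item": "fuel"}
t3 = {"event": "theft", "location": "Moscow"}
```

Here `t1` and `t2` match only after normalization ("theft"/"stolen", "bases"/"base"), so they merge; `t3` disagrees on location and is kept separate.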

Fusion Operators

Answer Fusion Block Architecture

Bootstrapping Question Answering
Question: What weapons of mass destruction (WMD) does Iraq have?
– answer type: "WMD"
– accepts concepts such as "anthrax"
Exact answer
– LCC's system
– answer type (AT)

Answer Instance Bootstrapping Algorithm

Example
Question: What viral agent was used in Iraq?
– If the answer-type concept does not exist in WordNet, the bootstrapping algorithm creates a distinct category for this concept.
– If the answer-type concept exists in WordNet, the algorithm attaches the bootstrapped entities and patterns to the hypernym of the concept that provides the largest coverage without overlapping any other known category.
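The attachment rule can be sketched over a toy hypernym table standing in for WordNet. The `HYPERNYM` data and `choose_attachment` are hypothetical, and "overlapping" is interpreted here as reaching a hypernym that equals or subsumes another known answer-type category; the sketch climbs the hypernym chain and stops just below the first such node.

```python
from typing import List, Set, Tuple

# Toy child -> parent hypernym links standing in for WordNet (hypothetical data).
HYPERNYM = {
    "virus": "microorganism",
    "bacterium": "microorganism",
    "microorganism": "organism",
    "plant": "organism",
    "organism": "living_thing",
}


def ancestors(concept: str) -> List[str]:
    """Hypernym chain from the concept's parent up to the root."""
    chain = []
    while concept in HYPERNYM:
        concept = HYPERNYM[concept]
        chain.append(concept)
    return chain


def choose_attachment(concept: str, known_categories: Set[str]) -> Tuple[str, bool]:
    """Return (node that receives bootstrapped entities/patterns, whether a new category was created)."""
    if concept not in HYPERNYM and concept not in HYPERNYM.values():
        return concept, True  # not in the taxonomy: create a distinct category
    others = known_categories - {concept}
    best = concept
    for h in ancestors(concept):
        # h overlaps another category if it is that category or one of its hypernyms.
        if any(h == k or h in ancestors(k) for k in others):
            break
        best = h  # climbing higher increases coverage
    return best, False
```

With "plant" as the only other known category, "virus" is attached to "microorganism" (which also covers "bacterium"), because "organism" would overlap "plant".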

Enhancing Retrieval, Navigation, and Fusion
Question: What biological agents does al Qaeda possess?
– Initial query: (biological AND agents AND Qaeda)
The extensions to the AT ontology enable intelligent query expansion through:
– AT instances
– extraction patterns

Expanded Query
((biological AND agents) OR (bacterial AND agent) OR (viral AND agent) OR (fungal AND agent) OR (toxic AND agent) OR botulism OR botulinum OR smallpox OR encephalitis OR (deploy)) AND (Qaeda)
– Conversion of extraction patterns into keywords, e.g., "deploy" for "deploy ANY-WMD"
– Controlled expansion through selective keyword selection, e.g., for "biological agents"
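The expanded query above can be assembled mechanically from its parts. This sketch assumes the expansion terms are already known (answer-type instances from the extended ontology plus keywords obtained from extraction patterns) and that slot placeholders in patterns start with `ANY-`; `pattern_to_keywords`, `conj`, and `expand_query` are hypothetical helpers.

```python
from typing import List


def pattern_to_keywords(pattern: str) -> List[str]:
    """Keep the lexical anchors of an extraction pattern, dropping ANY-* slot placeholders."""
    return [tok for tok in pattern.split() if not tok.startswith("ANY-")]


def conj(terms: List[str]) -> str:
    """Render one disjunct: a bare term, or an AND-group in parentheses."""
    return terms[0] if len(terms) == 1 else "(" + " AND ".join(terms) + ")"


def expand_query(answer_type: List[str], at_instances: List[List[str]],
                 pattern_keywords: List[str], required: List[str]) -> str:
    """OR together the answer type, its instances, and pattern keywords; AND with required terms."""
    disjuncts = [conj(answer_type)] + [conj(inst) for inst in at_instances]
    disjuncts += ["(" + kw + ")" for kw in pattern_keywords]
    return "(" + " OR ".join(disjuncts) + ") AND (" + " AND ".join(required) + ")"


query = expand_query(
    ["biological", "agents"],
    [["bacterial", "agent"], ["viral", "agent"], ["fungal", "agent"], ["toxic", "agent"],
     ["botulism"], ["botulinum"], ["smallpox"], ["encephalitis"]],
    pattern_to_keywords("deploy ANY-WMD"),
    ["Qaeda"],
)
```

Rendering each multi-word instance as an AND-group keeps its words together, while the OR over groups broadens the match; the resulting string is identical to the expanded query shown on the slide.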

Continuous Updating of Scenario Knowledge

User Background
All users are different: not only do they have different backgrounds and expertise, but they also vary in their goals and reasons for using a Q/A system.

Different user selections from the generated question decomposition tree

Assessing User Background
We evaluate users on a discrete scale that ranks them as novice, casual, or expert users based on how much background knowledge they have on the given topic. For example, if the user is known to be an "expert", only the paths generated through "expert" decomposition (i.e., generated using significant world and topic knowledge) will be followed.
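Path selection by user level might look like the sketch below. It assumes each node of the decomposition tree is tagged with the level of decomposition that produced it; the `Node` class, the level tags in the example tree, and `paths_for` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional

LEVELS = ("novice", "casual", "expert")


@dataclass
class Node:
    question: str
    level: Optional[str] = None  # decomposition type that produced this node (None for the root)
    children: List["Node"] = field(default_factory=list)


def paths_for(node: Node, user_level: str, prefix: Optional[List[str]] = None) -> List[List[str]]:
    """Enumerate root-to-leaf question paths, following only children generated for user_level."""
    if user_level not in LEVELS:
        raise ValueError(f"unknown user level: {user_level}")
    path = (prefix or []) + [node.question]
    kids = [c for c in node.children if c.level == user_level]
    if not kids:
        return [path]
    return [p for c in kids for p in paths_for(c, user_level, path)]


root = Node("How have thefts impacted on the safety of Russia's nuclear navy?", children=[
    Node("What is meant by nuclear navy?", "novice"),
    Node("What specific instances of theft do we know about?", "expert",
         [Node("What sort of items have been stolen?", "expert")]),
])
```

An expert follows the instance-oriented branch, a novice gets the definition question, and a level with no matching branch stays at the original question.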

Representing User Background
Traditionally, the user profile has been represented as a term vector
$P = ((t_1, w_1), (t_2, w_2), \ldots, (t_n, w_n))$
where, for each profile $P$:
– $t_i$ are terms from relevant documents
– $w_i$ are term weights, typically computed with the tf·idf metric.
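A tf·idf profile of this form can be computed as follows. This sketch assumes pre-tokenized documents, raw term frequency summed over the relevant documents, and idf = log(N/df) over the whole collection; these are common choices, not necessarily the ones used in the paper, and `tfidf_profile` is a hypothetical name.

```python
import math
from collections import Counter
from typing import List, Tuple


def tfidf_profile(relevant: List[List[str]], collection: List[List[str]],
                  top_n: int = 5) -> List[Tuple[str, float]]:
    """Build P = ((t1, w1), ..., (tn, wn)) from tokenized relevant documents."""
    n_docs = len(collection)
    # Document frequency: number of collection documents containing each term.
    df = Counter(t for doc in collection for t in set(doc))
    # Term frequency summed over the relevant documents.
    tf = Counter(t for doc in relevant for t in doc)
    weights = {t: tf[t] * math.log(n_docs / df[t]) for t in tf if df[t] > 0}
    # Highest weight first; ties broken alphabetically for a stable order.
    return sorted(weights.items(), key=lambda tw: (-tw[1], tw[0]))[:top_n]


collection = [["nuclear", "navy", "theft"], ["navy", "ship"],
              ["theft", "fuel", "theft"], ["weather", "report"]]
profile = tfidf_profile([collection[0], collection[2]], collection)
```

"theft" ranks first because it is frequent in the relevant documents, while "navy" ranks last because it also occurs in a non-relevant document.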

Two Considerations
– A user commonly explores multiple topics, even during the same session, so the user's background is represented by several profiles $P_i = ((t_{i1}, w_{i1}), (t_{i2}, w_{i2}), \ldots, (t_{im}, w_{im}))$, $i = 1, 2, \ldots, n$, where $m$ is the size of vector $P_i$.
– When a new document is marked as relevant, it is merged with an existing profile if their similarity is higher than a given threshold; otherwise, it is used to generate a new profile.
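The multi-profile update rule can be sketched with cosine similarity over sparse term-weight vectors. The similarity measure, the threshold value of 0.3, and merging by adding term weights are assumptions; `cosine` and `add_relevant_document` are hypothetical names.

```python
import math
from typing import Dict, List

Vector = Dict[str, float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def add_relevant_document(profiles: List[Vector], doc: Vector, threshold: float = 0.3) -> int:
    """Merge doc into the most similar profile if similarity > threshold, else start a new profile.

    Returns the index of the profile that absorbed the document.
    """
    if profiles:
        best = max(range(len(profiles)), key=lambda i: cosine(profiles[i], doc))
        if cosine(profiles[best], doc) > threshold:
            for t, w in doc.items():
                profiles[best][t] = profiles[best].get(t, 0.0) + w
            return best
    profiles.append(dict(doc))
    return len(profiles) - 1


profiles: List[Vector] = []
i0 = add_relevant_document(profiles, {"navy": 1.0, "theft": 2.0})
i1 = add_relevant_document(profiles, {"theft": 1.0, "fuel": 1.0})
i2 = add_relevant_document(profiles, {"weather": 1.0})
```

The second document shares "theft" with the first profile and is merged into it; the third shares no terms, so it starts a new profile.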

The Expert User’s Background
– Multiple vectors
– Each vector specializes in a clear domain-specific direction

Second Innovation
"al" is among the most frequent terms, but by itself "al" is treated as a stop word by most information retrieval systems. However, the significance of the term becomes evident when the complete concept, "al Qaeda", is considered.
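One way to keep "al" from being discarded is to match known multi-word concepts before stop-word filtering, as in this sketch. The stop list, the `MULTIWORD_CONCEPTS` set, and `index_terms` are hypothetical; in practice the concepts would come from a lexicon or a named-entity recognizer.

```python
from typing import List, Set, Tuple

# Hypothetical stop list; the slide notes that "al" is a stop word in most IR systems.
STOP_WORDS = {"a", "al", "does", "of", "the", "what"}
# Hypothetical lexicon of multi-word concepts that must survive stop-word removal.
MULTIWORD_CONCEPTS = {("al", "qaeda")}


def index_terms(tokens: List[str], stop_words: Set[str] = STOP_WORDS,
                concepts: Set[Tuple[str, ...]] = MULTIWORD_CONCEPTS) -> List[str]:
    """Lowercase tokens, keep known multi-word concepts whole, then drop stop words."""
    words = [t.lower() for t in tokens]
    longest_first = sorted(concepts, key=len, reverse=True)
    out, i = [], 0
    while i < len(words):
        # Prefer a complete concept starting at position i over a single token.
        match = next((c for c in longest_first if tuple(words[i:i + len(c)]) == c), None)
        if match:
            out.append("_".join(match))
            i += len(match)
        else:
            if words[i] not in stop_words:
                out.append(words[i])
            i += 1
    return out


tokens = "What biological agents does al Qaeda possess".split()
```

With the concept lexicon, "al Qaeda" survives as a single term; without it, "al" is dropped as a stop word and only "qaeda" remains.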

Processing Negation in Question Answering
Previous Q/A systems handle negation by:
– filtering the retrieved answers
– eliminating answers
Examples:
– Which countries did not vote for the Iraq war resolution in the Security Council?
– Which countries did not provide help to the coalition during the Gulf War in 1991?
– What planets have no moon?

Recognizing the Most Frequent Cases of Negation
– no: "with no terrorists, the world would be safer"
– nothing: "the inspectors found nothing"
– not: "thefts did not occur at the beginning"
– never: "the president never leaves the White House without the Secret Service approval"
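The cues in these examples can be spotted with a simple lexical scan. The cue list is taken from the example sentences, with "n't" contractions mapped to "not" as an added assumption; `negation_cues` is a hypothetical helper, and it only detects cues without determining their scope.

```python
import re
from typing import List

# Cues drawn from the example sentences (assumed list, not exhaustive).
NEGATION_CUES = ("no", "nothing", "not", "never", "without")


def negation_cues(sentence: str) -> List[str]:
    """Return negation cues in order of appearance; contractions like "didn't" count as "not"."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    cues = []
    for tok in tokens:
        if tok in NEGATION_CUES:
            cues.append(tok)
        elif tok.endswith("n't"):
            cues.append("not")
    return cues
```

A question such as "What planets have no moon?" yields `["no"]`, which a system could use to decide whether to filter answers or to consult the positive question.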

Hypotheses
– We assume that when speakers formulate a question to find out whether a proposition is true or false, they phrase the question in the form of the proposition that would be most informative if it turned out to be true.
– Consequently, we expect that if a question is phrased as a negation, the speaker believes that the negative answer is the most informative.

How Negation Can Be Addressed in Q/A
– By using the user background
– By interacting with the user
– By finding cues in the answers to the positive question

Conclusions
– Question decomposition following several criteria
– Answer fusion, which composes a unique, coherent answer from the partial answers extracted for each decomposed question
– Modeling of user background
– Processing of negation in questions and/or answers
– A bootstrapping algorithm that enhances the precision of factual Q/A