Question Answering.  Goal  Automatically answer questions submitted by humans in a natural language form  Approaches  Rely on techniques from diverse.

Slides:



Advertisements
Similar presentations
Pseudo-Relevance Feedback For Multimedia Retrieval By Rong Yan, Alexander G. and Rong Jin Mwangi S. Kariuki
Advertisements

Search in Source Code Based on Identifying Popular Fragments Eduard Kuric and Mária Bieliková Faculty of Informatics and Information.
Chapter 5: Introduction to Information Retrieval
Multimedia Database Systems
SEARCHING QUESTION AND ANSWER ARCHIVES Dr. Jiwoon Jeon Presented by CHARANYA VENKATESH KUMAR.
Query Dependent Pseudo-Relevance Feedback based on Wikipedia SIGIR ‘09 Advisor: Dr. Koh Jia-Ling Speaker: Lin, Yi-Jhen Date: 2010/01/24 1.
Search Engines and Information Retrieval
Chapter 11 Beyond Bag of Words. Question Answering n Providing answers instead of ranked lists of documents n Older QA systems generated answers n Current.
Basic IR: Queries Query is statement of user’s information need. Index is designed to map queries to likely to be relevant documents. Query type, content,
6/16/20151 Recent Results in Automatic Web Resource Discovery Soumen Chakrabartiv Presentation by Cui Tao.
The Informative Role of WordNet in Open-Domain Question Answering Marius Paşca and Sanda M. Harabagiu (NAACL 2001) Presented by Shauna Eggers CS 620 February.
Web Logs and Question Answering Richard Sutcliffe 1, Udo Kruschwitz 2, Thomas Mandl University of Limerick, Ireland 2 - University of Essex, UK 3.
ITCS 6010 Natural Language Understanding. Natural Language Processing What is it? Studies the problems inherent in the processing and manipulation of.
1 Information Retrieval and Extraction 資訊檢索與擷取 Chia-Hui Chang, Assistant Professor Dept. of Computer Science & Information Engineering National Central.
Information Retrieval and Extraction 資訊檢索與擷取 Chia-Hui Chang National Central University
Basic IR Concepts & Techniques ChengXiang Zhai Department of Computer Science University of Illinois, Urbana-Champaign.
Information Retrieval
Employing Two Question Answering Systems in TREC 2005 Harabagiu, Moldovan, et al 2005 Language Computer Corporation.
Chapter 5: Information Retrieval and Web Search
Overview of Search Engines
Information Retrieval in Practice
CS653 INFORMATION RETRIEVAL Overview. Outline 2  Topics to be covered in this class: Query Suggestions Question Answering Recommendation Systems Web.
Search Engines and Information Retrieval Chapter 1.
Adaptive News Access Daniel Billsus Presented by Chirayu Wongchokprasitti.
Chapter 7 Web Content Mining Xxxxxx. Introduction Web-content mining techniques are used to discover useful information from content on the web – textual.
Reyyan Yeniterzi Weakly-Supervised Discovery of Named Entities Using Web Search Queries Marius Pasca Google CIKM 2007.
©2008 Srikanth Kallurkar, Quantum Leap Innovations, Inc. All rights reserved. Apollo – Automated Content Management System Srikanth Kallurkar Quantum Leap.
1 LiveClassifier: Creating Hierarchical Text Classifiers through Web Corpora Chien-Chung Huang Shui-Lung Chuang Lee-Feng Chien Presented by: Vu LONG.
A Simple Unsupervised Query Categorizer for Web Search Engines Prashant Ullegaddi and Vasudeva Varma Search and Information Extraction Lab Language Technologies.
Thanks to Bill Arms, Marti Hearst Documents. Last time Size of information –Continues to grow IR an old field, goes back to the ‘40s IR iterative process.
CROSSMARC Web Pages Collection: Crawling and Spidering Components Vangelis Karkaletsis Institute of Informatics & Telecommunications NCSR “Demokritos”
1 Information Retrieval Acknowledgements: Dr Mounia Lalmas (QMW) Dr Joemon Jose (Glasgow)
Retrieval Models for Question and Answer Archives Xiaobing Xue, Jiwoon Jeon, W. Bruce Croft Computer Science Department University of Massachusetts, Google,
A Language Independent Method for Question Classification COLING 2004.
Chapter 6: Information Retrieval and Web Search
1 Automatic Classification of Bookmarked Web Pages Chris Staff Second Talk February 2007.
Introduction to Digital Libraries hussein suleman uct cs honours 2003.
Search Engine Architecture
Collocations and Information Management Applications Gregor Erbach Saarland University Saarbrücken.
How Useful are Your Comments? Analyzing and Predicting YouTube Comments and Comment Ratings Stefan Siersdorfer, Sergiu Chelaru, Wolfgang Nejdl, Jose San.
Next Generation Search Engines Ehsun Daroodi 1 Feb, 2003.
21/11/20151Gianluca Demartini Ranking Clusters for Web Search Gianluca Demartini Paul–Alexandru Chirita Ingo Brunkhorst Wolfgang Nejdl L3S Info Lunch Hannover,
Chapter 23: Probabilistic Language Models April 13, 2004.
Chapter 8 Evaluating Search Engine. Evaluation n Evaluation is key to building effective and efficient search engines  Measurement usually carried out.
How Do We Find Information?. Key Questions  What are we looking for?  How do we find it?  Why is it difficult? “A prudent question is one-half of wisdom”
Search Tools and Search Engines Searching for Information and common found internet file types.
Department of Software and Computing Systems Research Group of Language Processing and Information Systems The DLSIUAES Team’s Participation in the TAC.
Probabilistic Latent Query Analysis for Combining Multiple Retrieval Sources Rong Yan Alexander G. Hauptmann School of Computer Science Carnegie Mellon.
Harvesting Social Knowledge from Folksonomies Harris Wu, Mohammad Zubair, Kurt Maly, Harvesting social knowledge from folksonomies, Proceedings of the.
1 Information Retrieval LECTURE 1 : Introduction.
Information Retrieval
Search Engine using Web Mining COMS E Web Enhanced Information Mgmt Prof. Gail Kaiser Presented By: Rupal Shah (UNI: rrs2146)
A Classification-based Approach to Question Answering in Discussion Boards Liangjie Hong, Brian D. Davison Lehigh University (SIGIR ’ 09) Speaker: Cho,
Web Search and Text Mining Lecture 5. Outline Review of VSM More on LSI through SVD Term relatedness Probabilistic LSI.
Comparing Document Segmentation for Passage Retrieval in Question Answering Jorg Tiedemann University of Groningen presented by: Moy’awiah Al-Shannaq
Answer Mining by Combining Extraction Techniques with Abductive Reasoning Sanda Harabagiu, Dan Moldovan, Christine Clark, Mitchell Bowden, Jown Williams.
Generating Query Substitutions Alicia Wood. What is the problem to be solved?
Finding the Right Facts in the Crowd: Factoid Question Answering over Social Media J. Bian, Y. Liu, E. Agichtein, and H. Zha ACM WWW, 2008.
Relevance Models and Answer Granularity for Question Answering W. Bruce Croft and James Allan CIIR University of Massachusetts, Amherst.
A code-centric cluster-based approach for searching online support forums for programmers Christopher Scaffidi, Christopher Chambers, Sheela Surisetty.
Identifying “Best Bet” Web Search Results by Mining Past User Behavior Author: Eugene Agichtein, Zijian Zheng (Microsoft Research) Source: KDD2006 Reporter:
1 Question Answering and Logistics. 2 Class Logistics  Comments on proposals will be returned next week and may be available as early as Monday  Look.
Multi-Class Sentiment Analysis with Clustering and Score Representation Yan Zhu.
Using Blog Properties to Improve Retrieval Gilad Mishne (ICWSM 2007)
WP4 Models and Contents Quality Assessment
Definition “Information Retrieval (IR) is the activity of obtaining information resources relevant to an information need from a collection of information.
Information Retrieval and Web Search
Information Retrieval and Web Search
Information Retrieval and Web Design
Introduction to Search Engines
Presentation transcript:

Question Answering

 Goal  Automatically answer questions submitted by humans in a natural language form  Approaches  Rely on techniques from diverse areas of study, e.g., IR, NLP, Onto, and ML, to identify users’ info. needs & textual phrases potentially suitable answers for users  Exploit Data from Community Question Answering Systems (CQA) (Web) Data Sources, i.e., doc corpus 2

Question Answering (QA)

- Question answering (QA) is a specialized form of IR
- Given a collection of documents or a collaborative QA system, a QA system attempts to retrieve correct answers to questions posed in natural language
- Unlike search engines, QA systems generate answers instead of providing ranked lists of documents
- Current (non-collaborative) QA systems extract answers from large corpora such as the Web
- Fact-based QA limits the range of informational questions to those with simple, short answers
  - who, where, why, what, when, how (5W1H/WH) questions

Question Answering (CQA-Based)

- CQA-based approaches
  - Analyze questions (and their corresponding answers) archived at CQA sites to locate answers to a newly created question
  - Exploit the "wealth of knowledge" already provided by CQA users
- Existing popular CQA sites: Yahoo! Answers, WikiAnswers, and StackOverflow

Question Answering (CQA-Based)

- Example (figure omitted)

Question Answering (CQA-Based)

- Challenges in finding an answer to a new question from QA pairs archived at CQA sites
  - Misleading answers
  - No answers
  - Spam answers
  - Incorrect answers
  - Answerer reputation

Question Answering (CQA-Based)

- Challenges (cont.)
  - Scale: 300 million posts on Yahoo! Answers since 2005, i.e., an average of 7,000 questions and 21,000 answers per hour
  - Identifying the most suitable answer among the many available
  - Accounting for the fact that questions referring to the same topic might be formulated using similar, but not the same, words
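The last challenge above, matching a new question against archived questions that use similar but not identical words, can be sketched with a purely lexical baseline. This is a minimal illustration (all names and the stopword list are hypothetical); real CQA retrieval typically uses translation models or embeddings to bridge the vocabulary gap that simple overlap misses.

```python
import re

STOPWORDS = {"the", "a", "an", "is", "are", "what", "how", "do", "i", "to", "in", "of"}

def tokens(question):
    """Lowercase, split on non-letters, and drop stopwords."""
    return {t for t in re.split(r"[^a-z]+", question.lower()) if t and t not in STOPWORDS}

def jaccard(q1, q2):
    """Jaccard similarity between the token sets of two questions."""
    t1, t2 = tokens(q1), tokens(q2)
    return len(t1 & t2) / len(t1 | t2) if t1 | t2 else 0.0

def best_match(new_question, archive):
    """Return the archived (question, answer) pair most similar to the new question."""
    return max(archive, key=lambda qa: jaccard(new_question, qa[0]))

archive = [
    ("How can I speed up my laptop?", "Add RAM and switch to an SSD."),
    ("What is the capital of France?", "Paris."),
]
q, a = best_match("Why is my laptop so slow?", archive)
```

Note that the matched pair shares only the words "my laptop" with the new question; a purely lexical matcher works here, but would fail entirely if the archive said "notebook" instead of "laptop".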

Question Answering (CQA-Based)

- Matching posted questions to the best answerers who can contribute the needed information
  - Based on the expertise/past performance of the answerers who have answered similar questions
  - (Problem) Are the potential answerers willing to accept and answer the questions recommended to them in time?
    - When do users tend to answer questions in a CQA system?
    - How do users tend to choose the questions to answer in CQA?
- (A solution) Analyze the answering behavior of answerers
  - When: analyze the overall and user-specific temporal activity patterns and identify stable daily/weekly periodicities
  - How: analyze factors that affect users' decisions, including question category, question position, and question text
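The "when" analysis above boils down to binning answer timestamps by hour (or weekday) and looking for stable peaks. A minimal sketch with stdlib tools, using made-up timestamps:

```python
from datetime import datetime
from collections import Counter

def hourly_activity(timestamps):
    """Count answers per hour of day, exposing daily periodicities."""
    return Counter(ts.hour for ts in timestamps)

def peak_hour(timestamps):
    """Hour of day at which the user (or community) answers most often."""
    return hourly_activity(timestamps).most_common(1)[0][0]

# Hypothetical answer timestamps for one user: mostly active around 21:00.
answers = [
    datetime(2024, 1, 1, 21, 5), datetime(2024, 1, 2, 21, 40),
    datetime(2024, 1, 3, 9, 15), datetime(2024, 1, 4, 21, 55),
]
```

The same counting over `ts.weekday()` would surface weekly periodicities; a routing system would then prefer to recommend questions near a user's peak activity hours.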

Question Answering (CQA-Based)

- Applying a question-routing scheme that considers the answering, commenting, and voting propensities of a group of answerers
  - (Question) What routing strategy should be employed to ensure that a question gets answers with lasting value?
  - (A solution) Answerers, chosen according to their compatibility, topical expertise, and availability, collaborate to answer questions and offer answers with high value
    - The QA process is a collaborative effort that requires input from different types of users
    - User-user compatibility is essential in CQA services
    - Evaluating topics, expertise, and availability is critical in building a framework that achieves the goal of a CQA system
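A routing scheme of this kind can be sketched as a scoring function that weights topical expertise by current availability. Everything here is a hypothetical toy model (the field names, scores, and the 0.2 penalty for unavailable users are all assumptions), not the scheme from any particular paper:

```python
def route_question(question_topic, answerers, hour):
    """Rank answerers by topical expertise weighted by current availability.

    Each answerer is a dict with hypothetical fields: a name, a per-topic
    expertise score in [0, 1], and the set of hours they are typically active.
    """
    def score(a):
        expertise = a["expertise"].get(question_topic, 0.0)
        available = 1.0 if hour in a["active_hours"] else 0.2  # assumed penalty
        return expertise * available
    return sorted(answerers, key=score, reverse=True)

answerers = [
    {"name": "alice", "expertise": {"python": 0.9}, "active_hours": {9, 10, 11}},
    {"name": "bob", "expertise": {"python": 0.6}, "active_hours": {20, 21, 22}},
]
ranked = route_question("python", answerers, hour=21)
```

At 21:00 the less expert but currently active "bob" outranks "alice"; a fuller model would add the user-user compatibility and commenting/voting propensities the slide mentions as extra multiplicative factors.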

Question Answering (CQA-Based)

- Increasing the participation of expert answerers by using a question recommendation system to proactively notify answerers of the presence of suitable questions to answer
  - (How?) Using community feedback tools, which serve as a crowd-sourced mechanism
    - Users can vote, positively or negatively, for questions or answers; the votes are cast into a single score that serves as a proxy for question/answer quality
  - (Another solution) Using the text of questions and answers to model the experts and the questions
    - Users and questions are represented as vectors of latent features
    - Users with expertise in similar topics are likely to answer similar questions, which can be recommended to expert users
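One standard way to cast positive and negative votes into a single quality score, without naively rewarding items that happen to have few but unanimous votes, is the lower bound of the Wilson score interval. A sketch (the slide does not specify which aggregation CQA sites use; this is one common choice):

```python
import math

def wilson_lower_bound(upvotes, downvotes, z=1.96):
    """Lower bound of the Wilson score confidence interval for the fraction
    of positive votes (z = 1.96 for 95% confidence). Conservative for items
    with few votes, so 3 upvotes / 0 downvotes scores below 100/10."""
    n = upvotes + downvotes
    if n == 0:
        return 0.0
    p = upvotes / n
    denom = 1 + z * z / n
    centre = p + z * z / (2 * n)
    spread = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre - spread) / denom
```

Sorting answers by this score gives a vote-based quality proxy that is robust to small sample sizes, unlike a raw upvote-minus-downvote count.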

Question Answering (Corpus-Based)

- Corpus-based approaches
  - Analyze text documents from diverse online sources to locate answers that satisfy the information needs expressed in a question
- Overview: data sources are text corpora and RDBMSs; e.g., the question "When is the next train to Glasgow?" yields the answer "8:35, Track 9."
  - Pipeline: Question → Extract Keywords → Query → Search Engine → Corpus Docs → Passage Extractor → Answers → Answer Selector → Answer
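The pipeline in that diagram can be sketched end to end with toy stand-ins for each stage (the stopword list, the containment-based "search engine", and the keyword-count answer selector are all simplifying assumptions; production systems use a real IR engine and learned rankers):

```python
import re

STOP = {"is", "the", "to", "when", "what", "a", "in"}

def extract_keywords(question):
    """Question → query keywords (drop stopwords and WH-words)."""
    return [w for w in re.findall(r"[a-z0-9:]+", question.lower()) if w not in STOP]

def search(keywords, corpus):
    """Toy search engine: return documents containing any query keyword."""
    return [d for d in corpus if any(k in d.lower() for k in keywords)]

def extract_passages(docs, keywords):
    """Passage extractor: split documents into sentences, keep keyword hits."""
    passages = []
    for d in docs:
        for s in re.split(r"(?<=[.!?])\s+", d):
            if any(k in s.lower() for k in keywords):
                passages.append(s)
    return passages

def select_answer(passages, keywords):
    """Answer selector: pick the passage with the most keyword hits.
    (Assumes at least one passage survived the previous stage.)"""
    return max(passages, key=lambda s: sum(k in s.lower() for k in keywords))

corpus = [
    "The next train to Glasgow departs at 8:35 from Track 9. Tickets cost 30 pounds.",
    "Glasgow is the largest city in Scotland.",
]
question = "When is the next train to Glasgow?"
keywords = extract_keywords(question)
passages = extract_passages(search(keywords, corpus), keywords)
answer = select_answer(passages, keywords)
```

On this toy corpus the pipeline recovers the sentence containing the slide's example answer ("8:35", "Track 9") rather than the merely topical sentence about Glasgow.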

Question Answering (Corpus-Based)

- Classification of questions
  - Factoid vs. list (of factoids) vs. definition
    - "What lays blue eggs?" -- one fact
    - "Name 9 cities in Europe" -- multiple facts
    - "What is information retrieval?" -- textual answer
  - Open vs. closed domain
- Challenges
  - Identifying the user's actual information needs, e.g., for ambiguous questions such as "What is apple?" or "Magic mirror in my hand, who is the fairest in the land?"
  - Converting to quantifiable measures
  - Answer ranking

Corpus-Based QA Systems

- Corpus-based QA systems rely on a collection of documents, attempting to retrieve correct answers to questions posed in natural language

Question Answering

- Question Processing Module: given a question Q as input, the module processes and analyzes Q, creates a representation of the information requested in Q, and determines
  - The question type (such as informational), based on a taxonomy of possible questions already coded into the system, e.g.,
    - Who: asking for people
    - Where: referring to places/locations
    - When: looking for a time/occasion
    - Why: asking for an explanation/reason
    - What: requesting specific information
    - How: describing the manner in which something is done
  - The expected answer type, through semantic processing of Q
  - The question focus, which represents the main information that is required to answer Q
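The WH-word taxonomy above is essentially a lookup table from the question's leading word to an expected answer category. A minimal sketch (the answer-type labels are illustrative names, not a standard tag set):

```python
# Mapping from the slide's WH-word taxonomy to hypothetical answer-type labels.
QUESTION_TAXONOMY = {
    "who": "PERSON",
    "where": "LOCATION",
    "when": "TIME",
    "why": "REASON",
    "what": "SPECIFIC_INFORMATION",
    "how": "MANNER",
}

def question_type(question):
    """Classify a question by its leading WH-word; None if no rule applies."""
    first = question.strip().split()[0].lower()
    return QUESTION_TAXONOMY.get(first)
```

Real systems refine this with semantic processing because the leading word alone is often ambiguous; "what", in particular, maps to many answer types, as the classification slide below this one discusses.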

(Table omitted) Sample types of questions, their corresponding answer types, and statistics from the set of TREC 8 questions

Question/Answer Classification

- Question type classification provides constraints on what constitutes relevant data, i.e., the nature of the answer
  - Using Support Vector Machines (SVMs) to classify Q based on feature sets, e.g., text features (a bag of words) or semantic features (named entities, proper names/adjectives)
- Answer type classification: mapping question types to answer types can be a one-to-many mapping, since question classification can be ambiguous, e.g., "what"
- Question focus: a word or sequence of words indicating what information is being asked for in Q
  - "What is the longest river in New South Wales?" has the focus "longest river" within the question type "what"
  - Pattern matching rules are used to identify the question focus
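Pattern-matching rules for the question focus can be as simple as regular expressions that capture the phrase between the WH-word (plus copula) and a trailing qualifier. The two rules below are hypothetical examples in the spirit of the slide, tuned only to its "what is ..." examples; a real system would need many more rules:

```python
import re

# Hypothetical focus-extraction rules, tried in order.
FOCUS_PATTERNS = [
    re.compile(r"^what\s+is\s+the\s+(.+?)\s+(?:in|of|on)\s+", re.I),
    re.compile(r"^what\s+is\s+(?:the\s+)?(.+?)\?", re.I),
]

def question_focus(question):
    """Return the focus phrase of a question, or None if no rule fires."""
    for pat in FOCUS_PATTERNS:
        m = pat.search(question)
        if m:
            return m.group(1)
    return None
```

On the slide's example, the first rule strips "What is the" and the locative qualifier "in New South Wales", leaving the focus "longest river".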

Question Answering

- Paragraph Indexing Module (or Document Processing Module) relies on one or more IR systems to gather information from a collection of document corpora
  - Filter paragraphs, retaining non-stop, stemmed words
  - Perform indexing on the remaining keywords in the paragraphs
  - Assess the quality of the indexed (keywords in) paragraphs and order the extracted paragraphs according to how plausibly they contain answers to questions (e.g., based on the question keywords appearing in the paragraphs)
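The three steps above can be sketched as: filter to non-stop stemmed terms, index each paragraph as a term set, then order paragraphs by question-keyword overlap. The suffix-stripping `stem` is a crude stand-in for a real stemmer (e.g., Porter), and the stopword list is an assumption:

```python
import re

STOPWORDS = {"the", "a", "is", "of", "in", "what", "where", "who"}

def stem(word):
    """Crude suffix stripping; a real system would use a Porter-style stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def index_terms(text):
    """Non-stop, stemmed keywords of a paragraph (the slide's filtering step)."""
    return {stem(w) for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}

def rank_paragraphs(question, paragraphs):
    """Order paragraphs by how many question keywords they contain."""
    q = index_terms(question)
    return sorted(paragraphs, key=lambda p: len(q & index_terms(p)), reverse=True)

paras = [
    "Rivers in Australia vary in length.",
    "The Murray is the longest river in Australia.",
    "Cooking pasta takes ten minutes.",
]
ranked = rank_paragraphs("What is the longest river in Australia?", paras)
```

Stemming lets the plural "Rivers" in the first paragraph still match the question's "river", but the paragraph containing all three question keywords is ranked first.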

Question Answering

- Answer Processing Module is responsible for identifying and extracting answers from the paragraphs passed to it
  - Answer identification determines the paragraphs that contain the required answer type, using a named entity recognizer or part-of-speech tagger to recognize answers
  - Answer extraction retrieves the relevant words/phrases that answer the given question
  - Answer correctness can be verified by the confidence in the correctness of an answer, based on lexical analysis (e.g., using WordNet) of the correct answer type
    - The types of answers to questions and the questions themselves are assumed to be in the same domain
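Answer identification and extraction can be sketched together: tag candidate entities in a passage, then keep only those whose type matches the expected answer type from question processing. The gazetteer-based tagger below is a toy stand-in for a real NER system, and its entity lists are hypothetical:

```python
import re

# Toy named-entity recognizer backed by hypothetical gazetteers; a real
# system would use a trained NER model and a POS tagger.
GAZETTEERS = {
    "PERSON": {"Shakespeare", "Marie Curie"},
    "LOCATION": {"Glasgow", "Paris"},
}

def candidate_entities(passage):
    """Yield (entity, type) pairs found in a passage."""
    for ent_type, names in GAZETTEERS.items():
        for name in names:
            if name in passage:
                yield name, ent_type
    # Four-digit numbers in a plausible year range are tagged as TIME.
    for year in re.findall(r"\b(1[0-9]{3}|20[0-9]{2})\b", passage):
        yield year, "TIME"

def extract_answer(passage, expected_type):
    """Keep only candidates whose entity type matches the expected answer type."""
    return [e for e, t in candidate_entities(passage) if t == expected_type]
```

The same passage yields different answers depending on the expected answer type, which is exactly why the question-processing stage must determine that type first.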