Open Domain Question Answering Lide Wu Dept. of Computer Science Fudan University Shanghai 200433 China.



Open Domain Question Answering Lide Wu Dept. of Computer Science Fudan University Shanghai China

Outline What is open domain question answering (ODQA)? The state of the art of ODQA The future of ODQA ODQA as a grand challenge in CS/AI/IT Summary

What’s QA? (Diagram: a question is posed against a free-text corpus and an answer comes back.) Question: When did Hawaii become a state? Answer: August 21, 1959

When did Hawaii become a state? (Slide shows the raw search-engine results page for this question. The hits include: the AnswerBus Question Answering System; “uncategorized threads in About Hawaii” with questions like “How did Hawaii become a state?”; “Is Hawaii Really a State of the Union?”; a Hawaii flag printout from EnchantedLearning.com asking “When did Hawaii become a state of the USA?”; and “PaleoZoo's Prehistoric Hawaii!” about species becoming extinct.)

When did Hawaii become a state? (More raw search hits, mostly off-topic: “HAWAII SUPREME COURT DROPS GAY MARRIAGE CASE”; “Maui Trivia by MAUI CHEETAH,” which does contain “When did hawaii become a state? ~~ Ans: 1959”; “State Bird of Hawaii Unmasked as Canadian”; and a PDF, “BEFORE ARBITRATOR TAMOTSU TANAKA, STATE OF HAWAII.”)

Comparison to Search Engines More natural interface: natural language question vs. keywords More compact answer: exact answers vs. relevant documents

The General Solution of QA: a pipeline of three models. The Question Analysis model turns the question into a query set and answer type/patterns; the Search Engine model retrieves potential answer-bearing segments; the Answer Extraction model produces the answer.
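The three-stage pipeline above can be sketched as plain function composition. This is an illustrative skeleton under assumed toy rules (the function names and heuristics are not from the talk; real systems use the richer methods listed on the next slides):

```python
import re

def analyze_question(question):
    """Question Analysis model: map a question to an answer type and a query set."""
    # Assumed toy rule: "When ..." questions expect a Date answer.
    answer_type = "Date" if question.lower().startswith("when") else "Unknown"
    stopwords = {"when", "did", "a", "the"}
    keywords = [w for w in question.rstrip("?").split() if w.lower() not in stopwords]
    return answer_type, keywords

def search(keywords, corpus):
    """Search Engine model: return text segments containing any query keyword."""
    return [seg for seg in corpus if any(k.lower() in seg.lower() for k in keywords)]

def extract_answer(answer_type, snippets):
    """Answer Extraction model: pick a string of the expected type (crude stub)."""
    if answer_type == "Date":
        for s in snippets:
            m = re.search(r"\b\d{4}\b", s)  # first 4-digit year in a snippet
            if m:
                return m.group()
    return None

corpus = ["Hawaii became the 50th state on Aug. 21, 1959.",
          "The state flower of Hawaii is the yellow hibiscus."]
atype, kws = analyze_question("When did Hawaii become a state?")
print(extract_answer(atype, search(kws, corpus)))  # → 1959
```

Each stub stands in for one box of the slide's diagram; swapping in a real parser, search engine, or extractor leaves the composition unchanged.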

Question Analysis Input: Question (When did Hawaii become a state?) Output: Answer type/Patterns (Date) Queries (a group of key words: Hawaii, state, became…) Methods: POS tagging Named entity tagging BNP Chunking Syntactic parsing Semantic tagging …..

Question Analysis Input: Question (When did Hawaii become a state?) Output: Answer type: Date Patterns: “Hawaii became a state in….” “In … Hawaii became a state.” …………. Queries (a group of key words): “When did Hawaii become a state” “Hawaii became a state in….” Hawaii, state, became
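Turning the question into the declarative answer patterns shown above can be done with a surface rewrite rule. The helper below is a hypothetical sketch that handles only the "When did <subject> <verb> …?" shape (real systems use parsing and learned reformulations, not a single regex):

```python
import re

# Assumed irregular-verb table; extend as needed for other questions.
IRREGULAR = {"become": "became"}

def question_to_patterns(question):
    """Rewrite "When did <subj> <verb> <rest>?" into declarative answer patterns."""
    m = re.match(r"When did (\w+) (\w+) (.*)\?", question)
    if not m:
        return []
    subj, verb, rest = m.groups()
    past = IRREGULAR.get(verb, verb + "ed")  # naive past-tense formation
    return [f"{subj} {past} {rest} in", f"In <DATE> {subj} {past} {rest}"]

print(question_to_patterns("When did Hawaii become a state?"))
# → ['Hawaii became a state in', 'In <DATE> Hawaii became a state']
```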

Search Input: Queries (“Hawaii became a state in”, i.e. groups of key words or phrases) Output: Text segments (snippets) relevant to the answer, such as the ones returned by Google Methods: Search engines for passages
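A minimal stand-in for this search step: score each candidate passage by keyword overlap and keep the best-matching snippets. This is an assumed toy ranker, not the talk's method; in practice this box is a web search engine or a passage-retrieval index.

```python
def rank_passages(keywords, passages, top_k=2):
    """Rank passages by how many distinct query keywords they contain."""
    def score(p):
        text = p.lower()
        return sum(1 for k in keywords if k.lower() in text)
    return sorted(passages, key=score, reverse=True)[:top_k]

passages = [
    "Hawaii became the 50th state on Aug. 21, 1959.",
    "Mauna Loa is an active volcano.",
    "Hawaii joined the States in 1959.",
]
print(rank_passages(["Hawaii", "state", "became"], passages, top_k=2))
```

The irrelevant volcano passage scores zero and drops out, mimicking how the search stage narrows the corpus to answer-bearing snippets.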

Answer Extraction Input: Question answer type/patterns from question analysis Snippets returned by search engines Output: Answers Methods: POS tagging Named entity tagging BNP Chunking Syntactic parsing Semantic tagging Co-reference resolution Logic proving/Matching ………….

Answer Extraction Question: When did Hawaii become a state? Answer type: Date Patterns from question analysis: “Hawaii became a state ….” “In … Hawaii became a state.” …………. Snippets returned by search engines: “…Hawaii became the 50th state on Aug. 21, 1959…” “…Hawaii joined the States in 1959……” ………………
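Applying the surface patterns to the retrieved snippets can be as simple as a regular-expression match against the expected answer type. The date regex and the instantiated patterns below are illustrative assumptions fitted to the slide's example snippets:

```python
import re

# Date slot: "Aug. 21, 1959" or a bare year like "1959" (an assumed format).
DATE = r"(?:[A-Z][a-z]+\.?\s*\d{1,2},\s*)?\d{4}"

def extract_by_pattern(snippets):
    """Match the slide's surface patterns, capturing the Date slot."""
    pattern = re.compile(rf"Hawaii (?:became|joined) the \S+ state on ({DATE})")
    fallback = re.compile(rf"Hawaii joined the States in ({DATE})")
    for snippet in snippets:
        m = pattern.search(snippet) or fallback.search(snippet)
        if m:
            return m.group(1)
    return None

snippets = ["...Hawaii became the 50th state on Aug. 21, 1959...",
            "...Hawaii joined the States in 1959..."]
print(extract_by_pattern(snippets))  # → Aug. 21, 1959
```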

Key techniques CL: Part-of-speech tagging NE tagging Semantic tagging BNP Chunking Reference resolution Syntactic parsing IR: Search Engine AI: Pattern Matching Logic proving Machine Learning

Key Knowledge Dictionaries WordNet HowNet FrameNet World Knowledge Encyclopedia Web

The State of the Art: Introduction to the TREC QA Task Organized by NIST Sponsors: NIST, DARPA, and ARDA Started in 1999 Has the most participants among TREC tasks

TREC-QA 2002 participants (35): Alicante Univ., BBN, CMU-Javelin, Chinese Academy of Sciences, CL Research, Columbia Univ.-Illouz, Fudan University, IBM T.J. Watson Res. Ctr.-Ittycheriah, IBM T.J. Watson Res. Ctr.-Prager, InsightSoft-M, ITC-irst, Language Computer Corporation, LIMSI, MIT, National Univ. of Singapore-Lee, National Univ. of Singapore-Hui, NTT Communication Science Labs, POSTECH, Syracuse University, The MITRE Corp., Tokyo Univ. of Science, Univ. of Amsterdam-Monz, Université d'Angers, Univ. of Avignon, Univ. of Illinois at Urbana/Champaign, Univ. of Iowa, Univ. of Limerick, Univ. of Michigan, Univ. of Montreal, Univ. of Pisa, Univ. of Sheffield, Univ. of Southern California/ISI, Univ. of Waterloo, Univ. of York

Document set The document set is the set of documents on the AQUAINT disk set: 3 GB of news

Evaluation 500 questions (Ex. When did Hawaii become a state?) For each question the answer is evaluated as Incorrect (W): the answer-string does not contain a correct answer or the answer is not responsive; Unsupported (U): the answer-string contains a correct answer but the document returned does not support that answer; Non-exact (X): the answer-string contains a correct answer and the document supports that answer, but the string contains more than just the answer (or is missing bits of the answer); Correct (R): the answer-string consists of exactly a correct answer and that answer is supported by the document returned. Only correct answers have scores

Score
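The "Score" slide's formula was not preserved in this transcript. As one concrete illustration (an assumption, not necessarily this slide's exact formula): early TREC QA tracks ranked systems by mean reciprocal rank (MRR) over the question set, while TREC 2002 moved to an exact-answer, confidence-weighted score. A minimal MRR sketch:

```python
def mean_reciprocal_rank(rank_of_first_correct):
    """MRR over questions: average of 1/rank of the first correct answer.

    `rank_of_first_correct` holds one rank per question; None means no
    correct answer was returned, contributing 0 to the average.
    """
    rr = [1.0 / r if r is not None else 0.0 for r in rank_of_first_correct]
    return sum(rr) / len(rr)

# Three questions: correct answer at rank 1, at rank 2, and not found.
print(mean_reciprocal_rank([1, 2, None]))  # → (1 + 0.5 + 0) / 3 = 0.5
```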

Top 15 Groups (2002)

TREC-QA 2003 participants (25): Alicante Univ., BBN, CMU-Javelin, Chinese Academy of Sciences, CL Research, Fudan University, IBM T.J. Watson Res. Ctr.-Ittycheriah, IBM T.J. Watson Res. Ctr.-Prager, ITC-irst, Language Computer Corporation, Lexiclone Inc., LIMSI, MIT, National Univ. of Singapore, NTT Communication Science Labs, New Mexico State Univ., The MITRE Corp., Univ. of Amsterdam-Monz, Univ. of Iowa, Univ. of Limerick, UPC&UdG, Univ. of Pisa, Univ. of Sheffield, Univ. of Southern California/ISI, Univ. of Waterloo, Univ. of Wales Bangor

TREC 2004: Question Set A series of questions for each of a set of targets Number of targets: Each series will contain: –Several factoid questions –0-2 list questions –A question called “other”

Example question When was AmeriCorps founded? How many volunteers work for it? What activities are its volunteers involved in? Other

Question Set Targets: –Suggested by mining Microsoft and AOL web search logs The assessors created the questions before they did any searching of the document set to find answers to the questions.

The future of ODQA: A Roadmap ---adapted from the NIST vision paper Variation of questions

The simplest questions Factual questions: What is Hawaii’s state flower? Void questions: the answer is no longer guaranteed to be present in the text collection, and systems are expected to report the absence of an answer. List questions: the answer is scattered across two or more documents. Context questions: a group of related questions “within a context”

List Questions The answer is scattered across two or more documents What countries in South America did the Pope visit and when? Answer: Argentina – 1987 [Document Source 1] Colombia – 1986 [Document Source 2] Brazil – 1982, 1991 [Document Source 3]

Context Questions A group of related questions “within a context” Context: Topic Title: Financing AMTRAK - Description: The role of the Federal Government in financing the operation of the National Railroad Transportation Corporation (AMTRAK). (Q1) Why can AMTRAK not be considered economically viable? (Q2) Should it be privatized? (Q3) How much larger are the government subsidies to AMTRAK compared to those given to air transportation?

Definition/Template Questions There are templates for this kind of question Example: Who is XXX? The template consists of The address, phone number, fax number, e-mail address, website,…. The education history The working experience The contributions ………

Question with ambiguity The answer will comprise an explanation of possible ambiguities and a justification of why the answer is right

Examples Where is the Taj Mahal? Answer: If you are interested in the Indian landmark, it is in Agra, India. If instead you want to find the location of the Casino, it is in Atlantic City, NJ, U.S.A. There are also several restaurants named Taj Mahal. A full list is rendered by the following hypertable. If you click on the location, you may find the address. The Taj Mahal Indian Cuisine, Mountain View, CA The Taj Mahal Restaurant, Dallas, TX Taj Mahal, Las Vegas, NV Taj Mahal, Springfield, VA

Examples How did Socrates die? Answer: He drank poisoned wine. Anyone drinking or eating something that is poisoned is likely to die.

Summaries as answers More complex questions will require the answers to be summaries of the textual information contained in one or several documents. The summarization is driven by the question, drawing on one or multiple documents. Moreover, the summary will be presented in a coherent manner using text generation capabilities.

Examples Context-based summary-generating questions. What is the financial situation of AMTRAK? Stand-alone summary-generating questions How safe are commercial flights? Example-based summary-generating questions What other companies are operated with Government aid?

Expert-Level Questions Questions asked by experts require Collecting sufficient structured and unstructured information for different domains Mining domain knowledge and mastering the relationships between all activities, situations and facts within a specific domain Reasoning by analogy, comparing and discovering new relations

Examples (Q1) What are the opinions of the Danes on the Euro? (Q2) Why do so many people buy four-wheel-drive cars lately? (Q3) How likely is it that the Fed will raise the interest rates at their next meeting?

A General Approach Accept complex “Questions” in a form natural to the analyst Translate the “Complex Question” into multiple queries appropriate to the various data sets to be searched Find relevant information in distributed, multimedia, multilingual, multi-agency data sources Analyze, fuse and summarize information into a coherent “Answer” Provide the (proposed) “Answer” to the analyst in the form they want Provide multimedia visualization and navigation tools

ODQA as a grand challenge What makes a good long-range research goal or a grand challenge? ---Jim Gray Understandable: the goal should be simple to state. Challenging: it should not be obvious how to achieve the goal. Useful: if the goal is achieved, the results should be clearly useful to many people. Testable: solutions to the goal should have a simple test so that one can measure progress and tell when the goal is achieved. Incremental: it is very desirable that the goal has intermediate milestones so that progress can be measured along the way.

QA as a grand challenge “A more demanding task is to take a corpus like the Internet or the Computer Science journals, or Encyclopedia Britannica, and be able to answer summarization questions about it as well as a human expert in that field” ---Jim Gray, Journal of the ACM, Jan. 2003 (JACM’s 50th anniversary issue)

QA as a grand challenge “Read a chapter in a book and answer the questions at the end of the chapter. Reading and understanding books is a quintessentially human activity. It is the process by which much knowledge transfer occurs from generation to generation.” ---Raj Reddy, Journal of the ACM, Jan. 2003

QA as a grand challenge “Build a large knowledge base by reading text, reducing knowledge engineering effort by one order of magnitude. The intent here is to ‘educate’ a knowledge base in the same way that we receive most of our education.” ---Edward A. Feigenbaum, Journal of the ACM, Jan. 2003

QA as a grand challenge “Because questions can be devised to query any aspect of text comprehension, the ability to answer questions is the strongest possible demonstration of understanding.” ---Wendy Lehnert So ODQA is AI-complete in some sense

Conclusion Open Domain Question Answering is a grand challenge in CS/AI/IT It is Understandable, Challenging, Useful, Testable, and Incremental.

Thanks