Open Domain Question Answering Lide Wu Dept. of Computer Science Fudan University Shanghai 200433 China.

1 Open Domain Question Answering Lide Wu Dept. of Computer Science Fudan University Shanghai 200433 China

2 Outline
– What is open domain question answering (ODQA)?
– The state of the art of ODQA
– The future of ODQA
– ODQA as a grand challenge in CS/AI/IT
– Summary

3 What’s QA?
A question is posed against a free-text corpus, and the system returns an answer.
– Question: When did Hawaii become a state?
– Answer: August 21, 1959

4 When did Hawaii become a state?
– AnswerBus Question Answering System - When did Hawaii become a...
  Type in your question in English, French, Spanish, German, Italian or Portuguese. Question: When did Hawaii become a state?...
  www.answerbus.com/cgi-bin/answer.cgi?When%2Bdid%2BHawaii%2Bbecome%2Ba%2Bstate%3F - 4k
– uncategorized threads in About Hawaii
  ... How did Hawaii become a state? What is the history of Hawaii??... When and why did Hawaii become a state (cause and effect); Safe to live by Mauna Loa?...
  www.greenspun.com/bboard/q-and-a-one-category.tcl?topic=About%20Hawaii&category=uncategorized - 5k
– Is Hawaii Really a State of the Union?
  ... Become a state, or remain a territory? Why was the option of independence not on the ballot? Did Hawaii not have the option to become an independent country in...
  www.hawaii-nation.org/statehood.html - 14k
– Hawaii Flag Printout - EnchantedLearning.com
  ... 3. When did Hawaii become a state of the USA? _______________. Copyright © 2000-2003 EnchantedLearning.com.
  www.enchantedlearning.com/usa/flags/hawaii/hawaiiflag.shtml - 3k
– PaleoZoo's Prehistoric Hawaii!
  ... became extinct after rats and mongooses arrived in Hawaii.... let Nature decide when a species should become extinct. They decided to save the nene, and they did....
  www.geobop.com/paleozoo/World/NA/US/HI/ - 42k

5 When did Hawaii become a state?
– HAWAII SUPREME COURT DROPS GAY MARRIAGE CASE || Human Rights...
  ... It did not bar future cases that seek the benefits, protections and responsibilities that come... Their ads claimed that Hawaii would become the "homosexual...
  www.hrc.org/newsreleases/1999/991210.asp - 16k - 30 Jun 2003
– Maui Trivia by MAUI CHEETAH
  ... ~ Ans: Front Street in Lahaina ***** submitted by: THonings; When did hawaii become a state? ~~ Ans: 1959 ***** submitted...
  www.mauigateway.com/~rw/trivia1.htm - 12k
– State Bird of Hawaii Unmasked as Canadian
  ... it should be no surprise that Canada geese did it some... But in their adopted tropical habitat of Hawaii, the birds "evolved to become more independent of...
  news.nationalgeographic.com/news/2002/02/0206_020206_canadiangeese.html - 38k
– [PDF] BEFORE ARBITRATOR TAMOTSU TANAKA STATE OF HAWAII In the Matter of...
  File Format: PDF/Adobe Acrobat
  ... training to insure that qualified employees become available.... did not contravene the provisions of the Collective... DATED: Honolulu, Hawaii, December 10, 1998....
  www.state.hi.us/hrd/121098.pdf

7 Comparison to Search Engines
– More natural interface: natural-language questions vs. keywords
– More compact answers: exact answers vs. relevant documents

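8 The General Solution of QA
– Question Analysis Model: takes the question and produces the query set and the answer type/patterns
– Search Engine Model: takes the query set and returns potential segments
– Answer Extraction Model: takes the answer type/patterns and the potential segments and produces the answer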

9 Question Analysis
Input: question (When did Hawaii become a state?)
Output:
– Answer type/patterns (Date)
– Queries (a group of keywords: Hawaii, state, became, …)
Methods: POS tagging, named entity tagging, BNP chunking, syntactic parsing, semantic tagging, …

10 Question Analysis
Input: question (When did Hawaii become a state?)
Output:
– Answer type: Date
– Patterns: “Hawaii became a state in …”, “In … Hawaii became a state.”, …
– Queries (a group of keywords): “When did Hawaii become a state”, “Hawaii became a state in …”, Hawaii, state, became
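
The question analysis step above can be sketched in a few lines of Python. This is a minimal illustration, not the system described in the talk: the cue-word table `ANSWER_TYPES`, the stopword list, and the function name `analyze_question` are all assumptions made for the example. It handles only the answer type and keyword queries; generating declarative patterns such as “Hawaii became a state in …” would need the morphological and syntactic tools listed under Methods.

```python
import re

# Hypothetical cue-word table: leading question words mapped to coarse answer types.
ANSWER_TYPES = {
    "when": "DATE",
    "where": "LOCATION",
    "who": "PERSON",
    "how many": "NUMBER",
}

# Hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {
    "when", "where", "who", "what", "how", "many", "did", "do", "does",
    "is", "was", "a", "an", "the", "of", "in",
}


def analyze_question(question):
    """Return a coarse answer type, content keywords, and candidate queries."""
    text = question.strip().rstrip("?").strip()
    lowered = text.lower()

    # Pick the answer type from the first matching cue at the start of the question.
    answer_type = "OTHER"
    for cue, label in ANSWER_TYPES.items():
        if lowered.startswith(cue):
            answer_type = label
            break

    # Keep the content words as keywords for the search step.
    tokens = re.findall(r"[A-Za-z0-9']+", text)
    keywords = [t for t in tokens if t.lower() not in STOPWORDS]

    return {
        "answer_type": answer_type,
        "keywords": keywords,
        "queries": [text, " ".join(keywords)],
    }


result = analyze_question("When did Hawaii become a state?")
print(result["answer_type"], result["keywords"])
```

For the running example this yields the answer type `DATE` and the keywords `Hawaii`, `become`, `state`, which play the role of the query set passed to the search step.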

11 Search
Input: queries (e.g. “Hawaii became a state in”), i.e. groups of keywords or phrases
Output: text segments (snippets) relevant to the answer, such as the ones returned by Google
Methods: search engines for passages
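
As a rough illustration of the search step, the sketch below ranks a handful of passages by how many query keywords they contain. It stands in for a real passage search engine and is only an assumption about how such ranking might look; the function name `rank_passages`, the `top_k` parameter, and the sample passages are made up for the example.

```python
import re


def rank_passages(keywords, passages, top_k=3):
    """Rank passages by the number of distinct query keywords they contain."""
    wanted = {k.lower() for k in keywords}
    scored = []
    for passage in passages:
        words = set(re.findall(r"[a-z0-9']+", passage.lower()))
        scored.append((len(wanted & words), passage))
    # sorted() is stable, so passages with equal scores keep their input order.
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [passage for score, passage in ranked[:top_k] if score > 0]


# Toy passages invented for illustration.
passages = [
    "Canada geese migrate south in winter.",
    "The nene is the state bird of Hawaii.",
    "Hawaii became the 50th state on Aug. 21, 1959.",
]
for passage in rank_passages(["Hawaii", "state", "became"], passages):
    print(passage)
```

Plain keyword overlap is the simplest possible scoring; production engines weight terms and reward proximity, which this sketch deliberately leaves out.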

12 Answer Extraction
Input:
– Question answer type/patterns from question analysis
– Snippets returned by search engines
Output: answers
Methods: POS tagging, named entity tagging, BNP chunking, syntactic parsing, semantic tagging, co-reference resolution, logic proving/matching, …

13 Answer Extraction
Question: When did Hawaii become a state?
Answer type: Date
Patterns from question analysis: “Hawaii became a state …”, “In … Hawaii became a state.”, …
Snippets returned by search engines: “…Hawaii became the 50th state on Aug.21,1959…”, “…Hawaii joined the States in 1959…”, …
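
For a Date answer type, extraction from the snippets above might look like the following sketch. It is a simplified stand-in for the named entity tagging and pattern matching listed earlier: the regular expressions, the voting scheme, and the function name `extract_date` are assumptions for illustration. Each snippet votes once for every year it mentions, and the most specific date string found for the winning year is returned.

```python
import re
from collections import Counter

# Illustrative patterns: an abbreviated or full month name followed by day and year,
# and a bare four-digit year.
MONTH = r"\b(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*\.?"
FULL_DATE = re.compile(MONTH + r"\s*\d{1,2},\s*(\d{4})")
YEAR = re.compile(r"\b(1\d{3}|20\d{2})\b")


def extract_date(snippets):
    """Return the best-supported date across snippets, or None if no date is found."""
    year_votes = Counter()
    full_dates = {}
    for snippet in snippets:
        # Remember the first full date seen for each year.
        for match in FULL_DATE.finditer(snippet):
            full_dates.setdefault(match.group(1), match.group(0))
        # Each snippet votes at most once per year it mentions.
        for year in set(YEAR.findall(snippet)):
            year_votes[year] += 1
    if not year_votes:
        return None
    best_year = year_votes.most_common(1)[0][0]
    return full_dates.get(best_year, best_year)


snippets = [
    "…Hawaii became the 50th state on Aug.21,1959…",
    "…Hawaii joined the States in 1959……",
]
print(extract_date(snippets))
```

Both snippets support 1959, and the first supplies the full date, so the sketch returns the string `Aug.21,1959` exactly as it appears in the snippet.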

14 Key techniques
– CL: part-of-speech tagging, NE tagging, semantic tagging, BNP chunking, reference resolution, syntactic parsing
– IR: search engine
– AI: pattern matching, logic proving, machine learning

15 Key Knowledge
– Dictionaries: WordNet, HowNet, FrameNet
– World knowledge: encyclopedia, Web

16 The State of the Art: Introduction to the TREC QA Task
– http://trec.nist.gov
– Organized by NIST
– Sponsors: NIST, DARPA, and ARDA
– Started in 1999
– Has the most participants among the TREC tasks

17 TREC-QA2002 participants (35)
Alicante Univ., BBN, CMU-Javelin, Chinese Academy of Sciences, CL Research, Columbia Univ.-Illouz, Fudan University, IBM T.J. Watson Res. Ctr.-Ittycheriah, IBM T.J. Watson Res. Ctr.-Prager, InsightSoft-M, ITC-irst, Language Computer Corporation, LIMSI, MIT, National Univ. of Singapore-Lee, National Univ. of Singapore-Hui, NTT Communication Science Labs, POSTECH, Syracuse University, The MITRE Corp., Tokyo Univ. of Science, Univ. of Amsterdam-Monz, Université d’Angers, Univ. of Avignon, Univ. of Illinois at Urbana/Champaign, Univ. of Iowa, Univ. of Limerick, Univ. of Michigan, Univ. of Montreal, Univ. of Pisa, Univ. of Sheffield, Univ. of Southern California/ISI, Univ. of Waterloo, Univ. of York

18 Document set The document set is the set of documents on the AQUAINT disk set. 3GB News

19 Evaluation
500 questions (e.g. When did Hawaii become a state?)
For each question the answer is evaluated as:
– Incorrect (W): the answer-string does not contain a correct answer or the answer is not responsive;
– Unsupported (U): the answer-string contains a correct answer but the document returned does not support that answer;
– Non-exact (X): the answer-string contains a correct answer and the document supports that answer, but the string contains more than just the answer (or is missing bits of the answer);
– Correct (R): the answer-string consists of exactly a correct answer and that answer is supported by the document returned.
Only correct answers score.
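
Given one judgment per question (W, U, X, or R), a score can be computed as in the sketch below. The first function is plain strict accuracy, the fraction of answers judged R. The second is a confidence-weighted score over answers ordered from most to least confident; the TREC 2002 main task is generally described as using a measure of this form, but treat the code as an illustration rather than the official definition. Function names and the sample judgments are assumptions.

```python
def strict_accuracy(judgments):
    """Fraction of answers judged Correct (R); W, U, and X all count as misses."""
    if not judgments:
        raise ValueError("at least one judgment is required")
    return sum(1 for j in judgments if j == "R") / len(judgments)


def confidence_weighted_score(ranked_judgments):
    """Average, over ranks i, of (number correct in the first i answers) / i.

    ranked_judgments must be ordered from the most to the least confident answer.
    """
    if not ranked_judgments:
        raise ValueError("at least one judgment is required")
    correct_so_far = 0
    total = 0.0
    for rank, judgment in enumerate(ranked_judgments, start=1):
        if judgment == "R":
            correct_so_far += 1
        total += correct_so_far / rank
    return total / len(ranked_judgments)


# Made-up judgments for four questions, most confident answer first.
judgments = ["R", "W", "X", "R"]
print(strict_accuracy(judgments))
print(round(confidence_weighted_score(judgments), 4))
```

The confidence-weighted form rewards a system for placing the answers it gets right near the top of its own confidence ranking, which strict accuracy ignores.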

20 Score

21 Top 15 Groups (2002)

22 TREC-QA2003 participants (25)
Alicante Univ., BBN, CMU-Javelin, Chinese Academy of Sciences, CL Research, Fudan University, IBM T.J. Watson Res. Ctr.-Ittycheriah, IBM T.J. Watson Res. Ctr.-Prager, ITC-irst, Language Computer Corporation, Lexiclone Inc., LIMSI, MIT, National Univ. of Singapore, NTT Communication Science Labs, New Mexico State Univ., The MITRE Corp., Univ. of Amsterdam-Monz, Univ. of Iowa, Univ. of Limerick, UPC&UdG, Univ. of Pisa, Univ. of Sheffield, Univ. of Southern California/ISI, Univ. of Waterloo, Univ. of Wales Bangor

24 TREC 2004: Question Set
A series of questions for each of a set of targets
Number of targets: 50-100
Each series will contain:
– Several factoid questions
– 0-2 list questions
– A question called “other”

25 Example question
– When was AmeriCorps founded?
– How many volunteers work for it?
– What activities are its volunteers involved in?
– Other

26 Question Set
– Targets: suggested by mining Microsoft and AOL web search logs
– The assessors created the questions before they did any searching of the document set to find answers to the questions.

27 The future of ODQA: A Roadmap
---Adapted from NIST Vision paper
Variation of questions

29 The simplest questions
– Factual questions: What is Hawaii’s state flower?
– Void questions: the answer is no longer guaranteed to be present in the text collection, and systems are expected to report the absence of an answer.
– List questions: the answer is scattered across two or more documents.
– Context questions: a group of related questions “within a context”

30 List Questions
The answer is scattered across two or more documents.
Question: What countries in South America did the Pope visit, and when?
Answer:
Argentina – 1987 [Document Source 1]
Colombia – 1986 [Document Source 2]
Brazil – 1982, 1991 [Document Source 3]
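
Assembling a list answer means merging partial answers extracted from different documents. The sketch below assumes extraction has already produced (item, years, source) tuples; that tuple layout and the function name `merge_list_answers` are hypothetical choices for the example, with the data taken from the slide.

```python
def merge_list_answers(extractions):
    """Merge (item, years, source) tuples into one entry per item."""
    merged = {}
    for item, years, source in extractions:
        entry = merged.setdefault(item, {"years": set(), "sources": set()})
        entry["years"].update(years)
        entry["sources"].add(source)
    # Sort for stable, readable output.
    return {
        item: {"years": sorted(e["years"]), "sources": sorted(e["sources"])}
        for item, e in merged.items()
    }


# Partial answers as they might come out of individual documents.
extractions = [
    ("Argentina", [1987], "Document Source 1"),
    ("Colombia", [1986], "Document Source 2"),
    ("Brazil", [1982], "Document Source 3"),
    ("Brazil", [1991], "Document Source 3"),
]
for country, info in merge_list_answers(extractions).items():
    years = ", ".join(str(y) for y in info["years"])
    print(f"{country}: {years} {info['sources']}")
```

Keeping the source with every item matters because a list answer, like a factoid answer, has to be supported by the documents it was drawn from.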

31 Context Questions
A group of related questions “within a context”
Context: Topic 168
– Title: Financing AMTRAK
– Description: The role of the Federal Government in financing the operation of the National Railroad Transportation Corporation (AMTRAK).
(Q1) Why can AMTRAK not be considered economically viable?
(Q2) Should it be privatized?
(Q3) How much larger are the government subsidies to AMTRAK as compared to those given to air transportation?

32 Definition/Template Question
There are templates for this kind of question.
Example: Who is XXX?
The template consists of:
– The address, phone number, fax number, email address, website, …
– The education history
– The working experience
– The contributions
– …

33 Question with ambiguity The answer will comprise an explanation of possible ambiguities and a justification of why the answer is right

34 Examples
Where is the Taj Mahal?
Answer: If you are interested in the Indian landmark, it is in Agra, India. If instead you want to find the location of the Casino, it is in Atlantic City, NJ, U.S.A. There are also several restaurants named Taj Mahal. A full list is rendered by the following hypertable. If you click on the location, you may find the address.
– The Taj Mahal Indian Cuisine, Mountain View, CA
– The Taj Mahal Restaurant, Dallas, TX
– Taj Mahal, Las Vegas, NV
– Taj Mahal, Springfield, VA

35 Examples
How did Socrates die?
Answer: He drank poisoned wine. Anyone drinking or eating something that is poisoned is likely to die.

36 Summaries as answer
More complex questions will require the answers to be summaries of the textual information contained in one or several documents. The summarization is driven by the question and draws on one or multiple documents. Moreover, the summary will be presented in a coherent manner using text generation capabilities.

37 Examples
– Context-based summary-generating questions: What is the financial situation of AMTRAK?
– Stand-alone summary-generating questions: How safe are commercial flights?
– Example-based summary-generating questions: What other companies are operated with Government aid?

38 Expert-Level Questions
Questions asked by experts require:
– Collecting sufficient structured and unstructured information for different domains.
– Mining domain knowledge and mastering the relationships between all activities, situations and facts within a specific domain.
– Reasoning by analogy, comparing and discovering new relations.

39 Examples
(Q1) What are the opinions of the Danes on the Euro?
(Q2) Why do so many people buy four-wheel-drive cars lately?
(Q3) How likely is it that the Fed will raise the interest rates at their next meeting?

40 A General Approach
– Accept complex “Questions” in a form natural to the analyst.
– Translate the “Complex Question” into multiple queries appropriate to the various data sets to be searched.
– Find relevant information in distributed, multimedia, multilingual, multi-agency data sources.
– Analyze, fuse and summarize information into a coherent “Answer”.
– Provide the (proposed) “Answer” to the analyst in the form they want.
– Provide multimedia visualization and navigation tools.

41 ODQA as a grand challenge
What makes a good long-range research goal or a grand challenge? ---Jim Gray
– Understandable. The goal should be simple to state.
– Challenging. It should not be obvious how to achieve the goal.
– Useful. If the goal is achieved, the results should be clearly useful to many people.
– Testable. Solutions to the goal should have a simple test so that one can measure progress and one can tell when the goal is achieved.
– Incremental. It is very desirable that the goal has intermediate milestones so that progress can be measured along the way.

42 QA as a grand challenge
A more demanding task is to take a corpus like the Internet or the Computer Science journals, or Encyclopedia Britannica, and be able to answer summarization questions about it as well as a human expert in that field.
---Jim Gray, Journal ACM, Jan. 2003 (J.ACM’s 50th Anniversary)

43 QA as a grand challenge
Read a Chapter in a Book and Answer the Questions at the End of the Chapter. Reading and understanding books is a quintessentially human activity. It is the process by which much knowledge transfer occurs from generation to generation.
---Raj Reddy, Journal ACM, Jan. 2003

44 QA as a grand challenge
Build a large knowledge base by reading text, reducing knowledge engineering effort by one order of magnitude. The intent here is to “educate” a knowledge base in the same way that we receive most of our education.
---Edward A. Feigenbaum, Journal ACM, Jan. 2003

45 QA as a grand challenge
Because questions can be devised to query any aspect of text comprehension, the ability to answer questions is the strongest possible demonstration of understanding.
---Wendy Lehnert
So ODQA is AI-complete in some sense.

46 Conclusion Open Domain Question Answering is a grand challenge in CS/AI/IT It is Understandable, Challenging, Useful, Testable, and Incremental.

47 Thanks

