
Slide 1: I256 Applied Natural Language Processing, Fall 2009
Question Answering
Barbara Rosario

Slide 2: QA: Outline
Introduction
Factoid QA
Three stages of a typical QA system
– Question processing
– Passage retrieval
– Answer processing
Evaluation of factoid answers
Complex questions
Acknowledgments
– Speech and Language Processing, Jurafsky and Martin (chapter 23)
– Some slides adapted from Manning, Harabagiu, Kusmerick, ISI

Slide 3: The Problem of Question Answering (QA)
Question: What is the nationality of Pope John Paul II?
Text: "… stabilize the country with its help, the Catholic hierarchy stoutly held out for pluralism, in large part at the urging of Polish-born Pope John Paul II. When the Pope emphatically defended the Solidarity trade union during a 1987 tour of the …"
Answer: Polish
– Natural language question, not keyword queries
– Short text fragment, not a URL list

Slide 4: People want to ask questions…
Examples from the AltaVista query log:
– who invented surf music?
– how to make stink bombs
– where are the snowdens of yesteryear?
– which english translation of the bible is used in official catholic liturgies?
– how to do clayart
– how to copy psx
– how tall is the sears tower?
Examples from the Excite query log:
– how can i find someone in texas
– where can i find information on puritan religion?
– what are the 7 wonders of the world
– how can i eliminate stress
– What vacuum cleaner does Consumers Guide recommend

Slide 5: A spectrum of question types
Factoids (question/answer):
– What is the typical height of a giraffe?
– Where is Apple based?
Complex questions (QA, text data mining; "browse and build"):
– What are some good ideas for landscaping my client's yard?
– What are some promising untried treatments for Raynaud's disease?

Slide 6: Factoid QA
Factoid QA: the information required is a simple fact.
Examples:
– Where is the Louvre Museum located?
– What currency is used in China?
– What is the official language of Algeria?
Fundamental problem: the gap between the way questions are posed and the way answers are expressed in the text.
– User question: What company sells the most greeting cards?
– Potential document answer: "Hallmark remains the largest marketer of greeting cards"
We need to process both questions and answers, and then "match" them.

Slide 7: (image-only slide; no transcript text)

Slide 8: Is this good? What is the problem?

Slide 9: Typical Structure of a QA System
(Diagram: Question → 1) Question processing (query formulation, query classification) → query and answer type → 2) Passage retrieval (IR over a corpus or the Web) → 3) Answer processing → Answer)
Three stages:
1. Question processing
2. Passage retrieval
3. Answer processing

Slide 10: 1) Question processing
Goal: given a natural language question, extract:
1. A keyword query suitable as input to an IR system (query formulation)
2. The answer type, a specification of the kind of entity that would constitute a reasonable answer to the question (question classification)

Slide 11: 1) Question processing: Query formulation
Extract lexical terms (keywords) from the question
– possibly expanded with lexical/semantic variations (especially for smaller document collections)

Slide 12: Lexical Terms Extraction
Questions are approximated by sets of unrelated words (lexical terms), similar to bag-of-words IR models.
Questions (from the TREC QA track) and their lexical terms:
– Q002: What was the monetary value of the Nobel Peace Prize in 1989? → monetary, value, Nobel, Peace, Prize
– Q003: What does the Peugeot company manufacture? → Peugeot, company, manufacture
– Q004: How much did Mercury spend on advertising in 1993? → Mercury, spend, advertising, 1993
– Q005: What is the name of the managing director of Apricot Computer? → name, managing, director, Apricot, Computer

Slide 13: Keyword Selection Examples
– What researcher discovered the vaccine against Hepatitis-B? → Hepatitis-B, vaccine, discover, researcher
– What is the name of the French oceanographer who owned Calypso? → Calypso, French, own, oceanographer
– What U.S. government agency registers trademarks? → U.S., government, trademarks, register, agency
– What is the capital of Kosovo? → Kosovo, capital
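A minimal sketch of this kind of lexical-term extraction, using a tiny illustrative stopword list (the function name and list are our own; real systems use fuller stopword lists and also lemmatize, which is why the TREC lists above sometimes differ, e.g. Q002 omits the year):

```python
import re

# Tiny illustrative stopword list; a real system would use a fuller one
# plus the question stems themselves ("what", "how", ...).
STOPWORDS = {"what", "was", "the", "of", "in", "is", "did", "does",
             "how", "much", "a", "an", "on", "who", "which", "to"}

def extract_lexical_terms(question):
    """Approximate the question as a bag of content words."""
    tokens = re.findall(r"[A-Za-z0-9-]+", question)
    return [t for t in tokens if t.lower() not in STOPWORDS]

print(extract_lexical_terms(
    "What was the monetary value of the Nobel Peace Prize in 1989?"))
# -> ['monetary', 'value', 'Nobel', 'Peace', 'Prize', '1989']
```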

Slide 14: 1) Question processing: Query reformulation
Apply a set of query reformulation rules to the query to make it look like a substring of possible declarative answers, then send the reformulation to the search engine.
– "when was the laser invented" → "the laser was invented"
Example rules (Lin 07):
– wh-word did A verb B → A verb-ed B
– Where is A → A is located in
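These rewrite rules lend themselves to simple regex substitutions. A sketch, where the two regexes paraphrase the examples above and are not the published (Lin 07) rule set:

```python
import re

# Two rewrite rules in the spirit of the examples above (illustrative,
# not the published rule set).
REWRITE_RULES = [
    # "when was A invented" -> "A was invented"
    (re.compile(r"^when was (.+?) invented\??$", re.I), r"\1 was invented"),
    # "where is A" -> "A is located in"
    (re.compile(r"^where is (.+?)\??$", re.I), r"\1 is located in"),
]

def reformulate(question):
    """Rewrite a question into a likely substring of a declarative answer."""
    for pattern, template in REWRITE_RULES:
        if pattern.match(question):
            return pattern.sub(template, question)
    return question  # fall back: send the original keywords

print(reformulate("when was the laser invented?"))  # -> "the laser was invented"
print(reformulate("where is the Louvre Museum?"))   # -> "the Louvre Museum is located in"
```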

Slide 15: 1) Question processing: Question classification
Classify the question by its expected answer type.
This is important both at the retrieval phase and at the answer presentation phase:
– "who is Zhou Enlai" may use a biography-specific template

Slide 16: Question Stems and Answer Types
Question | Question stem | Answer type
– Q555: What was the name of Titanic's captain? | What | Person
– Q654: What U.S. Government agency registers trademarks? | What | Organization
– Q162: What is the capital of Kosovo? | What | City
– Q661: How much does one ton of cement cost? | How much | Quantity
Other question stems: Who, Which, Name, How hot...
Other answer types: Country, Number, Product...

Slide 17: Detecting the Expected Answer Type
In some cases the question stem is sufficient to indicate the answer type (AT):
– Why → REASON
– When → DATE
In many cases the question stem is ambiguous:
– What was the name of Titanic's captain?
– What U.S. Government agency registers trademarks?
– What is the capital of Kosovo?
Solution: select additional question concepts (AT words) that help disambiguate the expected answer type:
– captain, agency, capital
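A toy sketch combining the two ideas, with made-up lookup tables standing in for a real taxonomy:

```python
# Toy answer-type detector: unambiguous stems map directly; the
# ambiguous "what" falls back to an AT-word lookup. Both tables are
# small illustrative stand-ins for a real taxonomy.
STEM_TO_TYPE = {"why": "REASON", "when": "DATE", "who": "PERSON"}
AT_WORD_TO_TYPE = {"captain": "PERSON", "agency": "ORGANIZATION",
                   "capital": "CITY"}

def detect_answer_type(question):
    q = question.lower().rstrip("?")
    stem = q.split()[0]
    if stem in STEM_TO_TYPE:
        return STEM_TO_TYPE[stem]
    for word in q.split():          # ambiguous stem: look for an AT word
        if word in AT_WORD_TO_TYPE:
            return AT_WORD_TO_TYPE[word]
    return "UNKNOWN"

print(detect_answer_type("What is the capital of Kosovo?"))  # -> CITY
print(detect_answer_type("When was Mozart born?"))           # -> DATE
```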

Slide 18: Answer Type Taxonomy
– Rich set of ATs, often hierarchical
– Hierarchical AT taxonomies can be built by hand or dynamically from WordNet

Slide 19: Answer Type Taxonomy (continued)
– Encodes 8707 English concepts to help recognize the expected answer type
– The mapping to parts of WordNet was done by hand

Slide 20: Answer Type Detection Algorithms
– AT detection accuracy is high for easy ATs such as PERSON, LOCATION, TIME
– Detecting REASON and DESCRIPTION questions can be much harder
– The derivation of the answer type is the main source of unrecoverable errors in a QA system
Approaches:
– Hand-written rules: the Webclopedia QA typology contains 276 rules with 180 answer types
– Supervised machine learning (classification): typical features include the words, POS tags, named entities, and headwords (words that give extra information, such as the headword of the first NP after the wh-word: "Which is the state flower of California?"); a toy classifier sketch follows this slide
– Using WordNet and an AT taxonomy
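The supervised route can be sketched with a standard text classifier; here a bag-of-words pipeline in scikit-learn on a six-question toy training set (a real system trains on thousands of labeled questions and adds the POS, NE, and headword features listed above):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training set; a real one has thousands of labeled questions
# and richer features (POS tags, named entities, headwords).
questions = ["Who wrote Hamlet?", "Where is the Louvre?",
             "When was Mozart born?", "Who painted the Mona Lisa?",
             "Where is Mount Everest?", "When did WWII end?"]
labels = ["PERSON", "LOCATION", "DATE",
          "PERSON", "LOCATION", "DATE"]

clf = make_pipeline(CountVectorizer(ngram_range=(1, 2)),
                    LogisticRegression(max_iter=1000))
clf.fit(questions, labels)
print(clf.predict(["Who discovered penicillin?"]))  # likely ['PERSON']
```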

Slide 21: Answer Type Detection Algorithm with an AT taxonomy
– Map the AT word into a previously built AT hierarchy
– The AT hierarchy is based on WordNet, with some concepts associated with semantic categories, e.g. "writer" → PERSON
– Select the AT(s) from the first hypernym(s) associated with a semantic category
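A sketch of this hypernym walk using NLTK's WordNet interface; the concept-to-category table is a three-entry stand-in for the hand-built mapping described above:

```python
from nltk.corpus import wordnet as wn   # needs nltk.download("wordnet")

# Tiny stand-in for the hand-built "concept -> semantic category" map.
CATEGORY_NODES = {"person.n.01": "PERSON",
                  "location.n.01": "LOCATION",
                  "organization.n.01": "ORGANIZATION"}

def at_from_word(at_word):
    """Walk up the hypernym chain of the AT word's first noun sense
    until a node associated with a semantic category is found."""
    frontier = wn.synsets(at_word, pos=wn.NOUN)[:1]
    while frontier:
        syn = frontier.pop(0)
        if syn.name() in CATEGORY_NODES:
            return CATEGORY_NODES[syn.name()]
        frontier.extend(syn.hypernyms())
    return "UNKNOWN"

print(at_from_word("oceanographer"))  # -> PERSON (via scientist -> person)
```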

Slide 22: Answer Type Detection with an AT taxonomy (example)
(Figure: WordNet hypernym chains. For "What researcher discovered the vaccine against Hepatitis-B?", the AT word "researcher" climbs researcher → scientist, man of science → person, giving the answer type PERSON. For "What is the name of the French oceanographer who owned Calypso?", "oceanographer" climbs oceanographer → scientist → person, again PERSON.)

Slide 23: QA stages (recap)
1. Question processing
– Query formulation
– Question classification
2. Passage retrieval
3. Answer processing

Slide 24: 2) Passage retrieval
The IR system returns a set of documents; a passage can be a sentence, a paragraph, or a section.
Passage retrieval: extract a set of potential answer passages from the retrieved documents by:
1. Filtering out passages that don't contain potential answers to the question (using named entity recognition or answer type classification)
2. Ranking the rest according to how likely they are to contain the answer (hand-built rules or machine learning)

Slide 25: 2) Passage retrieval (ranking)
The most common features for passage ranking:
– Number of named entities of the right type in the passage
– Number of question keywords in the passage
– The longest exact sequence of question keywords that occurs in the passage
– The rank of the document from which the passage was extracted
– The proximity of the keywords from the original query to each other (to prefer smaller spans that include more keywords)
– The N-gram overlap between the passage and the question (to prefer passages with higher N-gram overlap with the question)
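Most of these features are directly countable. A sketch computing three of them for one passage (the function and feature names are ours; the entity count is assumed to come from a named-entity tagger upstream):

```python
def passage_features(passage, keywords, n_right_type_entities, doc_rank):
    """Compute a few of the ranking features listed above; the entity
    count is assumed to come from a named-entity tagger upstream."""
    words = passage.lower().split()
    kws = {k.lower() for k in keywords}
    longest = run = 0
    for w in words:                 # longest run of question keywords
        run = run + 1 if w in kws else 0
        longest = max(longest, run)
    return {"n_right_type_entities": n_right_type_entities,
            "n_keywords_in_passage": sum(w in kws for w in words),
            "longest_keyword_sequence": longest,
            "document_rank": doc_rank}

print(passage_features(
    "Hallmark remains the largest marketer of greeting cards",
    ["company", "sells", "greeting", "cards"],
    n_right_type_entities=1, doc_rank=3))
```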

Slide 26: 2) Passage retrieval from the Web
For QA from the Web we may skip the passage retrieval step by relying on the snippets produced by the web search engine.
– Example: when was movable type metal printing invented in Korea?

Slide 27: QA stages (recap)
1. Question processing
– Query formulation
– Question classification
2. Passage retrieval
– Filter out passages
– Rank the rest
3. Answer processing

Slide 28: 3) Answer processing
Extract a specific answer from the passage.
Two main classes of algorithms:
1. Answer-type pattern extraction
2. N-gram tiling

Slide 29: 3) Answer processing: pattern extraction
Use information about the expected AT together with regular expressions:
– If the AT is HUMAN, extract named entities of type HUMAN from the passage
Some ATs (DEFINITION, for example) have no particular named entity type; use regex patterns instead (written by hand or learned automatically):
– Pattern: "such as" | Question: What is autism? | Answer: ", developmental disorders such as autism"
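The "such as" pattern can be turned into a small extractor. A sketch, where the two regexes are illustrative hand-written patterns rather than a learned set:

```python
import re

def extract_definition(term, passage):
    """Try a couple of hand-written DEFINITION patterns; the capture
    group is the defining phrase."""
    t = re.escape(term)
    patterns = [
        re.compile(rf"([\w\s,-]+?)\s+such as\s+{t}", re.I),  # "<AP> such as <QP>"
        re.compile(rf"{t}\s+is\s+(an?\s[\w\s-]+)", re.I),    # "<QP> is a <AP>"
    ]
    for pat in patterns:
        m = pat.search(passage)
        if m:
            return m.group(1).strip(" ,")
    return None

print(extract_definition(
    "autism", "a range of developmental disorders such as autism"))
# -> "a range of developmental disorders"
```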

Slide 30: Answer processing: N-gram tiling
(Figure: the AskMSR system architecture, steps 1 through 5)

Slide 31: Step 3: Gathering N-Grams
– Enumerate all N-grams (N = 1, 2, 3) in all retrieved snippets
– Weight of an n-gram: its occurrence count, each occurrence weighted by the "reliability" (weight) of the rewrite rule that fetched the document
Example, for "Who created the character of Scrooge?":
– Dickens: 117
– Christmas Carol: 78
– Charles Dickens: 75
– Disney: 72
– Carl Banks: 54
– A Christmas: 41
– Christmas Carol: 45
– Uncle: 31
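As a sketch, the gathering step is a weighted n-gram count over snippets; the snippet texts and rule weights below are made up:

```python
from collections import Counter

def gather_ngrams(snippets_with_weights, max_n=3):
    """Count every 1-3 gram in the snippets; each occurrence adds the
    reliability weight of the rewrite rule that fetched the snippet."""
    scores = Counter()
    for snippet, rule_weight in snippets_with_weights:
        words = snippet.lower().split()
        for n in range(1, max_n + 1):
            for i in range(len(words) - n + 1):
                scores[" ".join(words[i:i + n])] += rule_weight
    return scores

# Made-up snippets for "Who created the character of Scrooge?"
snippets = [("Charles Dickens created Scrooge", 5),
            ("Scrooge by Charles Dickens", 2),
            ("Disney Scrooge McDuck", 1)]
print(gather_ngrams(snippets).most_common(4))
# 'scrooge' and 'charles dickens' rank near the top
```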

Slide 32: Step 4: Filtering N-Grams
N-grams are scored by how well they match the predicted answer type:
– Boost the score of n-grams that match the answer-type regexp
– Lower the score of n-grams that don't match it

Slide 33: Step 5: Tiling the Answers
Concatenate overlapping N-gram fragments into longer answers.
Example: "Dickens" (score 20), "Charles Dickens" (15), and "Mr Charles" (10) are merged into "Mr Charles Dickens" (score 45), and the old n-grams are discarded.
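A greedy sketch of the tiling step; the overlap test is simplified (word-level suffix/prefix match plus substring subsumption), so take it as illustrative rather than the AskMSR implementation:

```python
def merge(a, b):
    """Tile two word sequences: return the merged sequence if one
    subsumes the other or a suffix of one is a prefix of the other."""
    if " ".join(b) in " ".join(a):
        return a
    if " ".join(a) in " ".join(b):
        return b
    for k in range(min(len(a), len(b)), 0, -1):
        if a[-k:] == b[:k]:
            return a + b[k:]
        if b[-k:] == a[:k]:
            return b + a[k:]
    return None

def tile_answers(candidates):
    """Repeatedly merge the best pair of overlapping candidates,
    summing scores and discarding the merged fragments."""
    cands = [(ans.split(), score) for ans, score in candidates]
    merged = True
    while merged:
        merged = False
        cands.sort(key=lambda c: -c[1])
        for i in range(len(cands)):
            for j in range(i + 1, len(cands)):
                t = merge(cands[i][0], cands[j][0])
                if t is not None:
                    rest = [c for k, c in enumerate(cands) if k not in (i, j)]
                    cands = [(t, cands[i][1] + cands[j][1])] + rest
                    merged = True
                    break
            if merged:
                break
    return [(" ".join(w), s) for w, s in cands]

print(tile_answers([("Dickens", 20), ("Charles Dickens", 15),
                    ("Mr Charles", 10)]))
# -> [('Mr Charles Dickens', 45)]
```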

Slide 34: QA stages (recap)
1. Question processing
– Query formulation
– Question classification
2. Passage retrieval
– Filter out passages
– Rank the rest
3. Answer processing
– Answer-type pattern extraction
– N-gram tiling
Next: evaluation of factoid answers

Slide 35: Evaluation of factoid answers
– A variety of techniques have been proposed
– The most influential evaluation framework is the TREC (Text REtrieval Conference) QA track
– http://trec.nist.gov/

Slide 36: Question Answering at TREC
The question answering competition at TREC consists of answering a set of 500 fact-based questions, e.g. "When was Mozart born?". It has really pushed the field forward.
The document set:
– Newswire text from the LA Times, San Jose Mercury News, Wall Street Journal, NY Times, etc.: over 1M documents now
– Well-formed lexically, syntactically, and semantically (the documents were reviewed by professional editors)
The questions:
– Hundreds of new questions every year; the total is ~2400
The task:
– Extract only one exact answer
– Several other sub-tasks were added later: definition, list, biography

Slide 37: Sample TREC questions
1. Who is the author of the book "The Iron Lady: A Biography of Margaret Thatcher"?
2. What was the monetary value of the Nobel Peace Prize in 1989?
3. What does the Peugeot company manufacture?
4. How much did Mercury spend on advertising in 1993?
5. What is the name of the managing director of Apricot Computer?
6. Why did David Koresh ask the FBI for a word processor?
7. What is the name of the rare neurological disease with symptoms such as involuntary movements (tics), swearing, and incoherent vocalizations (grunts, shouts, etc.)?

Slide 38: TREC Scoring
Systems return 5 ranked answer snippets for each question.
Mean Reciprocal Rank (MRR) scoring:
– Each question is assigned the reciprocal rank of its first correct answer: if the correct answer is at position k, the score is 1/k
– That is 1, 0.5, 0.33, 0.25, 0.2, and 0 for positions 1, 2, 3, 4, 5, and 6+ (no correct answer in the top 5)
– The score of a system returning ranked answers for a test set of N questions is the average over the questions:
MRR = (1/N) · Σ_{i=1..N} 1/rank_i
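That average is one line of code. A sketch, where a rank of None marks a question with no correct answer in the top five:

```python
def mean_reciprocal_rank(first_correct_ranks):
    """first_correct_ranks[i] is the 1-based position of the first
    correct answer for question i, or None if none was returned."""
    return sum(1.0 / r if r else 0.0
               for r in first_correct_ranks) / len(first_correct_ranks)

# Correct answers at ranks 1 and 3; third question unanswered.
print(mean_reciprocal_rank([1, 3, None]))  # (1 + 1/3 + 0) / 3 ≈ 0.444
```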

Slide 39: Top Performing Systems
In 2003 the best performing systems at TREC could answer approximately 60-70% of the questions.
Approaches and successes have varied a fair deal:
– Knowledge-rich approaches, using a vast array of NLP techniques, stole the show in 2000-2003 (notably Harabagiu, Moldovan et al. at SMU/UTD/LCC)
– Statistical systems are starting to catch up: the AskMSR system stressed how much could be achieved by very simple methods with enough text
– People are experimenting with machine learning methods

Slide 40: QA stages (recap)
1. Question processing
– Query formulation
– Question classification
2. Passage retrieval
– Filter out passages
– Rank the rest
3. Answer processing
– Answer-type pattern extraction
– N-gram tiling
Evaluation of factoid answers
Next: complex questions

Slide 41: Focused Summarization and QA
The most interesting and important questions are not factoids:
– In children with acute febrile illness, what is the efficacy of single-medication therapy with acetaminophen or ibuprofen in reducing fever?
– Where have poachers endangered wildlife, what wildlife has been endangered, and what steps have been taken to prevent poaching?
Factoids may be found in a single document; more complex questions may require analysis and synthesis across multiple sources:
– Summarization techniques: (query-)focused summarization
– Information extraction techniques

Slide 42: Structure of a complex QA system
(Diagram: Question → Question processing (query formulation, query classification) → Query and Answer Type → IR over a corpus or the Web → Predicate identification → Data-driven analysis → Definition creation → Answer)
Four stages:
1. Question processing
2. Predicate identification
3. Data-driven analysis
4. Definition creation
Example: What are some promising untried treatments for Raynaud's disease?

Slide 43: Stages of a complex QA system
Predicate identification:
– Information extraction: identification of the appropriate semantic entities (e.g. DISEASE)
Data-driven analysis:
– Summarization, co-reference, inference, avoiding redundancy… all the difficult stuff!
Definition creation:
– If domain-specific, templates can be used for information ordering
– E.g., for biography questions, a template such as: "<name> is <role>. She was born in <birthplace>. She <achievements>…"

