1 Learning Surface Text Patterns for a Question Answering System Deepak Ravichandran Eduard Hovy Information Sciences Institute University of Southern California

2 From Proceedings of the ACL Conference, 2002

3 Goal Explore power of surface text patterns for open-domain QA systems

4 Why This Paper Fall 2001 NLP project - QA system

5 Winning Team Matt Myers & Henry Longmore –"If we were asked to design another question answering system, we would keep the same basic system as a foundation. We would then use more patterns and variations of patterns in the NE recognizer. We would use Machine Learning techniques, particularly for learning patterns for the NE recognizer."

6 Meanwhile, back at the batcave... Automatic learning of surface text patterns for open-domain question answering

7 Recent Open Domain Systems External knowledge, tools –Named Entity taggers –WordNet –parsers –hand-tagged corpora –ontology lists

8 Recent O-D Systems (cont.) Recent TREC-10 evaluation –winning system used just 1 resource –extensive list of surface patterns –surprised many

9 Basic Idea Investigate potential of surface patterns –Learn patterns –Measure accuracy

10 Characteristic Phrases "When was <NAME> born?” –Typical answers "Mozart was born in 1756.” "Gandhi (1869-1948)...” –Suggests phrases like "<NAME> was born in <BIRTHDATE>” "<NAME> ( <BIRTHDATE> -” –used as Regular Expressions, these can help locate the correct answer

11 Auto-learn Patterns from Web Tagged corpus using AltaVista Hand-crafted examples of each question type Bootstrapping to build large tagged corpus as in Information Extraction (Riloff, 96) Abundance of data on web - reliable statistical estimates

12 The System Assume sentence is a simple sequence of words Search for repeated word orderings Evidence for useful answer phrases

13 System (cont.) Suffix trees to extract substrings of optimal length Suffix trees from Computational Biology (Gusfield, 97) Used to detect DNA sequences Linear time on size of corpus Don't restrict length of substrings

14 Pattern Learning Algorithm Select example for question type –BIRTHYEAR questions select "Mozart 1756” "Mozart" is question term "1756" is answer term Submit Q & A terms to AltaVista Require both terms to be present

15 Pattern Learning (cont.) Download top 1000 documents returned Apply sentence breaker to documents Keep only those sentences with both terms present

16 Pattern Learning (cont.) Terms can be present in various forms –e.g. Mozart as: Wolfgang Amadeus Mozart; Mozart, Wolfgang Amadeus; Amadeus Mozart; Mozart

17 Pattern Learning (cont.) Specify the ways in which the Q term and A term can appear in text Easy to do for BIRTHDATE Not so for Q types like DEFINITION –Many acceptable answers; all answers need to be used to ensure high confidence in precision

18 Pattern Learning (cont.) Process (tokenize, smooth whitespace, remove tags, etc.) –simplify input for egrep (or other regular expression tool) Pass sentence through suffix tree constructor –finds substrings (and counts) of all lengths

19 Pattern Learning (cont.) Example: “The great composer Mozart (1756-1791) achieved fame at a young age” “Mozart (1756-1791) was a genius” “The whole world would always be indebted to the great music of Mozart (1756-1791)” –Longest matching substring for all 3 sentences is "Mozart (1756-1791)” –Suffix tree would extract "Mozart (1756-1791)" as an output, with score of 3
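A minimal sketch of this counting step in Python, using brute-force substring enumeration in place of a real suffix tree (the paper uses suffix trees for linear-time behavior); the three sentences are the ones above, and the helper name is illustrative.

```python
from collections import Counter

def count_substrings(sentences, min_words=2):
    """Count every word-level substring across all sentences.
    A real suffix tree does this in linear time; this brute-force
    version only illustrates the idea."""
    counts = Counter()
    for sent in sentences:
        words = sent.split()
        for i in range(len(words)):
            for j in range(i + min_words, len(words) + 1):
                counts[" ".join(words[i:j])] += 1
    return counts

sentences = [
    "The great composer Mozart (1756-1791) achieved fame at a young age",
    "Mozart (1756-1791) was a genius",
    "The whole world would always be indebted to the great music of Mozart (1756-1791)",
]
counts = count_substrings(sentences)
print(counts["Mozart (1756-1791)"])  # -> 3, the substring shared by all three sentences
```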

20 Pattern Learning (cont.) Filter phrases in suffix tree Keep phrases containing Q & A terms Replace question term with <NAME> Replace answer term with <ANSWER>
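A small sketch of this filter-and-generalize step, assuming the <NAME>/<ANSWER> placeholders from the paper; the helper name is made up for illustration.

```python
def phrases_to_patterns(phrases, q_term, a_term):
    """Keep only phrases containing both terms, then generalize them by
    replacing the question term with <NAME> and the answer term with <ANSWER>."""
    patterns = []
    for phrase in phrases:
        if q_term in phrase and a_term in phrase:
            patterns.append(
                phrase.replace(q_term, "<NAME>").replace(a_term, "<ANSWER>")
            )
    return patterns

print(phrases_to_patterns(["Mozart (1756-1791)"], "Mozart", "1756"))
# -> ['<NAME> (<ANSWER>-1791)']
```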

21 Pattern Learning (cont.) Repeat with different examples of same question type –“Gandhi 1869”, “Newton 1642”, etc. Some patterns learned for BIRTHDATE –a. born in <ANSWER> , <NAME> –b. <NAME> was born on <ANSWER> , –c. <NAME> ( <ANSWER> - –d. <NAME> ( <ANSWER> - )

22 Pattern Learning (last one!) Strings partly overlapping (c & d) saved separately –Separate counts of occurrence frequencies –Can distinguish (in this case) between pattern for person still living (d) and more general pattern (c)

23 Calculate Precision Submit query to AltaVista using only Q term ("Mozart") Download top 1000 returned documents Segment into sentences as in pattern learning algorithm Keep sentences containing Q term

24 Calculate Precision (cont.) For each pattern learned, check presence of pattern in sentence –pattern with <ANSWER> tag matched by any word: "Mozart was born in <ANSWER>” –pattern with <ANSWER> tag matched by correct A term: "Mozart was born in 1756”

25 Calculate Precision (cont.) Calculate precision of each pattern P = Ca/Co –Ca = total # of patterns w/answer term present –Co = total # of patterns w/answer term replaced by any word Keep only patterns matching sufficient # of examples (e.g. >5)
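A hedged sketch of the precision computation, assuming patterns are stored with <NAME>/<ANSWER> placeholders and translated to regular expressions; the function name and the single-word wildcard for the answer slot are simplifying assumptions, not the paper's exact implementation.

```python
import re

def pattern_precision(pattern, q_term, a_term, sentences):
    """Estimate precision P = Ca / Co for one learned pattern.
    Co = matches with <ANSWER> allowed to be any word,
    Ca = matches where the answer slot holds the correct answer term."""
    any_word = pattern.replace("<NAME>", re.escape(q_term)).replace("<ANSWER>", r"\w+")
    correct = pattern.replace("<NAME>", re.escape(q_term)).replace("<ANSWER>", re.escape(a_term))
    ca = sum(bool(re.search(correct, s)) for s in sentences)
    co = sum(bool(re.search(any_word, s)) for s in sentences)
    return ca / co if co else 0.0

sentences = [
    "Mozart was born in 1756.",
    "Mozart was born in Salzburg.",
]
print(pattern_precision("<NAME> was born in <ANSWER>", "Mozart", "1756", sentences))
# -> 0.5 (pattern matches both sentences, only one with the correct answer)
```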

26 Calculate Precision (cont.) Obtain table of Regular Expression patterns, 1 table per question type –each pattern listed with its precision –precision is the probability of the pattern containing the answer –estimated by the principle of maximum likelihood

27 Calculate Precision (cont.) BIRTHDATE table:
1.0 <NAME> ( <ANSWER> - )
0.85 <NAME> was born on <ANSWER> ,
0.6 <NAME> was born in <ANSWER>
0.59 <NAME> was born <ANSWER>
0.53 <ANSWER> <NAME> was born
0.50 - <NAME> ( <ANSWER>
0.36 <NAME> ( <ANSWER> -

28 Calculate Precision (cont.) Good range of patterns obtained with as few as 10 examples Rather long list, difficult to come up with manually What is the largest number of examples the system required to get a good range of patterns?

29 Calculate Precision (cont.) Precision of patterns learned from one QA pair calculated for other examples of same question type Helps eliminate dubious patterns, e.g. those learned because –contents of two or more sites are the same –same document appears in search engine output for learning & precision stages

30 Finding Answers To new questions! Use existing QA system (Hovy et al., 2002b;2001) Determine type of new question Identify Question term

31 Finding Answers (cont.) Create query from Q term & do IR –use answer document corpus such as TREC-10 or web search Segment returned documents into sentences & process as before Replace Q term by Q tag –e.g. replace "Mozart" by <NAME> in case of BIRTHYEAR type

32 Finding Answers (cont.) Using pattern table developed for Q type, search for presence of each pattern Select words matching the <ANSWER> tag as potential answer Sort answers by pattern's precision scores Discard duplicate answers (string compare) Return top 5
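A sketch of this answer-extraction loop, assuming the pattern table is a list of (precision, pattern) pairs; the single-word (\w+) answer slot and the helper names are simplifying assumptions for illustration.

```python
import re

def find_answers(question_term, pattern_table, sentences, top_n=5):
    """Apply each learned pattern to every candidate sentence, collect the
    text matching the <ANSWER> slot, rank by pattern precision, drop
    duplicates, and return the top few answers."""
    candidates = []  # (precision, answer) pairs
    for precision, pattern in pattern_table:
        regex = (re.escape(pattern)
                 .replace(re.escape("<NAME>"), re.escape(question_term))
                 .replace(re.escape("<ANSWER>"), r"(\w+)"))
        for sent in sentences:
            for match in re.finditer(regex, sent):
                candidates.append((precision, match.group(1)))
    candidates.sort(key=lambda pair: pair[0], reverse=True)
    answers, seen = [], set()
    for _, answer in candidates:
        if answer not in seen:  # simple string-compare de-duplication
            seen.add(answer)
            answers.append(answer)
    return answers[:top_n]

table = [(1.0, "<NAME> ( <ANSWER> - )"), (0.6, "<NAME> was born in <ANSWER>")]
print(find_answers("Mozart", table, ["Mozart was born in 1756 in Salzburg."]))
# -> ['1756']
```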

33 Experiments 6 different Q types –from Webclopedia QA Typology (Hovy et al., 2002a) BIRTHDATE LOCATION INVENTOR DISCOVERER DEFINITION WHY-FAMOUS

34 Experiments (cont.) (BIRTHYEAR - previously shown) INVENTOR – all have precision of 1.0
1.0 <ANSWER> invents <NAME>
1.0 the <NAME> was invented by <ANSWER>
1.0 <ANSWER> invented the <NAME> in

35 Experiments (cont.) DISCOVERER
1.0 when <ANSWER> discovered <NAME>
1.0 <ANSWER> 's discovery of <NAME>
0.9 <NAME> was discovered by <ANSWER> in
DEFINITION
1.0 <NAME> and related <ANSWER>
1.0 form of <ANSWER> , <NAME>
0.94 as <NAME> , <ANSWER> and

36 Experiments (cont.) WHY-FAMOUS
1.0 <ANSWER> <NAME> called
1.0 laureate <ANSWER> <NAME>
0.71 <NAME> is the <ANSWER> of
LOCATION
1.0 <ANSWER> 's <NAME>
1.0 regional : <ANSWER> : <NAME>
0.92 near <NAME> in <ANSWER>

37 Experiments (cont.) For each Q type, extract questions from TREC-10 set Run through testing phase (precision) Two sets of experiments

38 Experiments (cont.) Set one –TREC corpus is input –IR done by IR component of their QA system (Lin, 2002) Set two –Web is input –IR performed by AltaVista

39 Results Measured by Mean Reciprocal Rank (MRR) – TREC corpus
Question type    # of Q's    MRR
BIRTHYEAR        8           0.48
INVENTOR         6           0.17
DISCOVERER       4           0.13
DEFINITION       102         0.34
WHY-FAMOUS       3           0.33
LOCATION         16          0.75

40 Results (cont.) Web
Question type    # of Q's    MRR
BIRTHYEAR        8           0.69
INVENTOR         6           0.58
DISCOVERER       4           0.88
DEFINITION       102         0.39
WHY-FAMOUS       3           0.00
LOCATION         16          0.86

41 Results (cont.) System performs better on web data than on TREC corpus Abundant web data makes it easier for system to locate answers with high precision scores TREC corpus does not have enough candidate answers with high precision scores –must settle for answers matched by low precision patterns –WHY-FAMOUS is an exception, possibly due to the small # of test Q's

42 Shortcomings & Extensions Need for POS &/or semantic types "Where are the Rocky Mountains?” "Denver's new airport, topped with white fiberglass cones in imitation of the Rocky Mountains in the background, continues to lie empty” –a pattern like "<NAME> in <ANSWER>” wrongly extracts "the background" –an NE tagger &/or ontology could enable system to determine "background" is not a location

43 Shortcomings... (cont.) DEFINITION Q's – matched term is too general, though technically correct "What is nepotism?” –pattern "form of <ANSWER> , <NAME>” matches "...in the form of widespread bureaucratic abuses: graft, nepotism...” "What is sonar?” –pattern "<NAME> and related <ANSWER>” matches "...while its sonar and related underseas systems are built...”

44 Shortcomings... (cont.) Long distance dependencies "Where is London?” "London, which has one of the most busiest airports in the world, lies on the banks of the river Thames” would require pattern like: "<QUESTION>, (<any_word>)*, lies on <ANSWER>” –Abundance & variety of Web data helps system find an instance of the patterns w/o losing answers to long distance dependencies
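A small illustration of how such a long-distance pattern could be written as a regular expression; the skip-words construction below is an assumption for illustration, not the paper's implementation.

```python
import re

# Sketch of the "<QUESTION>, (<any_word>)*, lies on <ANSWER>" idea:
# allow any run of words between the question term and the anchor phrase.
sentence = ("London, which has one of the most busiest airports in the world, "
            "lies on the banks of the river Thames")
pattern = r"London,(?:\s+\S+)*?,?\s+lies on\s+(.+)"
match = re.search(pattern, sentence)
if match:
    print(match.group(1))  # -> "the banks of the river Thames"
```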

45 Shortcomings... (cont.) Patterns need more info regarding length of expected answer phrase –System searches in a range of 50 bytes around the answer phrase to capture the pattern –fails under some conditions "When was Lyndon B. Johnson born?” "...lost to democratic Sen. Lyndon B. Johnson, who ran for both re-election and the vice presidency”

46 Shortcomings... (cont.) Lacks info that <ANSWER> in this case should be replaced by exactly 1 word Could extend system to search for answer in range of 1-2 chunks –basic English phrases: NP, VP, PP, etc.

47 Shortcomings... (cont.) System doesn't work for Q types requiring multiple words from question to be in answer "In which county does the city of Long Beach lie?” "Long Beach is situated in Los Angeles County” required pattern: "<QUESTION> situated in <ANSWER>”

48 Shortcomings... (cont.) Performance of system depends greatly on having only 1 anchor word Multiple anchor points –would help eliminate candidate answers –require all anchor words be present in candidate answer sentence

49 Shortcomings... (cont.) Does not use case "What is micron?” "...a spokesman for Micron, a maker of semiconductors, said SIMMs are..." If Micron had been capitalized in question, would be a perfect answer

50 Shortcomings... (cont.) Canonicalization of words BIRTHDATE for Gandhi: 1869; Oct. 2, 1869; 2nd October 1869; October 2 1869; 02 October 1869; etc. –Use date tagger to cluster all variations and tag with same term –Extend idea to smooth out variations in Q term for names: Gandhi, Mahatma Gandhi, Mohandas Karamchand Gandhi, etc.
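A minimal stand-in for the date-tagger idea, written in Python; the list of accepted formats and the canonical output form are illustrative assumptions, not the tagger the slide refers to.

```python
from datetime import datetime

# Map several surface forms of the same date to one canonical token.
FORMATS = ["%Y", "%b. %d, %Y", "%d %B %Y", "%B %d %Y", "%B %d, %Y"]

def canonicalize_date(text):
    for fmt in FORMATS:
        try:
            d = datetime.strptime(text, fmt)
            return d.strftime("%Y") if fmt == "%Y" else d.strftime("%Y-%m-%d")
        except ValueError:
            continue
    return text  # leave unrecognized strings (e.g. "2nd October 1869") untouched

for variant in ["1869", "Oct. 2, 1869", "October 2 1869", "02 October 1869"]:
    print(variant, "->", canonicalize_date(variant))
```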

51 Conclusion Web results easily outperform TREC results Suggests need to integrate outputs from Web & TREC Word count to help eliminate unlikely answers: + BIRTHDATE, LOCATION; ? DEFINITION

52 Conclusion (cont.) But what about DEFINITION? –102 Q's in both TREC corpus and Web runs, the most of any Q type –MRR-TREC = 0.34, MRR-Web = 0.39 –All other Q types have < 20 Q's, most < 10 If enough Q's are asked, will the difference in performance on Web data vs. TREC data diminish?

53 Conclusion (cont.) Simplicity - "perfect" for multilingual QA systems –Low resource requirement - no NE taggers, no parsers, no ontologies, etc. –No adaptation of these to a new language required –Only need to create manual training terms & use an appropriate web search engine

54 Regular Expressions from ask_iggy
“place called\\s+($cap_pattern+)”
“home called\\s+($cap_pattern+)”
“at\\s((the)?\\s+($cap_pattern+))”
“to\\s+($cap_pattern+)”
“place\\s+in\\s+($cap_pattern+)called\\s+($cap_pattern+)”
“in\\s+($cap_pattern+)”
“up\\s+($cap_pattern+)”
“left\\s+($cap_pattern+)”
“(($cap_pattern+)[Ii]slands)”
“(northern|southern|eastern|western)\\s+($cap_pattern+)”
“from\\s+($cap_pattern+)”
“far\\s+as\\s+($cap_pattern+)”
“place\\s+in\\s+($cap_pattern+)”
“home\\s+town”
“city\\s+of\\s+($cap_pattern+)”
“middle\\s+of\\s+((the)?\\s+($cap_pattern+))”
“(($cap_pattern+)[Ii]slands\\s+of\\s+($cap_pattern+))”
“place{1,1}d?\\s+near\\s+($cap_pattern+)”
“((above|over)\\s+($cap_pattern+))”
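As a usage note, here is one of those patterns translated to Python, assuming $cap_pattern stands for a single capitalized word; that definition is a guess for illustration, not taken from ask_iggy itself.

```python
import re

# Assumed stand-in for $cap_pattern: one capitalized word plus trailing space.
cap_pattern = r"[A-Z][a-z]+\s*"

# Python version of the Perl-style pattern "city\s+of\s+($cap_pattern+)"
city_of = re.compile(r"city\s+of\s+((?:%s)+)" % cap_pattern)

match = city_of.search("the city of Long Beach lies in Los Angeles County")
if match:
    print(match.group(1).strip())  # -> "Long Beach"
```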

