
1 QuALiM – Michael Kaisser The QuALiM Question Answering system Question Answering by Searching Large Corpora with Linguistic Methods

2 Talk Outline What does a QA system do? QuALiM's two answer strategies: the fallback mechanism and the rephrasing algorithm. TREC evaluation results. Post-TREC evaluation results.

3 Question Answering - Definition Definition from Wikipedia: Question Answering (QA) is a type of information retrieval. Given a collection of documents (such as the World Wide Web), the system should be able to retrieve answers to questions posed in natural language. QA is regarded as requiring more complex natural language processing (NLP) techniques than other types of information retrieval such as document retrieval, and it is sometimes regarded as the next step beyond search engines.

4 Question Answering - Example Start is MIT's QA system:


6 Question Answering - Example Start is MIT's QA system. The system should actually return a complete English sentence expressing the desired fact. Better, however, would be: "Albert Einstein was born on March 14th, 1879."

7 The Fallback Mechanism (illustrative of common answer-finding techniques)

8 Fallback Mechanism The fallback mechanism creates queries based on keywords and key phrases from the question. Three queries are sent to Google: The first query contains all non-stop words from the question. The second contains all NPs from the question (that contain at least one non-stop word). The third query contains all NPs plus all non-stop words that do not occur in the NPs.

9 Fallback Mechanism So "When was Jim Inhofe first elected to the senate?" becomes:
Jim Inhofe senate first elected
"Jim Inhofe" "the senate"
"Jim Inhofe" "the senate" first elected
Note: The results from the last query are weighted twice as high as the results from the first two queries.
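The three-query scheme above can be sketched in a few lines of Python. This is an illustration, not the original implementation: the stopword list is a toy one, and the noun phrases are passed in directly (QuALiM obtains them from a parser).

```python
# Toy stopword list (the real system uses a proper one).
STOPWORDS = {"when", "was", "is", "the", "a", "an", "to", "of", "in", "on", "did"}

def fallback_queries(question, noun_phrases):
    """Build the three Google queries of the fallback mechanism."""
    words = [w.strip("?.,").lower() for w in question.split()]
    content = [w for w in words if w and w not in STOPWORDS]
    # Query 1: all non-stop words.
    q1 = " ".join(content)
    # Query 2: all NPs containing at least one non-stop word, as quoted phrases.
    nps = [np for np in noun_phrases
           if any(w.lower() not in STOPWORDS for w in np.split())]
    q2 = " ".join('"%s"' % np for np in nps)
    # Query 3: the NPs plus every non-stop word not already inside an NP.
    in_nps = {w.lower() for np in nps for w in np.split()}
    rest = [w for w in content if w not in in_nps]
    q3 = (q2 + " " + " ".join(rest)).strip()
    return q1, q2, q3

q1, q2, q3 = fallback_queries(
    "When was Jim Inhofe first elected to the senate?",
    ["Jim Inhofe", "the senate"])
```

For the example question this reproduces the second and third queries shown on the slide exactly; the word order of the first query differs, since the sketch simply keeps question order.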

10 Fallback Mechanism The results from the queries, when placed in a Weighted Sequence Bag:
72.0: "senator"
42.0: "senator jim inhofe" "senator jim"
41.25: "r" (abbreviation for Republican)
32.25: "oklahoma"
30.0: "r-okla" (abbreviation for Republican-Oklahoma)
26.25: "1994"
25.0: "the leading conservative voices" "of the leading conservative voices" "leading conservative voices"
24.0: "us senator"
23.25: "republican"
21.0: "okla" (abbreviation for Oklahoma)

11 Fallback Mechanism But we know that we are looking for a date, so the answer is "1994".
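A minimal sketch of how such a weighted bag of word sequences could be built from the returned snippets, with the third query's results counted double as described above. Tokenization, the weights, and the final year check are simplified assumptions, not QuALiM's actual code.

```python
import re
from collections import Counter

def score_ngrams(snippets_per_query, weights=(1.0, 1.0, 2.0), max_n=3):
    """Count word n-grams across snippets; results of the third
    (most specific) query count twice as much."""
    bag = Counter()
    for snippets, w in zip(snippets_per_query, weights):
        for s in snippets:
            tokens = re.findall(r"[a-z0-9-]+", s.lower())
            for n in range(1, max_n + 1):
                for i in range(len(tokens) - n + 1):
                    bag[" ".join(tokens[i:i + n])] += w
    return bag

def best_year(bag):
    """Apply the expected answer type: keep only year-like candidates."""
    years = {c: s for c, s in bag.items()
             if re.fullmatch(r"1\d{3}|20\d{2}", c)}
    return max(years, key=years.get) if years else None

bag = score_ngrams([["Inhofe was elected in 1994"], [], ["elected in 1994"]])
```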

12 Definition Questions Query: "Florence Nightingale"
20.0: "may 12, 1820"
16.0: "may 12" "nursing"
15.0: "august 13, 1910"
13.0: "born"
12.0: "august 13" "museum"
11.0: "history"
10.0: "modern nursing" "lady with the lamp" "florence nightingale museum" "the lady with the lamp"
9.0: "italy"
8.0: "of modern nursing" "nurses" "london"
7.5: "on may 12, 1820"
7.0: "2 lambeth palace road london"

13 Definition Questions Answer sentences in the AQUAINT corpus:
"on may 12, 1820, the founder of modern nursing, florence nightingale, was born in florence, italy."
"on aug. 13, 1910, florence nightingale, the founder of modern nursing, died in london."

14 The Rephrasing Algorithm

15 Pattern Layout
Sequence: When did NP V INF NP|PP ?
Targets: "... in NP", "In NP, ..." (more targets possible)
AnswerTypes: dateComplete, date, year|in_year
Sequences are matched against questions. Targets describe (flat) syntactic structures of potential answer sentences. AnswerTypes place restrictions on the expected answer type.

16 Sequences When did NP V INF NP|PP ? This sequence matches all questions beginning with "When", followed by "did", followed by an NP, followed by a verb in its infinitive form, followed by an NP or a PP, followed by a question mark (which has to be the last element in the question):
question start → word: When → word: did → phrase: NP → POS: V INF → phrase: NP or PP → punctuation: ? → question end

17 Sequences When did NP V INF NP|PP ? In the TREC 2005 question set this particular sequence matched 5 questions:
"When did Floyd Patterson win the title?"
"When did Amtrak begin operations?"
"When did Jack Welch become chairman of General Electric?"
"When did Jack Welch retire from GE?"
"When did the Khmer Rouge come into power?"
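The matching of such a sequence against a question can be sketched as below. The tag inventory ("NP" for noun-phrase chunks, "VB" for infinitive verbs, "PP" for prepositional-phrase chunks) is an assumption for illustration; the real system gets this information from a parser.

```python
# Sequence elements: literal words/punctuation, phrase or POS categories,
# and alternatives written with "|".
SEQUENCE = ["When", "did", "NP", "VB", "NP|PP", "?"]

def matches(sequence, tagged):
    """tagged: list of (text, tag) pairs covering the whole question."""
    if len(sequence) != len(tagged):
        return False
    for elem, (text, tag) in zip(sequence, tagged):
        if elem in ("NP", "VB"):           # single category element
            if tag != elem:
                return False
        elif "|" in elem:                  # alternative categories
            if tag not in elem.split("|"):
                return False
        else:                              # literal word / punctuation
            if text != elem:
                return False
    return True

# "When did Amtrak begin operations?" with assumed chunk/POS tags:
q = [("When", "WRB"), ("did", "VBD"), ("Amtrak", "NP"),
     ("begin", "VB"), ("operations", "NP"), ("?", "?")]
```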

18 Targets If a question matches a sequence, the targets ("... in NP", "In NP, ...") are used to propose templates for potential answer sentences. For the question "When did Amtrak begin operations?", these would be:
"Amtrak began operations in ANSWER[NP]"
"In ANSWER[NP] (,) Amtrak began operations"

19 Targets
answer sentence start → Amtrak began operations in → ANSWER (NP) → answer sentence end
answer sentence start → In → ANSWER (NP) → (,) → Amtrak began operations → answer sentence end

20 Targets The information from the targets can be used to create Google queries:
"Amtrak began operations in"
"In" "Amtrak began operations"
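The template and query construction for this example can be sketched as follows. The function names and the way the matched slots (subject NP, past-tense verb, object) are passed in are illustrative assumptions; in particular, the past-tense form would come from a morphology component.

```python
def target_templates(np, verb_past, obj):
    """Instantiate the two targets for a matched question:
    '<NP> <V-ed> <obj> in ANSWER' and 'In ANSWER(,) <NP> <V-ed> <obj>'."""
    return [
        "%s %s %s in ANSWER[NP]" % (np, verb_past, obj),
        "In ANSWER[NP] (,) %s %s %s" % (np, verb_past, obj),
    ]

def target_queries(np, verb_past, obj):
    """Quoted-phrase Google queries derived from the targets."""
    return [
        '"%s %s %s in"' % (np, verb_past, obj),
        '"In" "%s %s %s"' % (np, verb_past, obj),
    ]

ts = target_templates("Amtrak", "began", "operations")
qs = target_queries("Amtrak", "began", "operations")
```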

21 Snippet Retrieval For the first query, "Amtrak began operations in", the first five sentences Google returns are:
"Since Amtrak began operations in 1971, federal outlays for intercity rail passenger service have been about $18 billion."
"Amtrak began operations in 1971."
"Amtrak of the obligation to operate the basic system of routes that was largely inherited from the private railroads when Amtrak began operations in 1971."
"Amtrak began operations in 1971, as authorized by the Rail Passenger Service Act of 1970."
"A comprehensive history of intercity passenger service in Indiana, from the mid-19th century through May 1, 1971, when Amtrak began operations in the state."

22 Answer Extraction The sentences are parsed and tagged, and by matching them to the targets once more, the exact position of the potential answer can be located:
"Since Amtrak began operations in 1971, federal outlays for intercity rail passenger service have been about $18 billion."
"Amtrak began operations in 1971."
"Amtrak of the obligation to operate the basic system of routes that was largely inherited from the private railroads when Amtrak began operations in 1971."
"Amtrak began operations in 1971, as authorized by the Rail Passenger Service Act of 1970."
"A comprehensive history of intercity passenger service in Indiana, from the mid-19th century through May 1, 1971, when Amtrak began operations in the state."

23 QuALiM – Type Checking (answerTypes: dateComplete, date, year|in_year) The answerType element in the pattern tells us that we are looking for a date. We'd like to have: a complete date in standard form, e.g. "May 1st, 1971"; some form of a date, e.g. "5/1/1971". If we cannot have that, a year specification will also do (e.g. "1971").
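The layered preference (complete date, then any date-like form, then a bare year) can be sketched with regular expressions. The patterns are simplified assumptions and cover far fewer date formats than a real type checker would.

```python
import re

# Try the most specific answer type first, fall back to coarser ones.
COMPLETE_DATE = re.compile(
    r"\b(January|February|March|April|May|June|July|August|"
    r"September|October|November|December) \d{1,2}(st|nd|rd|th)?, \d{4}\b")
ANY_DATE = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")
YEAR = re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b")

def extract_date(candidate):
    """Return the best date-like span in a candidate sentence, or None."""
    for pattern in (COMPLETE_DATE, ANY_DATE, YEAR):
        m = pattern.search(candidate)
        if m:
            return m.group(0)
    return None
```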

24 QuALiM – Type Checking An answerType may contain the following elements: NamedEntity, WordNetCategory, Built-in (date, year, percentage etc.), Measure ("15 meters", "100 mph"), List (e.g. a list of movies), WebHypernym, other.

25 Excursus: WordNet


28 Excursus: Named Entity Recognition The task: identify atomic elements of information in text: person names, company/organization names, locations, dates, percentages, monetary amounts.

29 Excursus: Named Entity Recognition Task of an NE system: delimit the named entities in a text and tag them with NE categories: "Italy's business world was rocked by the announcement last Thursday that Mr. Verdi would leave his job as vice-president of Music Masters of Milan, Inc to become operations director of Arthur Andersen." "Milan" is part of an organization name; "Arthur Andersen" is a company; "Italy" is sentence-initial, so capitalization is useless as a cue.


31 Excursus: Named Entity Recognition How does it work? Basically, quite simply: the system accesses huge lists of first names, last names, cities, countries, etc., and knows about special words/abbreviations like Mr., Dr., Prof., Inc., Blvd. It also knows the names of weekdays, months, etc.

32 Excursus: Named Entity Recognition Some systems use hand-written context-sensitive reduction rules:
1) title capitalized_word => title person_name (compare "Mr. Jones" vs. "Mr. Ten-Percent": no rule without exceptions)
2) person_name "," "the" adj* "CEO of" organization ("Fred Smith, the young dynamic CEO of BlubbCo": requires the ability to grasp non-local patterns), plus help from databases of known named entities.
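A toy version of the gazetteer-plus-rules approach might look like this. The lists and the two rules are drastically simplified illustrations; real NE systems combine far larger gazetteers with contextual rules or statistical models.

```python
TITLES = {"Mr.", "Dr.", "Prof."}
COMPANY_SUFFIXES = {"Inc", "Inc.", "Co.", "Ltd."}

def tag_entities(tokens):
    """Toy rule-based NE tagger: a title followed by a capitalized word
    yields a person; a run of capitalized words ending in a company
    suffix yields an organization."""
    entities = []
    i = 0
    while i < len(tokens):
        if (tokens[i] in TITLES and i + 1 < len(tokens)
                and tokens[i + 1][0].isupper()):
            entities.append((" ".join(tokens[i:i + 2]), "PERSON"))
            i += 2
        elif tokens[i][0].isupper():
            j = i
            while j < len(tokens) and tokens[j][0].isupper():
                j += 1
            if tokens[j - 1] in COMPANY_SUFFIXES:
                entities.append((" ".join(tokens[i:j]), "ORGANIZATION"))
            i = j
        else:
            i += 1
    return entities

ents = tag_entities(["Mr.", "Verdi", "joined", "Music", "Masters", "Inc."])
```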

33 QuALiM – Type Checking An answerType may contain the following elements: NamedEntity, WordNetCategory, Built-in (date, year, percentage etc.), Measure ("15 meters", "100 mph"), List (e.g. a list of movies), WebHypernym, other.

34 QuALiM – Type Checking When the answers are checked for the correct semantic type, the first four sentences pass the test; the last one is ruled out:
"Since Amtrak began operations in 1971, federal outlays for intercity rail passenger service have been about $18 billion."
"Amtrak began operations in 1971."
"Amtrak of the obligation to operate the basic system of routes that was largely inherited from the private railroads when Amtrak began operations in 1971."
"Amtrak began operations in 1971, as authorized by the Rail Passenger Service Act of 1970."
"A comprehensive history of intercity passenger service in Indiana, from the mid-19th century through May 1, 1971, when Amtrak began operations in the state."

35 TREC 2004 Results and Post-TREC Evaluation

36 TREC Results – factoid questions

37 TREC Results – combined score

38 Post-TREC Evaluation Purpose: What is the performance and behavior of the different algorithms implemented? Performed with resolved questions ("When was Franz Kafka born?" instead of "When was he born?"). No document localization, thus: no NIL answers returned, no "unsupported" judgments.

39 Post-TREC Evaluation


