

2  21/11/2002
The Integration of Lexical Knowledge and External Resources for QA
Hui YANG, Tat-Seng Chua
{yangh,chuats}@comp.nus.edu.sg
PRIS, School of Computing
National University of Singapore

3  21/11/2002
Presentation Outline
- Introduction
- PRIS QA System Design
- Result Analysis
- Conclusion
- Future Work

4  21/11/2002
Open-Domain QA
Find answers to open-domain natural-language questions by searching a large collection of documents.
- Question Processing
  - May involve question reformulation
  - Determines the expected answer type
- Query Expansion
  - Overcomes concept mismatch between the query and the information base
- Search for Candidate Answers
  - Documents, paragraphs, or sentences
- Disambiguation
  - Ranking (or re-ranking) of answers
  - Locating exact answers

5  21/11/2002
Current Research Trends
- Web-based QA
  - Web redundancy
  - Probabilistic algorithms
- Linguistic-based QA
  - Part-of-speech tagging
  - Syntactic parsing
  - Semantic relations
  - Named entity extraction
  - Dictionaries: WordNet, etc.

6  21/11/2002
System Overview
1. Question Classification
2. Question Parsing
3. Query Formulation
4. Document Retrieval
5. Candidate Sentence Retrieval
6. Answer Extraction

7  21/11/2002
[System architecture diagram: the question Q passes through Question Analysis (Question Classification + Question Parsing); Query Formulation expands the original content words into expanded content words using external knowledge (Web, WordNet); Document Retrieval fetches relevant TREC documents; Sentence Ranking selects candidate sentences; Answer Extraction produces the answer A. A feedback loop reduces the number of expanded content words when no answer is found.]

8  21/11/2002
Question Classification
Based on question focus and answer type.
- 7 main classes: HUM, LOC, TME, NUM, OBJ, DES, UNKNOWN
  - E.g. "Which city is the capital of Canada?" (Q-class: LOC)
  - E.g. "Which state is the capital of Canada in?" (Q-class: LOC)
- 54 sub-classes
  - E.g. under LOC (location), we have 14 sub-classes:
    LOC_PLANET: 1, LOC_CITY: 18, LOC_CONTINENT: 3, LOC_COUNTRY: 18, LOC_COUNTY: 3, LOC_STATE: 3, LOC_PROVINCE: 2, LOC_TOWN: 2, LOC_RIVER: 3, LOC_LAKE: 2, LOC_MOUNTAIN: 1, LOC_OCEAN: 2, LOC_ISLAND: 3, LOC_BASIC: 3
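[Editor's note: a minimal Python sketch of what a two-level, pattern-based classifier along these lines could look like. The regex rules and the TME_BASIC label are hypothetical stand-ins; the slides do not show the authors' actual rules.]

    import re

    # Hypothetical surface patterns; sub-class implies main class via its prefix.
    SUBCLASS_PATTERNS = [
        (r"\b(which|what)\b.*\bcity\b", "LOC_CITY"),
        (r"\b(which|what)\b.*\bstate\b", "LOC_STATE"),
        (r"\b(which|what)\b.*\btown\b", "LOC_TOWN"),
        (r"^who\b", "HUM_BASIC"),
        (r"^when\b", "TME_BASIC"),
    ]

    def classify(question):
        """Return (main class, sub-class) for a question string."""
        q = question.lower()
        for pattern, subclass in SUBCLASS_PATTERNS:
            if re.search(pattern, q):
                return subclass.split("_")[0], subclass
        return "UNKNOWN", "UNKNOWN"

    print(classify("Which city is the capital of Canada?"))  # ('LOC', 'LOC_CITY')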

9  21/11/2002
Question Parsing
- Content words: q(0) — nouns, adjectives, numbers, some verbs
  E.g. "What mythical Scottish town appears for one day every 100 years?" (Q-class: LOC_TOWN)
  q(0): (mythical, Scottish, town, appears, one, day, 100, years)
- Base noun phrases: n
  n: ("mythical Scottish town")
- Head of the first noun phrase: h
  h: (town)
- Quotation words: u
  E.g. What was the original name before "The Star Spangled Banner"?
  u: ("The Star Spangled Banner")
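[Editor's note: a sketch of this extraction step using NLTK as a stand-in for whatever tagger/chunker the authors used (the slides do not name one); the chunk grammar is deliberately simple.]

    import re
    import nltk  # needs nltk.download('punkt') and nltk.download('averaged_perceptron_tagger')

    def parse_question(question):
        """Extract q(0), base noun phrases, head of first NP, and quotation words."""
        tagged = nltk.pos_tag(nltk.word_tokenize(question))

        # Content words q(0): nouns, adjectives, numbers, verbs
        # (coarser than the slide's "some verbs").
        q0 = [w for w, t in tagged if t.startswith(("NN", "JJ", "CD", "VB"))]

        # Base noun phrases n, via a toy chunk grammar.
        tree = nltk.RegexpParser("NP: {<JJ>*<NN.*>+}").parse(tagged)
        nps = [" ".join(w for w, _ in st.leaves())
               for st in tree.subtrees() if st.label() == "NP"]

        # Head h of the first NP: its last token, by the usual convention.
        h = nps[0].split()[-1] if nps else None

        # Quotation words u: anything inside double quotes.
        u = re.findall(r'"([^"]+)"', question)
        return q0, nps, h, u

    print(parse_question(
        "What mythical Scottish town appears for one day every 100 years?"))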

10  21/11/2002
Query Formulation I
- Use the original content words as a query to search the Web (e.g. Google)
- Find new terms which have high correlation with the original query
- Use WordNet to find the synsets and glosses of the original query terms
- Rank new query terms based on both Web and WordNet
- Form a new Boolean query

11  21/11/2002
Query Formulation II
- Original query: q(0) = (q1(0), q2(0), ..., qk(0))
- Use the Web as a generalized resource:
  - From q(0), retrieve the top N documents
  - For each qi(0) in q(0), extract nearby non-trivial words (in the same sentence, or up to n words away) to get wi
  - Rank each wik in wi by its probability of correlation with qi(0):

        Prob(wik) = # instances of (wik AND qi(0)) / # instances of (wik OR qi(0))

  - Merge all wi to form Cq for q(0)
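[Editor's note: a sketch of the correlation ranking, assuming the top-N Web documents have already been fetched and split into sentences; the stopword list and toy data are illustrative only.]

    from collections import Counter

    STOPWORDS = {"the", "a", "an", "of", "is", "in", "to", "and", "for"}

    def correlated_terms(q_term, sentences, top_m=10):
        """Score each candidate word w against one original query term q by
        Prob(w) = #(w AND q) / #(w OR q), counted over sentences."""
        n_w, n_both = Counter(), Counter()
        n_q = 0
        for sent in sentences:
            words = {t.strip('.,?!"').lower() for t in sent.split()} - STOPWORDS
            has_q = q_term in words
            n_q += has_q
            for w in words - {q_term}:
                n_w[w] += 1
                n_both[w] += has_q
        scores = {w: n_both[w] / (n_w[w] + n_q - n_both[w])
                  for w in n_w if n_both[w]}
        return sorted(scores.items(), key=lambda kv: -kv[1])[:top_m]

    sents = ["Brigadoon is a mythical Scottish town.",
             "The town of Brigadoon appears every hundred years."]
    print(correlated_terms("town", sents))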

12  21/11/2002
Query Formulation III
- Use WordNet as a generalized resource
- For each qi(0) in q(0), extract terms lexically related to qi(0) by locating them in its
  - Gloss Gi
  - Synset Si
- For q(0), we get Gq and Sq
- Re-rank each wik in wi by considering lexical relations: for each wik in Cq,
  - if wik is in Gi, its weight increases by α
  - if wik is in Si, its weight increases by β, where 0 < α < β < 1
- Get q(1) = q(0) + {top m terms from Cq}
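[Editor's note: a sketch of the WordNet re-ranking step using NLTK's WordNet interface. The ALPHA/BETA values are hypothetical; the slide only constrains them to 0 < α < β < 1.]

    from nltk.corpus import wordnet as wn  # needs nltk.download('wordnet')

    ALPHA, BETA = 0.2, 0.4  # hypothetical boost values

    def wordnet_rerank(q_term, candidate_scores):
        """Boost candidates found in the query term's glosses (G) or synsets (S)."""
        gloss_words, synset_words = set(), set()
        for syn in wn.synsets(q_term):
            gloss_words.update(syn.definition().lower().split())
            synset_words.update(l.name().lower().replace("_", " ")
                                for l in syn.lemmas())
        boosted = {}
        for w, score in candidate_scores.items():
            if w in synset_words:
                score += BETA   # synonym: larger boost
            elif w in gloss_words:
                score += ALPHA  # gloss co-occurrence: smaller boost
            boosted[w] = score
        return boosted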

13  21/11/2002
Document Retrieval
- 1,033,461 documents from:
  - AP newswire, 1998-2000
  - New York Times newswire, 1998-2000
  - Xinhua News Agency, 1996-2000
- MG Tool
- Boolean search to retrieve the top N documents (N = 50):
  for all tk in q(1), Q = (t1 AND t2 AND ... AND tn)
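[Editor's note: the MG toolkit did the actual retrieval; as a toy stand-in, a Boolean AND over an in-memory inverted index looks like this.]

    def boolean_and(index, terms, top_n=50):
        """index maps term -> set of doc ids; return docs containing ALL terms."""
        postings = [index.get(t, set()) for t in terms]
        hits = set.intersection(*postings) if postings else set()
        return sorted(hits)[:top_n]

    index = {"capital": {1, 2, 5}, "canada": {2, 5, 7}, "city": {2, 9}}
    print(boolean_and(index, ["capital", "canada", "city"]))  # [2]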

14  21/11/2002
Candidate Sentence Retrieval
For each sentence Sentj in the top N documents, match against:
- Quotation words: Wuj = % of term overlap between u and Sentj
- Noun phrases: Wnj = % of phrase overlap between n and Sentj
- Head of first noun phrase: Whj = 1 if there is a match, 0 otherwise
- Original content words: Wcj = % of term overlap between q(0) and Sentj
- Expanded content words: Wej = % of term overlap between q(1-0) and Sentj, where q(1-0) = q(1) - q(0)
Final score: Wj = Σi αi·Wij, where Σi αi = 1 and Wij ∈ {Wuj, Wnj, Whj, Wcj, Wej}
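[Editor's note: a sketch of the weighted sentence score. The α values are hypothetical; the slide only requires that they sum to 1.]

    ALPHAS = {"u": 0.3, "n": 0.25, "h": 0.1, "c": 0.2, "e": 0.15}

    def term_overlap(items, sent_words):
        """Fraction of items found in the sentence's word set."""
        return sum(i in sent_words for i in items) / len(items) if items else 0.0

    def sentence_score(sent, u, n, h, q0, q_exp):
        low = sent.lower()
        words = set(low.split())
        w = {
            "u": term_overlap([t.lower() for ph in u for t in ph.split()], words),
            "n": sum(ph.lower() in low for ph in n) / len(n) if n else 0.0,
            "h": 1.0 if h and h.lower() in words else 0.0,
            "c": term_overlap([t.lower() for t in q0], words),
            "e": term_overlap([t.lower() for t in q_exp], words),
        }
        return sum(ALPHAS[k] * w[k] for k in w)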

15  21/11/2002
Answer Extraction I
- Fine-grained NE tagging of the top K sentences
- From each sentence, extract the string that matches the question class
E.g. "Who is Tom Cruise married to?" (Q-class: HUM_BASIC)
Top-ranked candidate sentence: "Actor Tom Cruise and his wife Nicole Kidman accepted "substantial" libel damages from a newspaper that reported he was gay and that their marriage was a sham to cover it up."
Answer string: Nicole Kidman
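[Editor's note: a sketch of the type-matching step. The class-to-NE-type table is a hypothetical fragment of the real mapping, and skipping entities already mentioned in the question is an assumed guard, not stated on the slide.]

    QCLASS_TO_NE = {"HUM_BASIC": "PERSON", "LOC_CITY": "CITY",
                    "TME_DATE": "DATE"}  # hypothetical fragment

    def extract_answer(q_class, question, tagged_sent):
        """tagged_sent: list of (entity string, NE type) from a fine-grained tagger."""
        wanted = QCLASS_TO_NE.get(q_class)
        for span, ne_type in tagged_sent:
            # Skip entities already mentioned in the question itself.
            if ne_type == wanted and span.lower() not in question.lower():
                return span
        return None

    sent = [("Tom Cruise", "PERSON"), ("Nicole Kidman", "PERSON")]
    print(extract_answer("HUM_BASIC", "Who is Tom Cruise married to?", sent))
    # -> Nicole Kidman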

16  21/11/2002
Answer Extraction II
- For some questions, no answer is found:
  - Reduce the number of expanded query terms and repeat Document Retrieval, Candidate Sentence Retrieval, and Answer Extraction
  - The whole process runs for up to N iterations (N = 5)
  - If we still cannot find an exact answer, return NIL
- This increases recall step by step while preserving precision
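[Editor's note: a sketch of the successive constraint-relaxation loop. Dropping one expansion term per round is an assumed relaxation schedule; the slide only says the number of expanded terms is reduced each iteration.]

    MAX_ITER = 5  # N on the slide

    def answer_with_relaxation(q0, expansions, retrieve, rank_sents, extract):
        """expansions is ranked best-first; retrieve/rank_sents/extract stand in
        for the pipeline stages described on the earlier slides."""
        terms = list(expansions)
        for _ in range(MAX_ITER):
            query = q0 + terms
            for sent in rank_sents(query, retrieve(query)):
                answer = extract(sent)
                if answer:
                    return answer
            if not terms:
                break
            terms = terms[:-1]  # relax: drop the weakest expansion term
        return "NIL"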

17  21/11/2002
Evaluation in TREC 2002
Uninterpolated average precision:

    score = ( sum for i = 1 to 500 of (#correct up to question i) / i ) / 500

- We answered 290 of the 500 questions correctly
- Score: 0.61
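[Editor's note: computing this score from a list of per-question judgments ordered by the system's confidence.]

    def uninterpolated_avg_precision(correct_flags):
        """correct_flags[i] is True iff question i+1 (in ranked order) was
        answered correctly; returns mean over i of (#correct up to i) / i."""
        total, running = 0.0, 0
        for i, ok in enumerate(correct_flags, start=1):
            running += ok
            total += running / i
        return total / len(correct_flags)

    # Toy check: 3 correct answers ranked first out of 4 questions.
    print(uninterpolated_avg_precision([True, True, True, False]))  # 0.9375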

18  21/11/2002
Result Analysis I
[Chart: x-axis = number of runs with correct answers (1-51); y-axis = number of questions (0-45); series: total number of questions vs. our number of questions answered correctly.]

19  21/11/2002
Result Analysis II
- Recognizing no-answer (NIL) questions
  - Precision: 41 / 170 = 0.241
  - Recall: 41 / 46 = 0.891
- Non-NIL answers
  - Precision: 249 / 330 = 0.755
  - Recall: 249 / 444 = 0.561
- Overall recall is low compared to precision because the Boolean search is strict.

20  21/11/2002
Result Analysis III

21  21/11/2002
Conclusion
- Integration of both lexical knowledge and external resources
- Detailed question classification
- Use of fine-grained named entities for question answering
- Successive constraint relaxation

22  21/11/2002
Future Work
- Refining term correlation by combining local context, global context, and lexical correlations
- Exploring the structured use of external knowledge via the semantic perceptron net
- Developing template-based answer selection
- Longer-term research plan: interactive QA, analysis and opinion questions

23  21/11/2002
Thank You!

